Related
How does Java handle integer underflows and overflows?
Leading on from that, how would you check/test that this is occurring?
If it overflows, it goes back to the minimum value and continues from there. If it underflows, it goes back to the maximum value and continues from there.
You can check that beforehand as follows:
public static boolean willAdditionOverflow(int left, int right) {
if (right < 0 && right != Integer.MIN_VALUE) {
return willSubtractionOverflow(left, -right);
} else {
return (~(left ^ right) & (left ^ (left + right))) < 0;
}
}
public static boolean willSubtractionOverflow(int left, int right) {
if (right < 0) {
return willAdditionOverflow(left, -right);
} else {
return ((left ^ right) & (left ^ (left - right))) < 0;
}
}
(you can substitute int by long to perform the same checks for long)
If you think that this may occur more than often, then consider using a datatype or object which can store larger values, e.g. long or maybe java.math.BigInteger. The last one doesn't overflow, practically, the available JVM memory is the limit.
If you happen to be on Java8 already, then you can make use of the new Math#addExact() and Math#subtractExact() methods which will throw an ArithmeticException on overflow.
public static boolean willAdditionOverflow(int left, int right) {
try {
Math.addExact(left, right);
return false;
} catch (ArithmeticException e) {
return true;
}
}
public static boolean willSubtractionOverflow(int left, int right) {
try {
Math.subtractExact(left, right);
return false;
} catch (ArithmeticException e) {
return true;
}
}
The source code can be found here and here respectively.
Of course, you could also just use them right away instead of hiding them in a boolean utility method.
Well, as far as primitive integer types go, Java doesnt handle Over/Underflow at all (for float and double the behaviour is different, it will flush to +/- infinity just as IEEE-754 mandates).
When adding two int's, you will get no indication when an overflow occurs. A simple method to check for overflow is to use the next bigger type to actually perform the operation and check if the result is still in range for the source type:
public int addWithOverflowCheck(int a, int b) {
// the cast of a is required, to make the + work with long precision,
// if we just added (a + b) the addition would use int precision and
// the result would be cast to long afterwards!
long result = ((long) a) + b;
if (result > Integer.MAX_VALUE) {
throw new RuntimeException("Overflow occured");
} else if (result < Integer.MIN_VALUE) {
throw new RuntimeException("Underflow occured");
}
// at this point we can safely cast back to int, we checked before
// that the value will be withing int's limits
return (int) result;
}
What you would do in place of the throw clauses, depends on your applications requirements (throw, flush to min/max or just log whatever). If you want to detect overflow on long operations, you're out of luck with primitives, use BigInteger instead.
Edit (2014-05-21): Since this question seems to be referred to quite frequently and I had to solve the same problem myself, its quite easy to evaluate the overflow condition by the same method a CPU would calculate its V flag.
Its basically a boolean expression that involves the sign of both operands as well as the result:
/**
* Add two int's with overflow detection (r = s + d)
*/
public static int add(final int s, final int d) throws ArithmeticException {
int r = s + d;
if (((s & d & ~r) | (~s & ~d & r)) < 0)
throw new ArithmeticException("int overflow add(" + s + ", " + d + ")");
return r;
}
In java its simpler to apply the expression (in the if) to the entire 32 bits, and check the result using < 0 (this will effectively test the sign bit). The principle works exactly the same for all integer primitive types, changing all declarations in above method to long makes it work for long.
For smaller types, due to the implicit conversion to int (see the JLS for bitwise operations for details), instead of checking < 0, the check needs to mask the sign bit explicitly (0x8000 for short operands, 0x80 for byte operands, adjust casts and parameter declaration appropiately):
/**
* Subtract two short's with overflow detection (r = d - s)
*/
public static short sub(final short d, final short s) throws ArithmeticException {
int r = d - s;
if ((((~s & d & ~r) | (s & ~d & r)) & 0x8000) != 0)
throw new ArithmeticException("short overflow sub(" + s + ", " + d + ")");
return (short) r;
}
(Note that above example uses the expression need for subtract overflow detection)
So how/why do these boolean expressions work? First, some logical thinking reveals that an overflow can only occur if the signs of both arguments are the same. Because, if one argument is negative and one positive, the result (of add) must be closer to zero, or in the extreme case one argument is zero, the same as the other argument. Since the arguments by themselves can't create an overflow condition, their sum can't create an overflow either.
So what happens if both arguments have the same sign? Lets take a look at the case both are positive: adding two arguments that create a sum larger than the types MAX_VALUE, will always yield a negative value, so an overflow occurs if arg1 + arg2 > MAX_VALUE. Now the maximum value that could result would be MAX_VALUE + MAX_VALUE (the extreme case both arguments are MAX_VALUE). For a byte (example) that would mean 127 + 127 = 254. Looking at the bit representations of all values that can result from adding two positive values, one finds that those that overflow (128 to 254) all have bit 7 set, while all that do not overflow (0 to 127) have bit 7 (topmost, sign) cleared. Thats exactly what the first (right) part of the expression checks:
if (((s & d & ~r) | (~s & ~d & r)) < 0)
(~s & ~d & r) becomes true, only if, both operands (s, d) are positive and the result (r) is negative (the expression works on all 32 bits, but the only bit we're interested in is the topmost (sign) bit, which is checked against by the < 0).
Now if both arguments are negative, their sum can never be closer to zero than any of the arguments, the sum must be closer to minus infinity. The most extreme value we can produce is MIN_VALUE + MIN_VALUE, which (again for byte example) shows that for any in range value (-1 to -128) the sign bit is set, while any possible overflowing value (-129 to -256) has the sign bit cleared. So the sign of the result again reveals the overflow condition. Thats what the left half (s & d & ~r) checks for the case where both arguments (s, d) are negative and a result that is positive. The logic is largely equivalent to the positive case; all bit patterns that can result from adding two negative values will have the sign bit cleared if and only if an underflow occured.
By default, Java's int and long math silently wrap around on overflow and underflow. (Integer operations on other integer types are performed by first promoting the operands to int or long, per JLS 4.2.2.)
As of Java 8, java.lang.Math provides addExact, subtractExact, multiplyExact, incrementExact, decrementExact and negateExact static methods for both int and long arguments that perform the named operation, throwing ArithmeticException on overflow. (There's no divideExact method -- you'll have to check the one special case (MIN_VALUE / -1) yourself.)
As of Java 8, java.lang.Math also provides toIntExact to cast a long to an int, throwing ArithmeticException if the long's value does not fit in an int. This can be useful for e.g. computing the sum of ints using unchecked long math, then using toIntExact to cast to int at the end (but be careful not to let your sum overflow).
If you're still using an older version of Java, Google Guava provides IntMath and LongMath static methods for checked addition, subtraction, multiplication and exponentiation (throwing on overflow). These classes also provide methods to compute factorials and binomial coefficients that return MAX_VALUE on overflow (which is less convenient to check). Guava's primitive utility classes, SignedBytes, UnsignedBytes, Shorts and Ints, provide checkedCast methods for narrowing larger types (throwing IllegalArgumentException on under/overflow, not ArithmeticException), as well as saturatingCast methods that return MIN_VALUE or MAX_VALUE on overflow.
Java doesn't do anything with integer overflow for either int or long primitive types and ignores overflow with positive and negative integers.
This answer first describes the of integer overflow, gives an example of how it can happen, even with intermediate values in expression evaluation, and then gives links to resources that give detailed techniques for preventing and detecting integer overflow.
Integer arithmetic and expressions reslulting in unexpected or undetected overflow are a common programming error. Unexpected or undetected integer overflow is also a well-known exploitable security issue, especially as it affects array, stack and list objects.
Overflow can occur in either a positive or negative direction where the positive or negative value would be beyond the maximum or minimum values for the primitive type in question. Overflow can occur in an intermediate value during expression or operation evaluation and affect the outcome of an expression or operation where the final value would be expected to be within range.
Sometimes negative overflow is mistakenly called underflow. Underflow is what happens when a value would be closer to zero than the representation allows. Underflow occurs in integer arithmetic and is expected. Integer underflow happens when an integer evaluation would be between -1 and 0 or 0 and 1. What would be a fractional result truncates to 0. This is normal and expected with integer arithmetic and not considered an error. However, it can lead to code throwing an exception. One example is an "ArithmeticException: / by zero" exception if the result of integer underflow is used as a divisor in an expression.
Consider the following code:
int bigValue = Integer.MAX_VALUE;
int x = bigValue * 2 / 5;
int y = bigValue / x;
which results in x being assigned 0 and the subsequent evaluation of bigValue / x throws an exception, "ArithmeticException: / by zero" (i.e. divide by zero), instead of y being assigned the value 2.
The expected result for x would be 858,993,458 which is less than the maximum int value of 2,147,483,647. However, the intermediate result from evaluating Integer.MAX_Value * 2, would be 4,294,967,294, which exceeds the maximum int value and is -2 in accordance with 2s complement integer representations. The subsequent evaluation of -2 / 5 evaluates to 0 which gets assigned to x.
Rearranging the expression for computing x to an expression that, when evaluated, divides before multiplying, the following code:
int bigValue = Integer.MAX_VALUE;
int x = bigValue / 5 * 2;
int y = bigValue / x;
results in x being assigned 858,993,458 and y being assigned 2, which is expected.
The intermediate result from bigValue / 5 is 429,496,729 which does not exceed the maximum value for an int. Subsequent evaluation of 429,496,729 * 2 doesn't exceed the maximum value for an int and the expected result gets assigned to x. The evaluation for y then does not divide by zero. The evaluations for x and y work as expected.
Java integer values are stored as and behave in accordance with 2s complement signed integer representations. When a resulting value would be larger or smaller than the maximum or minimum integer values, a 2's complement integer value results instead. In situations not expressly designed to use 2s complement behavior, which is most ordinary integer arithmetic situations, the resulting 2s complement value will cause a programming logic or computation error as was shown in the example above. An excellent Wikipedia article describes 2s compliment binary integers here: Two's complement - Wikipedia
There are techniques for avoiding unintentional integer overflow. Techinques may be categorized as using pre-condition testing, upcasting and BigInteger.
Pre-condition testing comprises examining the values going into an arithmetic operation or expression to ensure that an overflow won't occur with those values. Programming and design will need to create testing that ensures input values won't cause overflow and then determine what to do if input values occur that will cause overflow.
Upcasting comprises using a larger primitive type to perform the arithmetic operation or expression and then determining if the resulting value is beyond the maximum or minimum values for an integer. Even with upcasting, it is still possible that the value or some intermediate value in an operation or expression will be beyond the maximum or minimum values for the upcast type and cause overflow, which will also not be detected and will cause unexpected and undesired results. Through analysis or pre-conditions, it may be possible to prevent overflow with upcasting when prevention without upcasting is not possible or practical. If the integers in question are already long primitive types, then upcasting is not possible with primitive types in Java.
The BigInteger technique comprises using BigInteger for the arithmetic operation or expression using library methods that use BigInteger. BigInteger does not overflow. It will use all available memory, if necessary. Its arithmetic methods are normally only slightly less efficient than integer operations. It is still possible that a result using BigInteger may be beyond the maximum or minimum values for an integer, however, overflow will not occur in the arithmetic leading to the result. Programming and design will still need to determine what to do if a BigInteger result is beyond the maximum or minimum values for the desired primitive result type, e.g., int or long.
The Carnegie Mellon Software Engineering Institute's CERT program and Oracle have created a set of standards for secure Java programming. Included in the standards are techniques for preventing and detecting integer overflow. The standard is published as a freely accessible online resource here: The CERT Oracle Secure Coding Standard for Java
The standard's section that describes and contains practical examples of coding techniques for preventing or detecting integer overflow is here: NUM00-J. Detect or prevent integer overflow
Book form and PDF form of The CERT Oracle Secure Coding Standard for Java are also available.
Having just kinda run into this problem myself, here's my solution (for both multiplication and addition):
static boolean wouldOverflowOccurwhenMultiplying(int a, int b) {
// If either a or b are Integer.MIN_VALUE, then multiplying by anything other than 0 or 1 will result in overflow
if (a == 0 || b == 0) {
return false;
} else if (a > 0 && b > 0) { // both positive, non zero
return a > Integer.MAX_VALUE / b;
} else if (b < 0 && a < 0) { // both negative, non zero
return a < Integer.MAX_VALUE / b;
} else { // exactly one of a,b is negative and one is positive, neither are zero
if (b > 0) { // this last if statements protects against Integer.MIN_VALUE / -1, which in itself causes overflow.
return a < Integer.MIN_VALUE / b;
} else { // a > 0
return b < Integer.MIN_VALUE / a;
}
}
}
boolean wouldOverflowOccurWhenAdding(int a, int b) {
if (a > 0 && b > 0) {
return a > Integer.MAX_VALUE - b;
} else if (a < 0 && b < 0) {
return a < Integer.MIN_VALUE - b;
}
return false;
}
feel free to correct if wrong or if can be simplified. I've done some testing with the multiplication method, mostly edge cases, but it could still be wrong.
There are libraries that provide safe arithmetic operations, which check integer overflow/underflow . For example, Guava's IntMath.checkedAdd(int a, int b) returns the sum of a and b, provided it does not overflow, and throws ArithmeticException if a + b overflows in signed int arithmetic.
It wraps around.
e.g:
public class Test {
public static void main(String[] args) {
int i = Integer.MAX_VALUE;
int j = Integer.MIN_VALUE;
System.out.println(i+1);
System.out.println(j-1);
}
}
prints
-2147483648
2147483647
Since java8 the java.lang.Math package has methods like addExact() and multiplyExact() which will throw an ArithmeticException when an overflow occurs.
I think you should use something like this and it is called Upcasting:
public int multiplyBy2(int x) throws ArithmeticException {
long result = 2 * (long) x;
if (result > Integer.MAX_VALUE || result < Integer.MIN_VALUE){
throw new ArithmeticException("Integer overflow");
}
return (int) result;
}
You can read further here:
Detect or prevent integer overflow
It is quite reliable source.
It doesn't do anything -- the under/overflow just happens.
A "-1" that is the result of a computation that overflowed is no different from the "-1" that resulted from any other information. So you can't tell via some status or by inspecting just a value whether it's overflowed.
But you can be smart about your computations in order to avoid overflow, if it matters, or at least know when it will happen. What's your situation?
static final int safeAdd(int left, int right)
throws ArithmeticException {
if (right > 0 ? left > Integer.MAX_VALUE - right
: left < Integer.MIN_VALUE - right) {
throw new ArithmeticException("Integer overflow");
}
return left + right;
}
static final int safeSubtract(int left, int right)
throws ArithmeticException {
if (right > 0 ? left < Integer.MIN_VALUE + right
: left > Integer.MAX_VALUE + right) {
throw new ArithmeticException("Integer overflow");
}
return left - right;
}
static final int safeMultiply(int left, int right)
throws ArithmeticException {
if (right > 0 ? left > Integer.MAX_VALUE/right
|| left < Integer.MIN_VALUE/right
: (right < -1 ? left > Integer.MIN_VALUE/right
|| left < Integer.MAX_VALUE/right
: right == -1
&& left == Integer.MIN_VALUE) ) {
throw new ArithmeticException("Integer overflow");
}
return left * right;
}
static final int safeDivide(int left, int right)
throws ArithmeticException {
if ((left == Integer.MIN_VALUE) && (right == -1)) {
throw new ArithmeticException("Integer overflow");
}
return left / right;
}
static final int safeNegate(int a) throws ArithmeticException {
if (a == Integer.MIN_VALUE) {
throw new ArithmeticException("Integer overflow");
}
return -a;
}
static final int safeAbs(int a) throws ArithmeticException {
if (a == Integer.MIN_VALUE) {
throw new ArithmeticException("Integer overflow");
}
return Math.abs(a);
}
There is one case, that is not mentioned above:
int res = 1;
while (res != 0) {
res *= 2;
}
System.out.println(res);
will produce:
0
This case was discussed here:
Integer overflow produces Zero.
I think this should be fine.
static boolean addWillOverFlow(int a, int b) {
return (Integer.signum(a) == Integer.signum(b)) &&
(Integer.signum(a) != Integer.signum(a+b));
}
The 1st for loop in the below code does not find the maximum correctly due to an overflow. However, the 2nd for loop does. I used godbolt.com to look at the byte codes for this program which showed that to determine which number is greater the 1st for loop uses an isub and the 2nd for loop uses an if_icmple. Makes sense. However, why is the if_icmple able to successfully do this comparison since it too at some point must do a subtraction (which I would expect to produce an overflow)?
public class Overflow {
public static void main(String[] args) {
int[] nums = {3,-2147483648, 5, 7, 27, 9};
int curMax = nums[0];
for (int num : nums) {
int diff = curMax - num;
if (diff < 0) {
curMax = num;
}
}
System.out.println("1) max is " + curMax);
curMax = nums[0];
for (int num : nums) {
if (num > curMax) {
curMax = num;
}
}
System.out.println("2) max is " + curMax);
}
}
The output is
max is -2147483648
max is 27
Let's say that comparison is implemented using subtraction. Contrary to various other opinions here, I'd say that is highly likely. Eg cmp on x86 is just a subtraction that does not update its destination register, only the flags. Various other (but maybe not all) processors that have a flags register also work that way. In the rest of this answer I'll use x86 as a representative processor for examples.
However, there is an incorrect assumption made implicitly by your code: a comparison is not equivalent to a subtraction followed by checking the sign, it is equivalent to a subtraction followed by checking some combination of the Zero, Sign, and Overflow flags. For example, if you implement if (num > curMax) using some cmp followed by jle (to skip the body of the if when the condition is false), then jle does this:
Jump short if less or equal (ZF=1 or SF≠ OF).
Expressing the condition SF≠ OF directly in Java is not so easy. But the JVM itself has no such problem, it can use a comparison (or, equivalently, subtraction) followed by exactly the right kind of conditional jump.
There are some less-fortunate processors that do not have such as full set of conditional jumps as x86 has, but even in that case, the JVM has a lot more options than you do.
The option that the JVM does not have though, is implementing comparison incorrectly.
However, why is the if_icmple able to successfully do this comparison since it too at some point must do a subtraction (which I would expect to produce an overflow)?
It doesn't need to do a subtraction. It just needs to do a comparison. Comparisons don't have to involve subtraction, certainly not in the CPU.
Of course, we can also take alternate steps to deal with overflow. Here is one simple approach:
int cmp(int a, int b) {
boolean aNeg = (a >>> 31) != 0;
boolean bNeg = (b >>> 31) != 0;
if (aNeg != bNeg) {
return aNeg ? -1 : 1;
}
int diff = a - b; // subtracting two numbers with the same sign can't overflow
return (diff == 0) ? 0 : (diff < 0) ? -1 : 1;
}
All Java has to do is compile if_icmple to something like this, or whatever other instruction is appropriate on the target CPU. Using bytecode means Java can leave that up to the runtime to get right for the target CPU, whatever's fastest in such an environment -- using the overflow bit, doing something like this, whatever.
How does Java handle integer underflows and overflows?
Leading on from that, how would you check/test that this is occurring?
If it overflows, it goes back to the minimum value and continues from there. If it underflows, it goes back to the maximum value and continues from there.
You can check that beforehand as follows:
public static boolean willAdditionOverflow(int left, int right) {
if (right < 0 && right != Integer.MIN_VALUE) {
return willSubtractionOverflow(left, -right);
} else {
return (~(left ^ right) & (left ^ (left + right))) < 0;
}
}
public static boolean willSubtractionOverflow(int left, int right) {
if (right < 0) {
return willAdditionOverflow(left, -right);
} else {
return ((left ^ right) & (left ^ (left - right))) < 0;
}
}
(you can substitute int by long to perform the same checks for long)
If you think that this may occur more than often, then consider using a datatype or object which can store larger values, e.g. long or maybe java.math.BigInteger. The last one doesn't overflow, practically, the available JVM memory is the limit.
If you happen to be on Java8 already, then you can make use of the new Math#addExact() and Math#subtractExact() methods which will throw an ArithmeticException on overflow.
public static boolean willAdditionOverflow(int left, int right) {
try {
Math.addExact(left, right);
return false;
} catch (ArithmeticException e) {
return true;
}
}
public static boolean willSubtractionOverflow(int left, int right) {
try {
Math.subtractExact(left, right);
return false;
} catch (ArithmeticException e) {
return true;
}
}
The source code can be found here and here respectively.
Of course, you could also just use them right away instead of hiding them in a boolean utility method.
Well, as far as primitive integer types go, Java doesnt handle Over/Underflow at all (for float and double the behaviour is different, it will flush to +/- infinity just as IEEE-754 mandates).
When adding two int's, you will get no indication when an overflow occurs. A simple method to check for overflow is to use the next bigger type to actually perform the operation and check if the result is still in range for the source type:
public int addWithOverflowCheck(int a, int b) {
// the cast of a is required, to make the + work with long precision,
// if we just added (a + b) the addition would use int precision and
// the result would be cast to long afterwards!
long result = ((long) a) + b;
if (result > Integer.MAX_VALUE) {
throw new RuntimeException("Overflow occured");
} else if (result < Integer.MIN_VALUE) {
throw new RuntimeException("Underflow occured");
}
// at this point we can safely cast back to int, we checked before
// that the value will be withing int's limits
return (int) result;
}
What you would do in place of the throw clauses, depends on your applications requirements (throw, flush to min/max or just log whatever). If you want to detect overflow on long operations, you're out of luck with primitives, use BigInteger instead.
Edit (2014-05-21): Since this question seems to be referred to quite frequently and I had to solve the same problem myself, its quite easy to evaluate the overflow condition by the same method a CPU would calculate its V flag.
Its basically a boolean expression that involves the sign of both operands as well as the result:
/**
* Add two int's with overflow detection (r = s + d)
*/
public static int add(final int s, final int d) throws ArithmeticException {
int r = s + d;
if (((s & d & ~r) | (~s & ~d & r)) < 0)
throw new ArithmeticException("int overflow add(" + s + ", " + d + ")");
return r;
}
In java its simpler to apply the expression (in the if) to the entire 32 bits, and check the result using < 0 (this will effectively test the sign bit). The principle works exactly the same for all integer primitive types, changing all declarations in above method to long makes it work for long.
For smaller types, due to the implicit conversion to int (see the JLS for bitwise operations for details), instead of checking < 0, the check needs to mask the sign bit explicitly (0x8000 for short operands, 0x80 for byte operands, adjust casts and parameter declaration appropiately):
/**
* Subtract two short's with overflow detection (r = d - s)
*/
public static short sub(final short d, final short s) throws ArithmeticException {
int r = d - s;
if ((((~s & d & ~r) | (s & ~d & r)) & 0x8000) != 0)
throw new ArithmeticException("short overflow sub(" + s + ", " + d + ")");
return (short) r;
}
(Note that above example uses the expression need for subtract overflow detection)
So how/why do these boolean expressions work? First, some logical thinking reveals that an overflow can only occur if the signs of both arguments are the same. Because, if one argument is negative and one positive, the result (of add) must be closer to zero, or in the extreme case one argument is zero, the same as the other argument. Since the arguments by themselves can't create an overflow condition, their sum can't create an overflow either.
So what happens if both arguments have the same sign? Lets take a look at the case both are positive: adding two arguments that create a sum larger than the types MAX_VALUE, will always yield a negative value, so an overflow occurs if arg1 + arg2 > MAX_VALUE. Now the maximum value that could result would be MAX_VALUE + MAX_VALUE (the extreme case both arguments are MAX_VALUE). For a byte (example) that would mean 127 + 127 = 254. Looking at the bit representations of all values that can result from adding two positive values, one finds that those that overflow (128 to 254) all have bit 7 set, while all that do not overflow (0 to 127) have bit 7 (topmost, sign) cleared. Thats exactly what the first (right) part of the expression checks:
if (((s & d & ~r) | (~s & ~d & r)) < 0)
(~s & ~d & r) becomes true, only if, both operands (s, d) are positive and the result (r) is negative (the expression works on all 32 bits, but the only bit we're interested in is the topmost (sign) bit, which is checked against by the < 0).
Now if both arguments are negative, their sum can never be closer to zero than any of the arguments, the sum must be closer to minus infinity. The most extreme value we can produce is MIN_VALUE + MIN_VALUE, which (again for byte example) shows that for any in range value (-1 to -128) the sign bit is set, while any possible overflowing value (-129 to -256) has the sign bit cleared. So the sign of the result again reveals the overflow condition. Thats what the left half (s & d & ~r) checks for the case where both arguments (s, d) are negative and a result that is positive. The logic is largely equivalent to the positive case; all bit patterns that can result from adding two negative values will have the sign bit cleared if and only if an underflow occured.
By default, Java's int and long math silently wrap around on overflow and underflow. (Integer operations on other integer types are performed by first promoting the operands to int or long, per JLS 4.2.2.)
As of Java 8, java.lang.Math provides addExact, subtractExact, multiplyExact, incrementExact, decrementExact and negateExact static methods for both int and long arguments that perform the named operation, throwing ArithmeticException on overflow. (There's no divideExact method -- you'll have to check the one special case (MIN_VALUE / -1) yourself.)
As of Java 8, java.lang.Math also provides toIntExact to cast a long to an int, throwing ArithmeticException if the long's value does not fit in an int. This can be useful for e.g. computing the sum of ints using unchecked long math, then using toIntExact to cast to int at the end (but be careful not to let your sum overflow).
If you're still using an older version of Java, Google Guava provides IntMath and LongMath static methods for checked addition, subtraction, multiplication and exponentiation (throwing on overflow). These classes also provide methods to compute factorials and binomial coefficients that return MAX_VALUE on overflow (which is less convenient to check). Guava's primitive utility classes, SignedBytes, UnsignedBytes, Shorts and Ints, provide checkedCast methods for narrowing larger types (throwing IllegalArgumentException on under/overflow, not ArithmeticException), as well as saturatingCast methods that return MIN_VALUE or MAX_VALUE on overflow.
Java doesn't do anything with integer overflow for either int or long primitive types and ignores overflow with positive and negative integers.
This answer first describes the of integer overflow, gives an example of how it can happen, even with intermediate values in expression evaluation, and then gives links to resources that give detailed techniques for preventing and detecting integer overflow.
Integer arithmetic and expressions reslulting in unexpected or undetected overflow are a common programming error. Unexpected or undetected integer overflow is also a well-known exploitable security issue, especially as it affects array, stack and list objects.
Overflow can occur in either a positive or negative direction where the positive or negative value would be beyond the maximum or minimum values for the primitive type in question. Overflow can occur in an intermediate value during expression or operation evaluation and affect the outcome of an expression or operation where the final value would be expected to be within range.
Sometimes negative overflow is mistakenly called underflow. Underflow is what happens when a value would be closer to zero than the representation allows. Underflow occurs in integer arithmetic and is expected. Integer underflow happens when an integer evaluation would be between -1 and 0 or 0 and 1. What would be a fractional result truncates to 0. This is normal and expected with integer arithmetic and not considered an error. However, it can lead to code throwing an exception. One example is an "ArithmeticException: / by zero" exception if the result of integer underflow is used as a divisor in an expression.
Consider the following code:
int bigValue = Integer.MAX_VALUE;
int x = bigValue * 2 / 5;
int y = bigValue / x;
which results in x being assigned 0 and the subsequent evaluation of bigValue / x throws an exception, "ArithmeticException: / by zero" (i.e. divide by zero), instead of y being assigned the value 2.
The expected result for x would be 858,993,458 which is less than the maximum int value of 2,147,483,647. However, the intermediate result from evaluating Integer.MAX_Value * 2, would be 4,294,967,294, which exceeds the maximum int value and is -2 in accordance with 2s complement integer representations. The subsequent evaluation of -2 / 5 evaluates to 0 which gets assigned to x.
Rearranging the expression for computing x to an expression that, when evaluated, divides before multiplying, the following code:
int bigValue = Integer.MAX_VALUE;
int x = bigValue / 5 * 2;
int y = bigValue / x;
results in x being assigned 858,993,458 and y being assigned 2, which is expected.
The intermediate result from bigValue / 5 is 429,496,729 which does not exceed the maximum value for an int. Subsequent evaluation of 429,496,729 * 2 doesn't exceed the maximum value for an int and the expected result gets assigned to x. The evaluation for y then does not divide by zero. The evaluations for x and y work as expected.
Java integer values are stored as and behave in accordance with 2s complement signed integer representations. When a resulting value would be larger or smaller than the maximum or minimum integer values, a 2's complement integer value results instead. In situations not expressly designed to use 2s complement behavior, which is most ordinary integer arithmetic situations, the resulting 2s complement value will cause a programming logic or computation error as was shown in the example above. An excellent Wikipedia article describes 2s compliment binary integers here: Two's complement - Wikipedia
There are techniques for avoiding unintentional integer overflow. Techinques may be categorized as using pre-condition testing, upcasting and BigInteger.
Pre-condition testing comprises examining the values going into an arithmetic operation or expression to ensure that an overflow won't occur with those values. Programming and design will need to create testing that ensures input values won't cause overflow and then determine what to do if input values occur that will cause overflow.
Upcasting comprises using a larger primitive type to perform the arithmetic operation or expression and then determining if the resulting value is beyond the maximum or minimum values for an integer. Even with upcasting, it is still possible that the value or some intermediate value in an operation or expression will be beyond the maximum or minimum values for the upcast type and cause overflow, which will also not be detected and will cause unexpected and undesired results. Through analysis or pre-conditions, it may be possible to prevent overflow with upcasting when prevention without upcasting is not possible or practical. If the integers in question are already long primitive types, then upcasting is not possible with primitive types in Java.
The BigInteger technique comprises using BigInteger for the arithmetic operation or expression using library methods that use BigInteger. BigInteger does not overflow. It will use all available memory, if necessary. Its arithmetic methods are normally only slightly less efficient than integer operations. It is still possible that a result using BigInteger may be beyond the maximum or minimum values for an integer, however, overflow will not occur in the arithmetic leading to the result. Programming and design will still need to determine what to do if a BigInteger result is beyond the maximum or minimum values for the desired primitive result type, e.g., int or long.
The Carnegie Mellon Software Engineering Institute's CERT program and Oracle have created a set of standards for secure Java programming. Included in the standards are techniques for preventing and detecting integer overflow. The standard is published as a freely accessible online resource here: The CERT Oracle Secure Coding Standard for Java
The standard's section that describes and contains practical examples of coding techniques for preventing or detecting integer overflow is here: NUM00-J. Detect or prevent integer overflow
Book form and PDF form of The CERT Oracle Secure Coding Standard for Java are also available.
Having just kinda run into this problem myself, here's my solution (for both multiplication and addition):
static boolean wouldOverflowOccurwhenMultiplying(int a, int b) {
// If either a or b are Integer.MIN_VALUE, then multiplying by anything other than 0 or 1 will result in overflow
if (a == 0 || b == 0) {
return false;
} else if (a > 0 && b > 0) { // both positive, non zero
return a > Integer.MAX_VALUE / b;
} else if (b < 0 && a < 0) { // both negative, non zero
return a < Integer.MAX_VALUE / b;
} else { // exactly one of a,b is negative and one is positive, neither are zero
if (b > 0) { // this last if statements protects against Integer.MIN_VALUE / -1, which in itself causes overflow.
return a < Integer.MIN_VALUE / b;
} else { // a > 0
return b < Integer.MIN_VALUE / a;
}
}
}
boolean wouldOverflowOccurWhenAdding(int a, int b) {
if (a > 0 && b > 0) {
return a > Integer.MAX_VALUE - b;
} else if (a < 0 && b < 0) {
return a < Integer.MIN_VALUE - b;
}
return false;
}
feel free to correct if wrong or if can be simplified. I've done some testing with the multiplication method, mostly edge cases, but it could still be wrong.
There are libraries that provide safe arithmetic operations, which check integer overflow/underflow . For example, Guava's IntMath.checkedAdd(int a, int b) returns the sum of a and b, provided it does not overflow, and throws ArithmeticException if a + b overflows in signed int arithmetic.
It wraps around.
e.g:
public class Test {
public static void main(String[] args) {
int i = Integer.MAX_VALUE;
int j = Integer.MIN_VALUE;
System.out.println(i+1);
System.out.println(j-1);
}
}
prints
-2147483648
2147483647
Since java8 the java.lang.Math package has methods like addExact() and multiplyExact() which will throw an ArithmeticException when an overflow occurs.
I think you should use something like this and it is called Upcasting:
public int multiplyBy2(int x) throws ArithmeticException {
long result = 2 * (long) x;
if (result > Integer.MAX_VALUE || result < Integer.MIN_VALUE){
throw new ArithmeticException("Integer overflow");
}
return (int) result;
}
You can read further here:
Detect or prevent integer overflow
It is quite reliable source.
It doesn't do anything -- the under/overflow just happens.
A "-1" that is the result of a computation that overflowed is no different from the "-1" that resulted from any other information. So you can't tell via some status or by inspecting just a value whether it's overflowed.
But you can be smart about your computations in order to avoid overflow, if it matters, or at least know when it will happen. What's your situation?
static final int safeAdd(int left, int right)
throws ArithmeticException {
if (right > 0 ? left > Integer.MAX_VALUE - right
: left < Integer.MIN_VALUE - right) {
throw new ArithmeticException("Integer overflow");
}
return left + right;
}
static final int safeSubtract(int left, int right)
throws ArithmeticException {
if (right > 0 ? left < Integer.MIN_VALUE + right
: left > Integer.MAX_VALUE + right) {
throw new ArithmeticException("Integer overflow");
}
return left - right;
}
static final int safeMultiply(int left, int right)
throws ArithmeticException {
if (right > 0 ? left > Integer.MAX_VALUE/right
|| left < Integer.MIN_VALUE/right
: (right < -1 ? left > Integer.MIN_VALUE/right
|| left < Integer.MAX_VALUE/right
: right == -1
&& left == Integer.MIN_VALUE) ) {
throw new ArithmeticException("Integer overflow");
}
return left * right;
}
static final int safeDivide(int left, int right)
throws ArithmeticException {
if ((left == Integer.MIN_VALUE) && (right == -1)) {
throw new ArithmeticException("Integer overflow");
}
return left / right;
}
static final int safeNegate(int a) throws ArithmeticException {
if (a == Integer.MIN_VALUE) {
throw new ArithmeticException("Integer overflow");
}
return -a;
}
static final int safeAbs(int a) throws ArithmeticException {
if (a == Integer.MIN_VALUE) {
throw new ArithmeticException("Integer overflow");
}
return Math.abs(a);
}
There is one case, that is not mentioned above:
int res = 1;
while (res != 0) {
res *= 2;
}
System.out.println(res);
will produce:
0
This case was discussed here:
Integer overflow produces Zero.
I think this should be fine.
static boolean addWillOverFlow(int a, int b) {
return (Integer.signum(a) == Integer.signum(b)) &&
(Integer.signum(a) != Integer.signum(a+b));
}
Today I've encountered the following problem, to which I can't seem to find a solution:
int i, j, k;
i = j = k = 3;
i = k++;
So it seemed logical to me that the variable 'i' must have the value 4 now, since we assigned the increment of 'k' to it. In the multiple choice test, the correct values after the third line were instead:
k = 4
and
i != 4
Since we assigned the increment of k to i, how come the given solution is the exact opposite to what I had expected?
Thanks in advance!
Firstly, as noted by JB Nizet, don't do this. Very occasionally I'll use a postfix increment within another expression, for thing likes array[index++] = value; but very often I'll pull it out into two statements for clarity.
I wasn't going to answer this question, but all the answers (at the time of posting) make the same mistake: this isn't a matter of timing at all; it's a matter of the value of the expression k++.
The assignment to i happens after the increment of k - but the value of the expression k++ is the original value of k, not the incremented value.
So this code:
i = k++;
Is equivalent to:
int tmp = k;
k++;
i = tmp;
From section 15.14.2 of the JLS (emphasis mine):
[...] Otherwise, the value 1 is added to the value of the variable and the sum is stored back into the variable. Before the addition, binary numeric promotion (§5.6.2) is performed on the value 1 and the value of the variable. If necessary, the sum is narrowed by a narrowing primitive conversion (§5.1.3) and/or subjected to boxing conversion (§5.1.7) to the type of the variable before it is stored. The value of the postfix increment expression is the value of the variable before the new value is stored.
This difference is very important, and can easily be seen if instead of using the postfix expression for an assignment, you call a method:
public class Test {
private static int k = 0;
public static void main(String[] args) throws Exception {
foo(k++);
}
private static void foo(int x) {
System.out.println("Value of parameter: " + x);
System.out.println("Value of k: " + k);
}
}
The result is:
Value of parameter: 0
Value of k: 1
As you can see, k has already been incremented by the time we call the method, but the value passed into the method is still the original value.
#include <stdio.h>
int main(void)
{
int i = 0;
i = i++ + ++i;
printf("%d\n", i); // 3
i = 1;
i = (i++);
printf("%d\n", i); // 2 Should be 1, no ?
volatile int u = 0;
u = u++ + ++u;
printf("%d\n", u); // 1
u = 1;
u = (u++);
printf("%d\n", u); // 2 Should also be one, no ?
register int v = 0;
v = v++ + ++v;
printf("%d\n", v); // 3 (Should be the same as u ?)
int w = 0;
printf("%d %d\n", ++w, w); // shouldn't this print 1 1
int x[2] = { 5, 8 }, y = 0;
x[y] = y ++;
printf("%d %d\n", x[0], x[1]); // shouldn't this print 0 8? or 5 0?
}
C has the concept of undefined behavior, i.e. some language constructs are syntactically valid but you can't predict the behavior when the code is run.
As far as I know, the standard doesn't explicitly say why the concept of undefined behavior exists. In my mind, it's simply because the language designers wanted there to be some leeway in the semantics, instead of i.e. requiring that all implementations handle integer overflow in the exact same way, which would very likely impose serious performance costs, they just left the behavior undefined so that if you write code that causes integer overflow, anything can happen.
So, with that in mind, why are these "issues"? The language clearly says that certain things lead to undefined behavior. There is no problem, there is no "should" involved. If the undefined behavior changes when one of the involved variables is declared volatile, that doesn't prove or change anything. It is undefined; you cannot reason about the behavior.
Your most interesting-looking example, the one with
u = (u++);
is a text-book example of undefined behavior (see Wikipedia's entry on sequence points).
Most of the answers here quoted from C standard emphasizing that the behavior of these constructs are undefined. To understand why the behavior of these constructs are undefined, let's understand these terms first in the light of C11 standard:
Sequenced: (5.1.2.3)
Given any two evaluations A and B, if A is sequenced before B, then the execution of A shall precede the execution of B.
Unsequenced:
If A is not sequenced before or after B, then A and B are unsequenced.
Evaluations can be one of two things:
value computations, which work out the result of an expression; and
side effects, which are modifications of objects.
Sequence Point:
The presence of a sequence point between the evaluation of expressions A and B implies that every value computation and side effect associated with A is sequenced before every value computation and side effect associated with B.
Now coming to the question, for the expressions like
int i = 1;
i = i++;
standard says that:
6.5 Expressions:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. [...]
Therefore, the above expression invokes UB because two side effects on the same object i is unsequenced relative to each other. That means it is not sequenced whether the side effect by assignment to i will be done before or after the side effect by ++.
Depending on whether assignment occurs before or after the increment, different results will be produced and that's the one of the case of undefined behavior.
Lets rename the i at left of assignment be il and at the right of assignment (in the expression i++) be ir, then the expression be like
il = ir++ // Note that suffix l and r are used for the sake of clarity.
// Both il and ir represents the same object.
An important point regarding Postfix ++ operator is that:
just because the ++ comes after the variable does not mean that the increment happens late. The increment can happen as early as the compiler likes as long as the compiler ensures that the original value is used.
It means the expression il = ir++ could be evaluated either as
temp = ir; // i = 1
ir = ir + 1; // i = 2 side effect by ++ before assignment
il = temp; // i = 1 result is 1
or
temp = ir; // i = 1
il = temp; // i = 1 side effect by assignment before ++
ir = ir + 1; // i = 2 result is 2
resulting in two different results 1 and 2 which depends on the sequence of side effects by assignment and ++ and hence invokes UB.
I think the relevant parts of the C99 standard are 6.5 Expressions, §2
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
and 6.5.16 Assignment operators, §4:
The order of evaluation of the operands is unspecified. If an attempt is made to modify
the result of an assignment operator or to access it after the next sequence point, the
behavior is undefined.
Just compile and disassemble your line of code, if you are so inclined to know how exactly it is you get what you are getting.
This is what I get on my machine, together with what I think is going on:
$ cat evil.c
void evil(){
int i = 0;
i+= i++ + ++i;
}
$ gcc evil.c -c -o evil.bin
$ gdb evil.bin
(gdb) disassemble evil
Dump of assembler code for function evil:
0x00000000 <+0>: push %ebp
0x00000001 <+1>: mov %esp,%ebp
0x00000003 <+3>: sub $0x10,%esp
0x00000006 <+6>: movl $0x0,-0x4(%ebp) // i = 0 i = 0
0x0000000d <+13>: addl $0x1,-0x4(%ebp) // i++ i = 1
0x00000011 <+17>: mov -0x4(%ebp),%eax // j = i i = 1 j = 1
0x00000014 <+20>: add %eax,%eax // j += j i = 1 j = 2
0x00000016 <+22>: add %eax,-0x4(%ebp) // i += j i = 3
0x00000019 <+25>: addl $0x1,-0x4(%ebp) // i++ i = 4
0x0000001d <+29>: leave
0x0000001e <+30>: ret
End of assembler dump.
(I... suppose that the 0x00000014 instruction was some kind of compiler optimization?)
The behavior can't really be explained because it invokes both unspecified behavior and undefined behavior, so we can not make any general predictions about this code, although if you read Olve Maudal's work such as Deep C and Unspecified and Undefined sometimes you can make good guesses in very specific cases with a specific compiler and environment but please don't do that anywhere near production.
So moving on to unspecified behavior, in draft c99 standard section6.5 paragraph 3 says(emphasis mine):
The grouping of operators and operands is indicated by the syntax.74) Except as specified
later (for the function-call (), &&, ||, ?:, and comma operators), the order of evaluation of subexpressions and the order in which side effects take place are both unspecified.
So when we have a line like this:
i = i++ + ++i;
we do not know whether i++ or ++i will be evaluated first. This is mainly to give the compiler better options for optimization.
We also have undefined behavior here as well since the program is modifying variables(i, u, etc..) more than once between sequence points. From draft standard section 6.5 paragraph 2(emphasis mine):
Between the previous and next sequence point an object shall have its stored value
modified at most once by the evaluation of an expression. Furthermore, the prior value
shall be read only to determine the value to be stored.
it cites the following code examples as being undefined:
i = ++i + 1;
a[i++] = i;
In all these examples the code is attempting to modify an object more than once in the same sequence point, which will end with the ; in each one of these cases:
i = i++ + ++i;
^ ^ ^
i = (i++);
^ ^
u = u++ + ++u;
^ ^ ^
u = (u++);
^ ^
v = v++ + ++v;
^ ^ ^
Unspecified behavior is defined in the draft c99 standard in section 3.4.4 as:
use of an unspecified value, or other behavior where this International Standard provides
two or more possibilities and imposes no further requirements on which is chosen in any
instance
and undefined behavior is defined in section 3.4.3 as:
behavior, upon use of a nonportable or erroneous program construct or of erroneous data,
for which this International Standard imposes no requirements
and notes that:
Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).
Another way of answering this, rather than getting bogged down in arcane details of sequence points and undefined behavior, is simply to ask, what are they supposed to mean? What was the programmer trying to do?
The first fragment asked about, i = i++ + ++i, is pretty clearly insane in my book. No one would ever write it in a real program, it's not obvious what it does, there's no conceivable algorithm someone could have been trying to code that would have resulted in this particular contrived sequence of operations. And since it's not obvious to you and me what it's supposed to do, it's fine in my book if the compiler can't figure out what it's supposed to do, either.
The second fragment, i = i++, is a little easier to understand. Someone is clearly trying to increment i, and assign the result back to i. But there are a couple ways of doing this in C. The most basic way to add 1 to i, and assign the result back to i, is the same in almost any programming language:
i = i + 1
C, of course, has a handy shortcut:
i++
This means, "add 1 to i, and assign the result back to i". So if we construct a hodgepodge of the two, by writing
i = i++
what we're really saying is "add 1 to i, and assign the result back to i, and assign the result back to i". We're confused, so it doesn't bother me too much if the compiler gets confused, too.
Realistically, the only time these crazy expressions get written is when people are using them as artificial examples of how ++ is supposed to work. And of course it is important to understand how ++ works. But one practical rule for using ++ is, "If it's not obvious what an expression using ++ means, don't write it."
We used to spend countless hours on comp.lang.c discussing expressions like these and why they're undefined. Two of my longer answers, that try to really explain why, are archived on the web:
Why doesn't the Standard define what these do?
Doesn't operator precedence determine the order of evaluation?
See also question 3.8 and the rest of the questions in section 3 of the C FAQ list.
Often this question is linked as a duplicate of questions related to code like
printf("%d %d\n", i, i++);
or
printf("%d %d\n", ++i, i++);
or similar variants.
While this is also undefined behaviour as stated already, there are subtle differences when printf() is involved when comparing to a statement such as:
x = i++ + i++;
In the following statement:
printf("%d %d\n", ++i, i++);
the order of evaluation of arguments in printf() is unspecified. That means, expressions i++ and ++i could be evaluated in any order. C11 standard has some relevant descriptions on this:
Annex J, unspecified behaviours
The order in which the function designator, arguments, and
subexpressions within the arguments are evaluated in a function call
(6.5.2.2).
3.4.4, unspecified behavior
Use of an unspecified value, or other behavior where this
International Standard provides two or more possibilities and imposes
no further requirements on which is chosen in any instance.
EXAMPLE An example of unspecified behavior is the order in which the
arguments to a function are evaluated.
The unspecified behaviour itself is NOT an issue. Consider this example:
printf("%d %d\n", ++x, y++);
This too has unspecified behaviour because the order of evaluation of ++x and y++ is unspecified. But it's perfectly legal and valid statement. There's no undefined behaviour in this statement. Because the modifications (++x and y++) are done to distinct objects.
What renders the following statement
printf("%d %d\n", ++i, i++);
as undefined behaviour is the fact that these two expressions modify the same object i without an intervening sequence point.
Another detail is that the comma involved in the printf() call is a separator, not the comma operator.
This is an important distinction because the comma operator does introduce a sequence point between the evaluation of their operands, which makes the following legal:
int i = 5;
int j;
j = (++i, i++); // No undefined behaviour here because the comma operator
// introduces a sequence point between '++i' and 'i++'
printf("i=%d j=%d\n",i, j); // prints: i=7 j=6
The comma operator evaluates its operands left-to-right and yields only the value of the last operand. So in j = (++i, i++);, ++i increments i to 6 and i++ yields old value of i (6) which is assigned to j. Then i becomes 7 due to post-increment.
So if the comma in the function call were to be a comma operator then
printf("%d %d\n", ++i, i++);
will not be a problem. But it invokes undefined behaviour because the comma here is a separator.
For those who are new to undefined behaviour would benefit from reading What Every C Programmer Should Know About Undefined Behavior to understand the concept and many other variants of undefined behaviour in C.
This post: Undefined, unspecified and implementation-defined behavior is also relevant.
While it is unlikely that any compilers and processors would actually do so, it would be legal, under the C standard, for the compiler to implement "i++" with the sequence:
In a single operation, read `i` and lock it to prevent access until further notice
Compute (1+read_value)
In a single operation, unlock `i` and store the computed value
While I don't think any processors support the hardware to allow such a thing to be done efficiently, one can easily imagine situations where such behavior would make multi-threaded code easier (e.g. it would guarantee that if two threads try to perform the above sequence simultaneously, i would get incremented by two) and it's not totally inconceivable that some future processor might provide a feature something like that.
If the compiler were to write i++ as indicated above (legal under the standard) and were to intersperse the above instructions throughout the evaluation of the overall expression (also legal), and if it didn't happen to notice that one of the other instructions happened to access i, it would be possible (and legal) for the compiler to generate a sequence of instructions that would deadlock. To be sure, a compiler would almost certainly detect the problem in the case where the same variable i is used in both places, but if a routine accepts references to two pointers p and q, and uses (*p) and (*q) in the above expression (rather than using i twice) the compiler would not be required to recognize or avoid the deadlock that would occur if the same object's address were passed for both p and q.
While the syntax of the expressions like a = a++ or a++ + a++ is legal, the behaviour of these constructs is undefined because a shall in C standard is not obeyed. C99 6.5p2:
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. [72] Furthermore, the prior value shall be read only to determine the value to be stored [73]
With footnote 73 further clarifying that
This paragraph renders undefined statement expressions such as
i = ++i + 1;
a[i++] = i;
while allowing
i = i + 1;
a[i] = i;
The various sequence points are listed in Annex C of C11 (and C99):
The following are the sequence points described in 5.1.2.3:
Between the evaluations of the function designator and actual arguments in a function call and the actual call. (6.5.2.2).
Between the evaluations of the first and second operands of the following operators: logical AND && (6.5.13); logical OR || (6.5.14); comma , (6.5.17).
Between the evaluations of the first operand of the conditional ? : operator and whichever of the second and third operands is evaluated (6.5.15).
The end of a full declarator: declarators (6.7.6);
Between the evaluation of a full expression and the next full expression to be evaluated. The following are full expressions: an initializer that is not part of a compound literal (6.7.9); the expression in an expression statement (6.8.3); the controlling expression of a selection statement (if or switch) (6.8.4); the controlling expression of a while or do statement (6.8.5); each of the (optional) expressions of a for statement (6.8.5.3); the (optional) expression in a return statement (6.8.6.4).
Immediately before a library function returns (7.1.4).
After the actions associated with each formatted input/output function conversion specifier (7.21.6, 7.29.2).
Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call (7.22.5).
The wording of the same paragraph in C11 is:
If a side effect on a scalar object is unsequenced relative to either a different side effect on the same scalar object or a value computation using the value of the same scalar object, the behavior is undefined. If there are multiple allowable orderings of the subexpressions of an expression, the behavior is undefined if such an unsequenced side effect occurs in any of the orderings.84)
You can detect such errors in a program by for example using a recent version of GCC with -Wall and -Werror, and then GCC will outright refuse to compile your program. The following is the output of gcc (Ubuntu 6.2.0-5ubuntu12) 6.2.0 20161005:
% gcc plusplus.c -Wall -Werror -pedantic
plusplus.c: In function ‘main’:
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
i = i++ + ++i;
~~^~~~~~~~~~~
plusplus.c:6:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
plusplus.c:10:6: error: operation on ‘i’ may be undefined [-Werror=sequence-point]
i = (i++);
~~^~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
u = u++ + ++u;
~~^~~~~~~~~~~
plusplus.c:14:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
plusplus.c:18:6: error: operation on ‘u’ may be undefined [-Werror=sequence-point]
u = (u++);
~~^~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
v = v++ + ++v;
~~^~~~~~~~~~~
plusplus.c:22:6: error: operation on ‘v’ may be undefined [-Werror=sequence-point]
cc1: all warnings being treated as errors
The important part is to know what a sequence point is -- and what is a sequence point and what isn't. For example the comma operator is a sequence point, so
j = (i ++, ++ i);
is well-defined, and will increment i by one, yielding the old value, discard that value; then at comma operator, settle the side effects; and then increment i by one, and the resulting value becomes the value of the expression - i.e. this is just a contrived way to write j = (i += 2) which is yet again a "clever" way to write
i += 2;
j = i;
However, the , in function argument lists is not a comma operator, and there is no sequence point between evaluations of distinct arguments; instead their evaluations are unsequenced with regard to each other; so the function call
int i = 0;
printf("%d %d\n", i++, ++i, i);
has undefined behaviour because there is no sequence point between the evaluations of i++ and ++i in function arguments, and the value of i is therefore modified twice, by both i++ and ++i, between the previous and the next sequence point.
The C standard says that a variable should only be assigned at most once between two sequence points. A semi-colon for instance is a sequence point.
So every statement of the form:
i = i++;
i = i++ + ++i;
and so on violate that rule. The standard also says that behavior is undefined and not unspecified. Some compilers do detect these and produce some result but this is not per standard.
However, two different variables can be incremented between two sequence points.
while(*src++ = *dst++);
The above is a common coding practice while copying/analysing strings.
In https://stackoverflow.com/questions/29505280/incrementing-array-index-in-c someone asked about a statement like:
int k[] = {0,1,2,3,4,5,6,7,8,9,10};
int i = 0;
int num;
num = k[++i+k[++i]] + k[++i];
printf("%d", num);
which prints 7... the OP expected it to print 6.
The ++i increments aren't guaranteed to all complete before the rest of the calculations. In fact, different compilers will get different results here. In the example you provided, the first 2 ++i executed, then the values of k[] were read, then the last ++i then k[].
num = k[i+1]+k[i+2] + k[i+3];
i += 3
Modern compilers will optimize this very well. In fact, possibly better than the code you originally wrote (assuming it had worked the way you had hoped).
Your question was probably not, "Why are these constructs undefined behavior in C?". Your question was probably, "Why did this code (using ++) not give me the value I expected?", and someone marked your question as a duplicate, and sent you here.
This answer tries to answer that question: why did your code not give you the answer you expected, and how can you learn to recognize (and avoid) expressions that will not work as expected.
I assume you've heard the basic definition of C's ++ and -- operators by now, and how the prefix form ++x differs from the postfix form x++. But these operators are hard to think about, so to make sure you understood, perhaps you wrote a tiny little test program involving something like
int x = 5;
printf("%d %d %d\n", x, ++x, x++);
But, to your surprise, this program did not help you understand — it printed some strange, inexplicable output, suggesting that maybe ++ does something completely different, not at all what you thought it did.
Or, perhaps you're looking at a hard-to-understand expression like
int x = 5;
x = x++ + ++x;
printf("%d\n", x);
Perhaps someone gave you that code as a puzzle. This code also makes no sense, especially if you run it — and if you compile and run it under two different compilers, you're likely to get two different answers! What's up with that? Which answer is correct? (And the answer is that both of them are, or neither of them are.)
As you've heard by now, these expressions are undefined, which means that the C language makes no guarantee about what they'll do. This is a strange and unsettling result, because you probably thought that any program you could write, as long as it compiled and ran, would generate a unique, well-defined output. But in the case of undefined behavior, that's not so.
What makes an expression undefined? Are expressions involving ++ and -- always undefined? Of course not: these are useful operators, and if you use them properly, they're perfectly well-defined.
For the expressions we're talking about, what makes them undefined is when there's too much going on at once, when we can't tell what order things will happen in, but when the order matters to the result we'll get.
Let's go back to the two examples I've used in this answer. When I wrote
printf("%d %d %d\n", x, ++x, x++);
the question is, before actually calling printf, does the compiler compute the value of x first, or x++, or maybe ++x? But it turns out we don't know. There's no rule in C which says that the arguments to a function get evaluated left-to-right, or right-to-left, or in some other order. So we can't say whether the compiler will do x first, then ++x, then x++, or x++ then ++x then x, or some other order. But the order clearly matters, because depending on which order the compiler uses, we'll clearly get a different series of numbers printed out.
What about this crazy expression?
x = x++ + ++x;
The problem with this expression is that it contains three different attempts to modify the value of x: (1) the x++ part tries to take x's value, add 1, store the new value in x, and return the old value; (2) the ++x part tries to take x's value, add 1, store the new value in x, and return the new value; and (3) the x = part tries to assign the sum of the other two back to x. Which of those three attempted assignments will "win"? Which of the three values will actually determine the final value of x? Again, and perhaps surprisingly, there's no rule in C to tell us.
You might imagine that precedence or associativity or left-to-right evaluation tells you what order things happen in, but they do not. You may not believe me, but please take my word for it, and I'll say it again: precedence and associativity do not determine every aspect of the evaluation order of an expression in C. In particular, if within one expression there are multiple different spots where we try to assign a new value to something like x, precedence and associativity do not tell us which of those attempts happens first, or last, or anything.
So with all that background and introduction out of the way, if you want to make sure that all your programs are well-defined, which expressions can you write, and which ones can you not write?
These expressions are all fine:
y = x++;
z = x++ + y++;
x = x + 1;
x = a[i++];
x = a[i++] + b[j++];
x[i++] = a[j++] + b[k++];
x = *p++;
x = *p++ + *q++;
These expressions are all undefined:
x = x++;
x = x++ + ++x;
y = x + x++;
a[i] = i++;
a[i++] = i;
printf("%d %d %d\n", x, ++x, x++);
And the last question is, how can you tell which expressions are well-defined, and which expressions are undefined?
As I said earlier, the undefined expressions are the ones where there's too much going at once, where you can't be sure what order things happen in, and where the order matters:
If there's one variable that's getting modified (assigned to) in two or more different places, how do you know which modification happens first?
If there's a variable that's getting modified in one place, and having its value used in another place, how do you know whether it uses the old value or the new value?
As an example of #1, in the expression
x = x++ + ++x;
there are three attempts to modify x.
As an example of #2, in the expression
y = x + x++;
we both use the value of x, and modify it.
So that's the answer: make sure that in any expression you write, each variable is modified at most once, and if a variable is modified, you don't also attempt to use the value of that variable somewhere else.
One more thing. You might be wondering how to "fix" the undefined expressions I started this answer by presenting.
In the case of printf("%d %d %d\n", x, ++x, x++);, it's easy — just write it as three separate printf calls:
printf("%d ", x);
printf("%d ", ++x);
printf("%d\n", x++);
Now the behavior is perfectly well defined, and you'll get sensible results.
In the case of x = x++ + ++x, on the other hand, there's no way to fix it. There's no way to write it so that it has guaranteed behavior matching your expectations — but that's okay, because you would never write an expression like x = x++ + ++x in a real program anyway.
A good explanation about what happens in this kind of computation is provided in the document n1188 from the ISO W14 site.
I explain the ideas.
The main rule from the standard ISO 9899 that applies in this situation is 6.5p2.
Between the previous and next sequence point an object shall have its stored value modified at most once by the evaluation of an expression. Furthermore, the prior value shall be read only to determine the value to be stored.
The sequence points in an expression like i=i++ are before i= and after i++.
In the paper that I quoted above it is explained that you can figure out the program as being formed by small boxes, each box containing the instructions between 2 consecutive sequence points. The sequence points are defined in annex C of the standard, in the case of i=i++ there are 2 sequence points that delimit a full-expression. Such an expression is syntactically equivalent with an entry of expression-statement in the Backus-Naur form of the grammar (a grammar is provided in annex A of the Standard).
So the order of instructions inside a box has no clear order.
i=i++
can be interpreted as
tmp = i
i=i+1
i = tmp
or as
tmp = i
i = tmp
i=i+1
because both all these forms to interpret the code i=i++ are valid and because both generate different answers, the behavior is undefined.
So a sequence point can be seen by the beginning and the end of each box that composes the program [the boxes are atomic units in C] and inside a box the order of instructions is not defined in all cases. Changing that order one can change the result sometimes.
EDIT:
Other good source for explaining such ambiguities are the entries from c-faq site (also published as a book) , namely here and here and here .
The reason is that the program is running undefined behavior. The problem lies in the evaluation order, because there is no sequence points required according to C++98 standard ( no operations is sequenced before or after another according to C++11 terminology).
However if you stick to one compiler, you will find the behavior persistent, as long as you don't add function calls or pointers, which would make the behavior more messy.
Using Nuwen MinGW 15 GCC 7.1 you will get:
#include<stdio.h>
int main(int argc, char ** argv)
{
int i = 0;
i = i++ + ++i;
printf("%d\n", i); // 2
i = 1;
i = (i++);
printf("%d\n", i); //1
volatile int u = 0;
u = u++ + ++u;
printf("%d\n", u); // 2
u = 1;
u = (u++);
printf("%d\n", u); //1
register int v = 0;
v = v++ + ++v;
printf("%d\n", v); //2
}
How does GCC work? it evaluates sub expressions at a left to right order for the right hand side (RHS) , then assigns the value to the left hand side (LHS) . This is exactly how Java and C# behave and define their standards. (Yes, the equivalent software in Java and C# has defined behaviors). It evaluate each sub expression one by one in the RHS Statement in a left to right order; for each sub expression: the ++c (pre-increment) is evaluated first then the value c is used for the operation, then the post increment c++).
according to GCC C++: Operators
In GCC C++, the precedence of the operators controls the order in
which the individual operators are evaluated
the equivalent code in defined behavior C++ as GCC understands:
#include<stdio.h>
int main(int argc, char ** argv)
{
int i = 0;
//i = i++ + ++i;
int r;
r=i;
i++;
++i;
r+=i;
i=r;
printf("%d\n", i); // 2
i = 1;
//i = (i++);
r=i;
i++;
i=r;
printf("%d\n", i); // 1
volatile int u = 0;
//u = u++ + ++u;
r=u;
u++;
++u;
r+=u;
u=r;
printf("%d\n", u); // 2
u = 1;
//u = (u++);
r=u;
u++;
u=r;
printf("%d\n", u); // 1
register int v = 0;
//v = v++ + ++v;
r=v;
v++;
++v;
r+=v;
v=r;
printf("%d\n", v); //2
}
Then we go to Visual Studio. Visual Studio 2015, you get:
#include<stdio.h>
int main(int argc, char ** argv)
{
int i = 0;
i = i++ + ++i;
printf("%d\n", i); // 3
i = 1;
i = (i++);
printf("%d\n", i); // 2
volatile int u = 0;
u = u++ + ++u;
printf("%d\n", u); // 3
u = 1;
u = (u++);
printf("%d\n", u); // 2
register int v = 0;
v = v++ + ++v;
printf("%d\n", v); // 3
}
How does Visual Studio work, it takes another approach, it evaluates all pre-increments expressions in first pass, then uses variables values in the operations in second pass, assign from RHS to LHS in third pass, then at last pass it evaluates all the post-increment expressions in one pass.
So the equivalent in defined behavior C++ as Visual C++ understands:
#include<stdio.h>
int main(int argc, char ** argv)
{
int r;
int i = 0;
//i = i++ + ++i;
++i;
r = i + i;
i = r;
i++;
printf("%d\n", i); // 3
i = 1;
//i = (i++);
r = i;
i = r;
i++;
printf("%d\n", i); // 2
volatile int u = 0;
//u = u++ + ++u;
++u;
r = u + u;
u = r;
u++;
printf("%d\n", u); // 3
u = 1;
//u = (u++);
r = u;
u = r;
u++;
printf("%d\n", u); // 2
register int v = 0;
//v = v++ + ++v;
++v;
r = v + v;
v = r;
v++;
printf("%d\n", v); // 3
}
as Visual Studio documentation states at Precedence and Order of Evaluation:
Where several operators appear together, they have equal precedence and are evaluated according to their associativity. The operators in the table are described in the sections beginning with Postfix Operators.