Why would something simple like this:
System.out.println("test problem: " + 194*194*194*409);
output something like this:
test problem: -1308701240
Because you've overflowed an integer. See the limits on the numbers handled by int in the Java documentation.
Rather than give you the direct answer, I'll suggest some steps.
Work out the type of arithmetic you're doing (what type is the literal 194? what type is the result of the multiplication operator you're using?)
What do you expect the answer to be?
Can your result type handle that answer?
What does Java do for results it can't handle?
For bonus points, refer to the Java Language Specification for the last part...
From 15.17.1. Multiplication Operator
If an integer multiplication overflows, then the result is the low-order bits of the
mathematical product as represented in some sufficiently large two's-complement format. As a result, if overflow occurs, then the sign of the result may not be the same as the sign of the mathematical product of the two operand values.
We have 194*194*194*409=2986266056
In binary the result is 1011 0001 1111 1110 1100 1101 1100 1000
The last 32 bits are 1011 0001 1111 1110 1100 1101 1100 1000, so we did not lose any bits by the overflow, but the sign has changed.
Since int is represented in two's complement (source), the result is
2986266056 - 2^32 = -1308701240.
Everything works as expected!
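To see both behaviors side by side, here is a minimal sketch (the class name is mine): promoting one operand to long makes the whole expression evaluate in 64 bits, so the full product survives.

```java
public class OverflowDemo {
    public static void main(String[] args) {
        // int arithmetic: the product wraps around at 32 bits
        System.out.println(194 * 194 * 194 * 409);  // -1308701240
        // make one operand long and the expression is evaluated in 64 bits
        System.out.println(194L * 194 * 194 * 409); // 2986266056
    }
}
```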
Looks like an overflow on the result of the operation: 2,986,266,056 is not a valid int value.
Perform the arithmetic in a wider type (long) first, and then convert the result to a string.
For example, if this is my input:
byte x=(byte) 200;
This will be the output:
-56
if this is my input:
short x=(short) 250000;
This will be the output:
-12144
I realize that the output is off because the number does not fit into the datatype, but how can I predict what this output will be in this case? In my computer science exam this may be one of the questions, and I do not understand why exactly 200 changes to -56 and so on.
I realize that the output is off because the number does not fit into the datatype, but how can I predict what this output will be in this case? In my computer science exam this may be one of the questions, and I do not understand why exactly 200 changes to -56 and so on.
The relevant aspects are what overflow looks like, and how the bits that represent the underlying data are treated.
Computers are all bits, grouped together in groups of 8; a group of 8 bits is called a byte.
byte b = 5; for example, is stored in memory as 0000 0101.
Bits can be 0. Or 1. That's it. That's where it ends. And everything is, in the end, bits. This means: That - is not a thing. Computers do not know what - is and cannot store it. We need to write code and agree on some sort of meaning to represent them.
2's complement
So what's -5 in bits? It's 1111 1011. Which seems bizarre. But it's how it works. If you write: byte b = -5;, then b will contain 1111 1011. It is because javac made that happen. Similarly, if you then call System.out.println(b), then the println method gets the bit sequence 1111 1011. Why does the println method decide to print a - symbol and then a 5 symbol? Because it's programmed that way: We all are in agreement that 1111 1011 is -5. So why is that?
Because of a really cool property - signed/unsigned irrelevancy.
The rule is 2's complement: To switch the sign (i.e. turn 5, which is 0000 0101 into -5 which is 1111 1011), you flip every bit, and then add 1 to the end result. Try it with 0000 0101 - and you'll see it's 1111 1011. This algorithm is reversible - apply the same algorithm (flip every bit, then add 1) and you can turn -5 into 5.
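The flip-then-add-1 rule can be checked directly in Java (a small sketch; masking with 0xFF just keeps the low 8 bits visible):

```java
public class TwosComplement {
    public static void main(String[] args) {
        int five = 0b0000_0101;             // 5
        int flipped = ~five & 0xFF;         // flip every bit: 1111 1010
        int negated = (flipped + 1) & 0xFF; // add 1:          1111 1011
        System.out.println(Integer.toBinaryString(negated)); // 11111011
        System.out.println((byte) negated);                  // -5
    }
}
```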
This 2's complement thing has 2 great advantages:
There is only one 0 value. If we just flipped all bits, we'd have both 1111 1111 and 0000 0000 both representing some form of 0. In basic math, there's no such thing as 'negative 0' - it's the same as positive 0. Similarly if we just decided the first bit is the sign and the remaining 7 bits are the number, then we'd have both 1000 0000 and 0000 0000 both being 0, which is annoying and inefficient, why waste 2 different bit sequences on the same number?
plus and minus are sign-mode independent. The computer doesn't have to KNOW whether we are doing the 2's complement thing or not. Take the bit sequence 1111 1011. If we treat that as unsigned bits, then that is 251 (it's 128 + 64 + 32 + 16 + 8 + 2 + 1). If we treat that as a signed number, then the first bit is 1, so the thing is negative: We apply 2's complement and figure out that it is -5. So, is it -5 or 251? It's both, at once! Depends on the human/code that interprets this bit sequence which one it is. So how could the computer possibly do a + b given this? The weird answer is: It doesn't matter - because the math works out the same way. 251 - 10 is 241. -5 - 10 is -15. -15 and 241 are the exact same bit sequence.
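A quick sketch of this duality using the standard library's Byte.toUnsignedInt, which reinterprets the same bits without changing them:

```java
public class SignModeIndependent {
    public static void main(String[] args) {
        byte b = (byte) 0b1111_1011;                      // the bit pattern from the text
        System.out.println(b);                            // -5   (signed reading)
        System.out.println(Byte.toUnsignedInt(b));        // 251  (unsigned reading)

        byte minusTen = (byte) (b - 10);                  // one subtraction, two readings
        System.out.println(minusTen);                     // -15
        System.out.println(Byte.toUnsignedInt(minusTen)); // 241
    }
}
```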
Overflow
A byte is 8 bits, and there are 256 different sequences of bits, and then you have listed each and every possible variant. (2^8 = 256. Hence, a 16-bit number can be used to convey 65536 different things, because 2^16 is 65536, and so on). So, given that bytes are 8 bits and we decreed they are signed, and 2's complement signed, that means that the smallest number you can represent with one is -128, which in bits is 1000 0000 (use 2's complement to check my work), and the largest is +127, which in bits is 0111 1111. So what happens if you add 1 to 127? That'd seemingly be +128 except that's not storable in 8 bits if we decree that we interpret these bits as 2's complement signed (which java does). What happens? The bits 'roll over'. We just add 1 as normal, which turns 0111 1111 into 1000 0000 which is -128:
byte b = 127;
b = (byte)(b + 1);
System.out.println(b); // prints -128
Imagine the number line - stretching out into infinity on both ends, from -infinite to +infinite. That's the usual way math works. Computers (or rather, int, long, etc) do not work like that. Instead of a line, it is a circle. Take your infinite number line and take some scissors, and snip that number line at -128 (because a 2's comp signed byte cannot represent -129 or anything else below -128), and at +127 (because our byte cannot represent 128 or anything above it).
And now tape the 2 cut ends together.
That's the number line. What's 'to the right' of 125? 126 - that's what +1 means: Move one to the right on the number line.
What's 'to the right' of +127? Why, -128. Because we taped it together.
Similarly, -127 - 5 is +123. '-5' is 'move 5 places to the left on the number line (or rather, number circle)'. Going in 1 decrements:
-127 (we start here)
-128 (-127 -1)
+127 (-127 -2)
+126 (-127 -3)
+125 (-127 -4)
+124 (-127 -5)
Hence, 124.
Same math applies to short (-32768 to +32767), char (which is really a 16-bit unsigned number - so 0 to 65535), int (-2147483648 to +2147483647), and even long (-2^63 to +2^63-1 - those get a little large).
short x = 32765;
x += 5;
System.out.println(x); // prints -32766.
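The prediction recipe above can be written out as code (a sketch; the names are mine): keep only the low N bits, then subtract 2^N whenever the result exceeds the type's maximum.

```java
public class NarrowingCast {
    public static void main(String[] args) {
        System.out.println((byte) 200);     // 200 - 256 = -56
        System.out.println((short) 250000); // 250000 % 65536 = 53392; 53392 - 65536 = -12144

        // The same rule written out by hand for the short case:
        int v = 250000;
        int low16 = v & 0xFFFF;             // keep the low 16 bits -> 53392
        if (low16 > Short.MAX_VALUE) {
            low16 -= 65536;                 // reinterpret as signed
        }
        System.out.println(low16);          // -12144
    }
}
```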
I am trying to perform a bitwise not on a 128 bit BigInteger in Java. I have a 128 bit number which has the first 64 bits set to 1 and the last 64 bits set to 0 (I am playing with IPv6 masks).
BigInteger b = BigInteger.valueOf(2).pow(64).subtract(BigInteger.ONE).shiftLeft(64);
System.out.println(b.toString(2));
This results in the following, if I output it using base 2:
11111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000
I am trying to flip/reverse all the bits using a bitwise not.
System.out.println(b.not().toString(2));
From my understanding of a bitwise not, I was expecting all the 1's to change to 0's and all the 0's to change to 1's, but I get the following instead:
-11111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000000
This also seems to match the documentation of the not() function:
This method returns a negative value if and only if this BigInteger is non-negative
Is it a case of looping through all 128 bits and performing a bitwise not on each separate bit instead?
UPDATE
It probably helps if I try and explain what I was trying to achieve to give some context. I am manipulating IPv6 addresses and was trying to determine if a given IPv6 address was within a subnet or not based on an IPv6 mask.
Based on the responses, I think the following should work:
E.g.
Is 2001:db8:0:0:8:800:200c:417b within 2001:db8::/64?
BigInteger n = new BigInteger(1, InetAddress.getByName("2001:db8::").getAddress());
BigInteger b = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE).shiftLeft(64);
// First Address in Subnet
BigInteger first = n.and(b);
// Last Address in Subnet (this is where I was having a problem as it was returning a negative number)
BigInteger MASK_128 = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
BigInteger last = first.add(b.xor(MASK_128));
// Convert our test IP into BigInteger
BigInteger ip = new BigInteger(1, InetAddress.getByName("2001:db8:0:0:8:800:200c:417b").getAddress());
// Check if IP is >= first and <= last
if ((first.compareTo(ip) <= 0) && (last.compareTo(ip) >= 0)) {
// in subnet
}
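For reference, a self-contained version of the update's approach (note that InetAddress parses numeric literals locally, so no DNS lookup is involved):

```java
import java.math.BigInteger;
import java.net.InetAddress;

public class SubnetCheck {
    public static void main(String[] args) throws Exception {
        BigInteger MASK_128 = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        // /64 network mask: 64 ones followed by 64 zeros
        BigInteger mask = BigInteger.ONE.shiftLeft(64).subtract(BigInteger.ONE).shiftLeft(64);

        BigInteger n = new BigInteger(1, InetAddress.getByName("2001:db8::").getAddress());
        BigInteger first = n.and(mask);                  // first address in subnet
        BigInteger last = first.add(mask.xor(MASK_128)); // last address in subnet

        BigInteger ip = new BigInteger(1,
                InetAddress.getByName("2001:db8:0:0:8:800:200c:417b").getAddress());
        System.out.println(first.compareTo(ip) <= 0 && last.compareTo(ip) >= 0); // true
    }
}
```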
As others have pointed out, it's the sign bit that is giving you the result that you don't want.
There are a couple of ways that you can get the inverted bits. For both of them you will need a 128 bit mask value:
private static final BigInteger MASK_128 =
BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
Then you can either mask the sign bit:
BigInteger b = BigInteger.valueOf(2).pow(64).subtract(BigInteger.ONE).shiftLeft(64);
System.out.println(MASK_128.andNot(b).toString(2));
Or invert directly using xor:
System.out.println(b.xor(MASK_128).toString(2));
I expect the mask value will be useful elsewhere once you start fleshing things out also.
signed byte 64 = 0100 0000
Invert it and we get signed byte -65 = 1011 1111.
In two's complement, bitwise NOT is tied to negation: ~x == -x - 1, so inverting all the bits of x gives -(x + 1).
Type this and you will see that the two values differ by exactly 1 in absolute value, with opposite signs:
System.out.println(b.toString());
System.out.println(b.not().toString());
Your output is perfectly correct; in Java everything is in two's complement:
Convert Decimal to Two's Complement
Convert the number to binary (ignore the sign for now) e.g. 5 is 0101 and -5 is 0101
If the number is a positive number then you are done. e.g. 5 is 0101 in binary using two's complement notation.
Here goes your solution.
If the number is negative then
3.1 find the complement (invert 0's and 1's) e.g. -5 is 0101 so finding the complement is 1010
3.2 Add 1 to the complement 1010 + 1 = 1011. Therefore, -5 in two's complement is 1011.
So, what if you wanted to do 2 + (-3) in binary? 2 + (-3) is -1. What would you have to do if you were using sign magnitude to add these numbers? 0010 + 1101 = ?
Using two's complement consider how easy it would be.
2 = 0010
-3 = 1101
+
-1 = 1111
Converting Two's Complement to Decimal
Converting 1111 to decimal:
The number starts with 1, so it's negative, so we find the complement of 1111, which is 0000.
Add 1 to 0000, and we obtain 0001.
Convert 0001 to decimal, which is 1.
Apply the sign = -1.
In your case
when you do b.not().toString(2), you will get the response:
-11111111111111111111111111111111111111111111111111111111111111110000000000000000000000000000000000000000000000000000000000000001
with 1 at the last bit.
Now take the two's complement and you will get the right answer:
i.e. flip all the 1's to 0's and vice versa, then add one to the result, and you will get the solution that you are seeking.
Final solution
00000000000000000000000000000000000000000000000000000000000000001111111111111111111111111111111111111111111111111111111111111111
Use
VALUE = 0xFFFFFFFFFFFFFFFF0000000000000000 - this is the pattern for bits 1...10...0
VALUE = 0x0000000000000000FFFFFFFFFFFFFFFF - this is the pattern for bits 0...01...1
Just type 16 × "F" and 16 × "0" by hand and remember to add "0x" before each pattern. (These are too wide for a long literal, so in Java build them as BigInteger, e.g. new BigInteger("FFFFFFFFFFFFFFFF0000000000000000", 16).)
Define these values as final.
If you want to generate such values, repeat
value = (value << 1) | 1;
n times. In your case n is 64. Shift the result left by 64 more bits to get the first pattern.
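A runnable sketch of both approaches (the constants are built with BigInteger because the patterns are wider than a long):

```java
import java.math.BigInteger;

public class MaskPatterns {
    public static void main(String[] args) {
        // the two hand-typed patterns: 64 ones then 64 zeros, and the reverse
        BigInteger high = new BigInteger("FFFFFFFFFFFFFFFF0000000000000000", 16);
        BigInteger low  = new BigInteger("0000000000000000FFFFFFFFFFFFFFFF", 16);

        // generating the same values: shift in a 1 bit, n = 64 times
        BigInteger ones = BigInteger.ZERO;
        for (int i = 0; i < 64; i++) {
            ones = ones.shiftLeft(1).or(BigInteger.ONE);
        }
        System.out.println(ones.equals(low));                // true
        System.out.println(ones.shiftLeft(64).equals(high)); // true
    }
}
```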
I need to translate this line of code in Java and I am
not sure what to do about ptrdiff_t. Not sure what it does here. By the way, mask_block
is of type size_t.
size_t lowest_bit = mask_block & (-(ptrdiff_t)mask_block);
Thanks
Beware! This is bit magic!
( x & ~(x-1) ) returns the lowest set bit in an expression. The author of the original code decided to use ( x & (-x) ) which is effectively the same due to the two's complement representation of integers. But (the original author thought that) to get -x you need to use signed types and, as pointed out earlier, ptrdiff_t is signed, size_t is unsigned.
As Java does not have unsigned types, mask_block will be int and mask_block & (-mask_block) will work without any issue.
Note that due to the interoperability between signed and unsigned types, the cast is superfluous in C++ as well.
ptrdiff_t is the type that should be used for the (integer) difference between two pointers. That is, the result of subtracting one pointer from another. It is a signed integer, and should be large enough to store the size of the largest possible array (so in Java, that would simply be an int, I'd guess)
ptrdiff_t is the name of a type, like int or ::std::string. The C++ standard promises that this type will be an integer type large enough to hold the difference between any two pointers that you can subtract. Of course, the idea of subtracting pointers is a rather foreign concept in Java. In order to be able to do this, ptrdiff_t must be able to hold negative numbers.
The sub-expression in which ptrdiff_t is used is a cast expression, like a Java typecast. However, unlike a Java typecast, C++ cast expressions are more dangerous and uglier. They can be used for all kinds of different type conversions that Java would balk at. And sometimes they will yield surprising results.
In this case, it looks like someone needed a value which was an unsigned integer of some kind (maybe an unsigned long or something) to be able to be negative. They needed to turn it into a signed value. ptrdiff_t is typically the largest size integer the platform supports. So if you're going to turn an arbitrary unsigned integer type into a signed one ptrdiff_t would be the type to use that would be least likely to result in some kind of odd truncation or sign change with C++'s rather ugly cast operation.
In particular, it looks like the type they wanted was size_t, which is another type in the C++ standard. It is an unsigned type (just like I was guessing), and is guaranteed to be an integer type that's big enough to hold the size of any possible object in memory. It's usually the same size as ptrdiff_t.
The reason the person who wrote the code wanted to do this was an interesting bit manipulation trick. To show you the trick, I'll show you how this expression plays out in a number of scenarios.
Suppose mask_block is 48. Let's say that on this hypothetical platform, size_t is 16 bits (which is very small, but this is just an example). In binary then, mask_block looks like this:
0000 0000 0011 0000
And -(ptrdiff_t)mask_block is -48, which looks like this:
1111 1111 1101 0000
So, 48 & -48 is this:
0000 0000 0001 0000
Which is 16. Notice that this is the value of the lowest set bit in 48. Let's try 50. 50 looks like this:
0000 0000 0011 0010
And -50 looks like this:
1111 1111 1100 1110
So, 50 & -50 looks like this:
0000 0000 0000 0010
Which is 2. Notice again how this is the value of the lowest set bit in 50.
So this is just a trick to find the value of the lowest set bit in mask. The fact the variable is called lowest_bit should be a clue there. :-)
Of course, this trick isn't completely portable. Some platforms that C and (maybe C++ by now) run on do not use twos complement representation, and this trick won't work on those platforms.
In Java, you can just do this long lowest_bit = mask_block & -mask_block; and get the same effect. Java guarantees twos complement integers and doesn't even have unsigned integers. So it should work just fine.
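A quick check of the Java version; note that the standard library also offers Long.lowestOneBit, which performs exactly this trick:

```java
public class LowestBit {
    public static void main(String[] args) {
        long mask = 48;                            // 0011 0000
        System.out.println(mask & -mask);          // 16
        mask = 50;                                 // 0011 0010
        System.out.println(mask & -mask);          // 2
        System.out.println(Long.lowestOneBit(50)); // 2, the built-in equivalent
    }
}
```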
x & -x is a bit hack that clears all bits of x excluding its lowest bit.
For all non-zero values of x, it is 1 << lb, where lb is the position of the least significant bit (counting starting with 0).
Why is it casted to ptrdiff_t? Without further knowledge it is difficult to say. I'm not even sure that the cast is needed. ptrdiff_t is guaranteed to be a signed integral type and size_t is always an unsigned integral type. So, I guess that the author of the C++ code wanted to be sure that it is signed and has the same size as a pointer. It should be sufficient to port the code to Java by simply ignoring the cast, as in Java all integers are signed anyway.
The resulting code will also be more portable than the original C/C++ version, which assumes that the machine uses 2's complement to represent integers, although it is (at least in theory) not guaranteed by the C or C++ standard. In Java, however, it is guaranteed that the JVM must use 2's complement.
I have been reading about hashcode functions for the past couple of hours and have accumulated a couple of questions regarding use of prime numbers as multipliers in custom hashcode implementations. I would be grateful if I could get some insight regarding following questions:
In a comment to @mattb's answer here, @hstoerr advocates for use of larger primes (such as 524287) instead of the common prime 31. My question is, given the following implementation of a hashcode function for a pair of elements:
@Override
public int hashCode() {
    final int prime = 31;
    int hash1 = (pg1 == null) ? 0 : pg1.hashCode();
    int hash2 = (pg2 == null) ? 0 : pg2.hashCode();
    return prime * (hash1 ^ hash2);
}
doesn't this lead to an overflow on the returned int if prime is a large number?
Assuming that the overflow is not a problem (JVM doing an automatic cast) is it better to do a bitshift instead of a cast?
I imagine the performance of the hashcode function varies significantly based on the complexity of the hashcode. Does the size of the prime multiplier not affect the performance?
Is it better/smarter/faster to use multiple primes in a custom hashcode function instead of a single multiplier? If not, is there some other advantage? See the example below from @jinguy's answer to a relevant question:
public int hashCode() {
    return a * 13 + b.hashCode() * 23 + (c ? 31 : 7);
}
where a is an int, b is a String and c is boolean.
How about something like long lhash = prime * (hash1 ^ hash2); then using (int)((lhash >> 32) ^ lhash)? That's something I saw on another question here on SO, but it wasn't really explained why it was a good idea to do it like that.
Apologies in advance for the novel. Feel free to make suggestions or edit directly. --Chet
There is an overflow, but not an exception.
The danger doesn't come from losing accuracy, but losing range. Let's use a ridiculous example, where "prime" is a large power of 2, and 8-bit unsigned numbers for brevity. And assume that (hash1 ^ hash2) is 255:
"prime": 1000 0000
(hash1 ^ hash2): 1111 1111
Showing the truncated digits in brackets, our result is:
product: [0111 1111] 1000 0000
But multiplying by 128 is the same as shifting left by 7 places. So we know that whatever the value of (hash1 ^ hash2), the least-significant places of the product will have seven zeros. So if (hash1 ^ hash2) is odd (least significant bit = 1), then the result of multiplying by 128 will always be 128 (after truncating the higher digits). And if (hash1 ^ hash2) is even (LSB is 0), then the product will always be zero.
This extends to larger bit sizes. The general point is that if the lower bits of "prime" are zeros, you're doing a shift (or multiple shift + sum) operation that will give you zeros in the lower bits. And the range of the product of multiplication will suffer.
But let's try making "prime" odd, so that the least significant bit will always be 1. Think about decomposing this into shift / add operations. The unshifted value of (hash1 ^ hash2) will always be one of the summands. The least significant bits that were shifted into guaranteed uselessness by an even "prime" multiplier will now be set based on, at minimum, the bits from the original (hash1 ^ hash2) value.
Now, let's consider a value of prime which is actually prime. If it's more than 2, then we know it's odd. So the lower bits haven't been shifted into uselessness. And by choosing a sufficiently large prime, you get better distribution across the range of output values than you'd get with a smaller prime.
Try some exercises with 16-bit multiplication using 8443 (0010 0000 1111 1011) and 59 (0000 0000 0011 1011). They're both prime, and the lower bits of 59 match the lower bits of 65531. For example, if hash1 and hash2 are both ASCII character values (0 .. 255), then all of the results of (hash1 ^ hash2) * 59 will be <= 15045. This means that roughly 1/4 of the range of hash values (0..65535) for a 16-bit number go unused.
But (hash1 ^ hash2) * 8443 is all over the map. It overflows if (hash1 ^ hash2) is as low as 8. It uses all 16 bits even for very small input numbers. There's much less clustering of hash values across the overall range, even if the input numbers are in a relatively small range.
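This clustering claim is easy to verify by brute force (a sketch; masking with 0xFFFF simulates 16-bit truncation):

```java
public class PrimeRange {
    public static void main(String[] args) {
        int max59 = 0, max8443 = 0;
        for (int x = 0; x <= 255; x++) {                      // all byte-range (hash1 ^ hash2) values
            max59   = Math.max(max59,   (x * 59)   & 0xFFFF); // keep the low 16 bits
            max8443 = Math.max(max8443, (x * 8443) & 0xFFFF);
        }
        System.out.println(max59);   // 15045: only about a quarter of the 16-bit range
        System.out.println(max8443); // much larger: results spread across the 16-bit range
    }
}
```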
Assuming that the overflow is not a problem (JVM doing an automatic cast) is it better to do a bitshift instead of a cast?
Most likely not. The JVM should translate into an efficient implementation on the host processor anyway. Integer multiplication should be implemented in hardware. And if not, the JVM is responsible for translating the operation into something reasonable for the CPU. It's very likely that the case of integer multiplication is highly optimized already. If integer multiplication is done more quickly on a given CPU as shift-and-add, the JVM should implement it that way. But it's less likely that the folks writing the JVM would care to watch for cases where multiple shift-and-add operations could have been combined into a single integer multiply.
I imagine the performance of the hashcode function varies significantly based on the complexity of the hashcode. Does the size of the prime multiplier not affect the performance?
No. The operations are the same when done in hardware regardless of the size, number of bits set, etc. It's probably a couple of clock cycles. It would vary depending on the specific CPU, but should be a constant-time operation regardless of the input values.
Is it better/smarter/faster to use multiple primes in a custom hashcode function instead of a single multiplier? If not, is there
some other advantage?
Only if it reduces the possibility of collisions, and this depends on the numbers you're using. If your hash code depends on A and B and they're in the same range, you might consider using different primes or shifting one of the input values to reduce overlap between the bits. Since you're depending on their individual hash codes, and not their values directly, it's reasonable to assume that their hash codes provide good distribution, etc.
One factor that comes to mind is whether you want the hash code for (x, y) to be different from (y, x). If your hash function treats A and B in the same way, then hash(x, y) = hash(y, x). If that's what you want, then by all means use the same multiplier. If not, using a different multiplier would make sense.
How about something like long lhash = prime * (hash1 ^ hash2); then using (int)((lhash >> 32) ^ lhash)? That's something I saw on another question here on SO, but it wasn't really explained why it was a good idea to do it like that.
Interesting question. In Java, longs are 64-bit and ints are 32-bit. So this generates a hash using twice as many bits as desired, and then derives the result from the high and low bits combined.
If multiplying a number n by a prime p, and the lowermost k bits of n are all zeros, then the lowermost k bits of the product n * p will also be all zeros. This is fairly easy to see -- if you're multiplying, say, n = 0011 0000 and p = 0011 1011, then the product can be expressed as the sum of two shift operations. Or,
00110000 * p = 00100000 * p + 00010000 * p
= p << 5 + p << 4
Taking p = 59 and using unsigned 8-bit ints and 16-bit longs, here are some examples.
64: 0011 1011 * 0100 0000 = [ 0000 1110 ] 1100 0000 (192)
128: 0011 1011 * 1000 0000 = [ 0001 1101 ] 1000 0000 (128)
192: 0011 1011 * 1100 0000 = [ 0010 1100 ] 0100 0000 (64)
By just dropping the high bits of the result, the range of the resulting hash value is limited when the low bits of the non-prime multiplicand are all zeros. Whether that's an issue in a specific context is, well, context-specific. But for a general hash function it's a good idea to avoid limiting the range of output values even when there are patterns in the input numbers. And in security applications, it's even more critical to avoid anything that would let someone make inferences about the original value based on patterns in the output. Just taking the low bits reveals the exact values of some of the original bits. If we make the assumption that the operation involved multiplying an input number with a large prime, then we know that the original number had as many zeros at the right as the hash output (because the prime's rightmost bit was 1).
By XORing the high bits with the low bits, there's less consistency in the output. And more importantly, it's much harder to make guesses about the input values based on this information. Based on how XOR works, it could mean the original low bit was 0 and the high bit was 1, or the original low bit was 1 and the high bit was 0.
64: 0011 1011 * 0100 0000 = 0000 1110 1100 0000 => 1100 1110 (206)
128: 0011 1011 * 1000 0000 = 0001 1101 1000 0000 => 1001 1101 (157)
192: 0011 1011 * 1100 0000 = 0010 1100 0100 0000 => 0110 1100 (108)
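A minimal sketch of the long-then-fold idea from the question (the method name fold is mine):

```java
public class HashFold {
    // compute a 64-bit product, then XOR its high half into its low half
    static int fold(long prime, int hash1, int hash2) {
        long lhash = prime * (hash1 ^ hash2);
        return (int) ((lhash >> 32) ^ lhash);
    }

    public static void main(String[] args) {
        // small case that fits in 32 bits: the high half is 0, so folding is a no-op
        System.out.println(fold(31, 1, 2)); // 93
        // large prime: the high half now mixes back into the low half
        System.out.println(fold(524287, 0x12345678, 0));
    }
}
```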
Overflow is not a problem. Hashes are constrained to a narrow value set anyway.
The first hash function you posted isn't very good. Doing return (prime * hash1) ^ hash2; instead would reduce the number of collisions in most cases.
Multiplying by a single-word int is generally very fast, and the difference between multiplying by different numbers is negligible. Plus the execution time is dwarfed by everything else in the function anyway.
Using different prime multipliers for each part may reduce the risk of collisions.
Consider this snippet from the Java language specification.
class Test {
    public static void main(String[] args) {
        int i = 1000000;
        System.out.println(i * i);
        long l = i;
        System.out.println(l * l);
    }
}
The output is
-727379968
1000000000000
Why is the result -727379968 for (i*i)? Ideally it should be 1000000000000.
I know the range of Integer is from –2147483648 to 2147483647. so obviously 1000000000000
is not in the given range.
Why does the result become -727379968?
Java (like most computer architectures these days) uses something called two's complement arithmetic, which uses the most significant bit of an integer to signify that a number is negative. If you multiply two big numbers, you end up with a number that's so big it sets that highest bit, and the result ends up negative.
Lets look at the binary:
1000000 is 1111 0100 0010 0100 0000.
1000000000000 is 1110 1000 1101 0100 1010 0101 0001 0000 0000 0000
However, the first two sections of 4 bits won't fit in an int (since int is 32-bits wide in Java,) and so they are dropped, leaving only 1101 0100 1010 0101 0001 0000 0000 0000, which is -727379968.
In other words, the result overflows for int, and you get what's left.
You might want to check Integer overflow as a general concept.
Overflow and underflow are handled differently depending on the language, too. Here is an article on Integer overflow and underflow in Java.
As for the reason why this is so in the Java language, as always, it's a tradeoff between simplicity in the language design and performance. But in Java Puzzlers (Puzzle 3), the authors criticize the fact that overflows are silent in Java:
The lesson for language designers is that it may be worth reducing the likelihood of silent overflow. This could be done by providing support for arithmetic that does not overflow silently. Programs could throw an exception instead of overflowing, as does Ada, or they could switch to a larger internal representation automatically as required to avoid overflow, as does Lisp. Both of these approaches may have performance penalties associated with them. Another way to reduce the likelihood of silent overflow is to support target typing, but this adds significant complexity to the type system.
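Since Java 8 the standard library does offer opt-in checked arithmetic, which throws instead of overflowing silently, a small sketch:

```java
public class CheckedArithmetic {
    public static void main(String[] args) {
        int i = 1000000;
        System.out.println(i * i);        // -727379968: silent wrap-around
        System.out.println((long) i * i); // 1000000000000: promote before multiplying
        try {
            Math.multiplyExact(i, i);     // Ada-style: detect the overflow
        } catch (ArithmeticException e) {
            System.out.println("overflow detected");
        }
    }
}
```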
Some of the other answers explain correctly why this is happening (i.e. signed two's complement binary logic).
The actual solution to the problem and how to get the correct answer in Java when using really big numbers is to use the BigInteger class, which also works for long values.
package com.craigsdickson.scratchpad;

import java.math.BigInteger;

public class BigIntegerExample {
    public static void main(String[] args) {
        int bigInt = Integer.MAX_VALUE;
        // prints incorrect answer
        System.out.println(bigInt * bigInt);
        BigInteger bi = BigInteger.valueOf(bigInt);
        // prints correct answer
        System.out.println(bi.multiply(bi));

        long bigLong = Long.MAX_VALUE;
        // prints incorrect answer
        System.out.println(bigLong * bigLong);
        BigInteger bl = BigInteger.valueOf(bigLong);
        // prints correct answer
        System.out.println(bl.multiply(bl));
    }
}
The reasons why integer overflow occurs have already been explained in other answers.
A practical way to ensure long arithmetic in calculations is to use numeric literals with the L suffix, which declares the literal as a long (prefer uppercase L; a lowercase l is easily confused with the digit 1).
Ordinary integer multiplication that overflows:
jshell> 1000000 * 1000000
$1 ==> -727379968
Multiplication where one of the multiplicands has the L suffix does not overflow:
jshell> 1000000 * 1000000L
$2 ==> 1000000000000
Note that longs are also prone to overflow, but the range is much greater, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.