Multiplication of two ints overflowing to result in a negative number - java

Consider this snippet from the Java language specification.
class Test {
    public static void main(String[] args) {
        int i = 1000000;
        System.out.println(i * i);
        long l = i;
        System.out.println(l * l);
    }
}
The output is
-727379968
1000000000000
Why is the result -727379968 for (i*i)? Ideally it should be 1000000000000.
I know the range of int is from -2147483648 to 2147483647, so obviously 1000000000000
is not in that range.
Why does the result become -727379968?

Java (like most computer architectures these days) uses something called two's complement arithmetic, which uses the most significant bit of an integer to signify that a number is negative. If you multiply two big numbers, you end up with a number that's so big it sets that highest bit, and the result ends up negative.

Let's look at the binary:
1000000 is 1111 0100 0010 0100 0000.
1000000000000 is 1110 1000 1101 0100 1010 0101 0001 0000 0000 0000.
However, the first two groups of 4 bits (1110 1000) won't fit in an int (since int is 32 bits wide in Java), so they are dropped, leaving only 1101 0100 1010 0101 0001 0000 0000 0000, which is -727379968.
In other words, the result overflows for int, and you get what's left.
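You can see the truncation directly by doing the multiplication in long and then casting back to int, which keeps only the low 32 bits (a minimal sketch):
long full = 1000000L * 1000000L; // 1000000000000, computed without overflow
int truncated = (int) full;      // the cast keeps only the low 32 bits
System.out.println(truncated);   // -727379968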

You might want to check Integer overflow as a general concept.
Overflow and underflow are handled differently depending on the language, too. Here is an article on Integer overflow and underflow in Java.
As for the reason why this is so in the Java language: as always, it's a tradeoff between simplicity in the language design and performance. But in Java Puzzlers (puzzle 3), the authors criticize the fact that overflows are silent in Java:
The lesson for language designers is that it may be worth reducing the
likelihood of silent overflow. This could be done by providing support
for arithmetic that does not overflow silently. Programs could throw
an exception instead of overflowing, as does Ada, or they could switch
to a larger internal representation automatically as required to avoid
overflow, as does Lisp. Both of these approaches may have performance
penalties associated with them. Another way to reduce the likelihood
of silent overflow is to support target typing, but this adds
significant complexity to the type system.
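For what it's worth, Java 8 later added exact-arithmetic helpers in java.lang.Math that take the exception approach described in the quote:
int i = 1000000;
// throws ArithmeticException instead of silently overflowing
System.out.println(Math.multiplyExact(i, i));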

Some of the other answers explain correctly why this is happening (i.e., signed two's complement binary logic).
The actual solution to the problem and how to get the correct answer in Java when using really big numbers is to use the BigInteger class, which also works for long values.
package com.craigsdickson.scratchpad;

import java.math.BigInteger;

public class BigIntegerExample {

    public static void main(String[] args) {
        int bigInt = Integer.MAX_VALUE;
        // overflows: prints 1, not 4611686014132420609
        System.out.println(bigInt * bigInt);
        BigInteger bi = BigInteger.valueOf(bigInt);
        // prints the correct answer
        System.out.println(bi.multiply(bi));

        long bigLong = Long.MAX_VALUE;
        // overflows: prints 1, not the true square
        System.out.println(bigLong * bigLong);
        BigInteger bl = BigInteger.valueOf(bigLong);
        // prints the correct answer
        System.out.println(bl.multiply(bl));
    }
}

The reasons why integer overflow occurs have already been explained in other answers.
A practical way to ensure long arithmetic in a calculation is to give one of the numeric literals an L suffix, which declares it as a long. (A lowercase l also works, but is easily mistaken for the digit 1.)
Ordinary integer multiplication that overflows:
jshell> 1000000 * 1000000
$1 ==> -727379968
Multiplication where one of the multiplicands has the L suffix does not overflow:
jshell> 1000000 * 1000000L
$2 ==> 1000000000000
Note that longs are also prone to overflow, but the range is much greater, from -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807.

Related

Often big numbers become negative

Since I started using Eclipse for Project Euler, I noticed that big numbers sometimes become seemingly random negative numbers. I suppose this has something to do with crossing the boundary of the type.
I'll be glad if you could explain to me how these negative numbers are generated and what the logic behind it is. Also, how can I avoid them (preferably not with the BigInteger class)? Thanks! =)
Two's-complement values behave like a wheel: counting up past the largest positive value wraps around to the most negative one. In your case it's obviously larger numbers, but the principle stays the same.
Examples of limits in Java are:
int: -2,147,483,648 to 2,147,483,647
long: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807
In a 4-bit example, the binary representations 0000, 0001, ... count up to 0111 (7) and then wrap around to 1000, which is -8.
EDIT: In Project Euler you often have to think of a way to work around the large numbers. The problems are designed with numbers that big so that you can't use the ordinary way of solving them. However, if you find that you really need them, I suggest studying BigInteger anyway. You will find it useful in the long run, and it's not all that complicated. Here is a link with lots of understandable examples:
BigInteger Example
In mathematics, numbers are infinite; in computers they are not. There is a MAX_VALUE for each integer-like type (int, short, long), for example Integer.MAX_VALUE. When you try to increase a number beyond this value, it wraps around and becomes negative. That is how the internal binary representation of numbers works.
int i = Integer.MAX_VALUE;
i++; // i wraps around to Integer.MIN_VALUE (-2147483648)
Here's a two's complement representation for 2-bit integer: (U means Unsigned, S means Signed)
U | bits | S
---------------
0 | 00 | 0
1 | 01 | 1 \ overflow here:
2 | 10 | -2 / 1 + 1 = -2
3 | 11 | -1
Arithmetic is done mostly as in the unsigned case, modulo 2^n (4 in our case).
The logic is the same for bigger types: int in Java is 32 bits; use long for 64 bits.
You are probably overflowing the size of your data type; since the most significant bit is the sign bit, the value turns negative. Java has no unsigned integer types, so try a larger type such as long if you want to hold bigger numbers than int allows. If you are still overflowing a long, though, you're pretty much stuck with BigInteger.

Strange multiplication error

Why would something simple like this:
System.out.println("test problem: " + 194*194*194*409);
output something like this:
test problem: -1308701240
Because you've overflowed an integer. See the limits on the numbers handled by int in the Java documentation.
Rather than give you the direct answer, I'll suggest some steps.
Work out the type of arithmetic you're doing (what type is the literal 194? what type is the result of the multiplication operator you're using?)
What do you expect the answer to be?
Can your result type handle that answer?
What does Java do for results it can't handle?
For bonus points, refer to the Java Language Specification for the last part...
From 15.17.1. Multiplication Operator
If an integer multiplication overflows, then the result is the low-order bits of the
mathematical product as represented in some sufficiently large two's-complement format. As a result, if overflow occurs, then the sign of the result may not be the same as the sign of the mathematical product of the two operand values.
We have 194*194*194*409=2986266056
In binary the result is 1011 0001 1111 1110 1100 1101 1100 1000
The last 32 bits are 1011 0001 1111 1110 1100 1101 1100 1000, so we did not lose any bits by the overflow, but the sign has changed.
Since int is represented in two's complement (source), the result is
2986266056 - 2^32 = -1308701240.
Everything works as expected!
Looks like an overflow on the result of the operation: 2,986,266,056 is not a valid int value.
Do the arithmetic in a wider type first, then convert the result to a string.
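For example (a minimal sketch), making the first operand a long literal forces the whole chain of multiplications to be carried out in 64 bits:
System.out.println("test problem: " + 194L * 194 * 194 * 409);
// test problem: 2986266056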

Using a larger prime as a multiplier when overriding hashCode()

I have been reading about hashcode functions for the past couple of hours and have accumulated a couple of questions regarding use of prime numbers as multipliers in custom hashcode implementations. I would be grateful if I could get some insight regarding following questions:
In a comment to @mattb's answer here, @hstoerr advocates for the use of larger primes (such as 524287) instead of the common prime 31. My question is, given the following implementation of a hashCode function for a pair of elements:
@Override
public int hashCode() {
    final int prime = 31;
    int hash1 = (pg1 == null) ? 0 : pg1.hashCode();
    int hash2 = (pg2 == null) ? 0 : pg2.hashCode();
    return prime * (hash1 ^ hash2);
}
doesn't this lead to an overflow on the returned int if prime is a large number?
Assuming that the overflow is not a problem (JVM doing an automatic cast) is it better to do a bitshift instead of a cast?
I imagine the performance of the hashCode function varies significantly based on its complexity. Doesn't the size of the prime multiplier affect the performance?
Is it better/smarter/faster to use multiple primes in a custom hashCode function instead of a single multiplier? If not, is there some other advantage? See the example below from @jinguy's answer to a relevant question:
public int hashCode() {
    return a * 13 + b.hashCode() * 23 + (c ? 31 : 7);
}
where a is an int, b is a String and c is boolean.
How about something like long lhash = prime * (hash1 ^ hash2); and then using (int)((lhash >> 32) ^ lhash)? That's something I saw in another question here on SO, but it wasn't really explained why it was a good idea to do it like that.
Apologies in advance for the novel. Feel free to make suggestions or edit directly. --Chet
There is an overflow, but not an exception.
The danger doesn't come from losing accuracy, but losing range. Let's use a ridiculous example, where "prime" is a large power of 2, and 8-bit unsigned numbers for brevity. And assume that (hash1 ^ hash2) is 255:
"prime": 1000 0000
(hash1 ^ hash2): 1111 1111
Showing the truncated digits in brackets, our result is:
product: [0111 1111] 1000 0000
But multiplying by 128 is the same as shifting left by 7 places. So we know that whatever the value of (hash1 ^ hash2), the least significant places of the product will have seven zeros. So if (hash1 ^ hash2) is odd (least significant bit = 1), then the result of multiplying by 128 will always be 128 (after truncating the higher digits). And if (hash1 ^ hash2) is even (LSB is 0), the product will always be zero.
This extends to larger bit sizes. The general point is that if the lower bits of "prime" are zeros, you're doing a shift (or multiple shift + sum) operation that will give you zeros in the lower bits. And the range of the product of multiplication will suffer.
But let's try making "prime" odd, so that the least significant bit will always be 1. Think about decomposing this into shift / add operations. The unshifted value of (hash1 ^ hash2) will always be one of the summands. The least significant bits that were shifted into guaranteed uselessness by an even "prime" multiplier will now be set based on, at minimum, the bits from the original (hash1 ^ hash2) value.
Now, let's consider a value of prime which is actually prime. If it's more than 2, then we know it's odd. So the lower bits haven't been shifted into uselessness. And by choosing a sufficiently large prime, you get better distribution across the range of output values than you'd get with a smaller prime.
Try some exercises with 16-bit multiplication using 8443 (0010 0000 1111 1011) and 59 (0000 0000 0011 1011). They're both prime, and the lower bits of 59 match the lower bits of 65531. For example, if hash1 and hash2 are both ASCII character values (0 .. 255), then all of the results of (hash1 ^ hash2) * 59 will be <= 15045. This means that roughly 1/4 of the range of hash values (0..65535) for a 16-bit number go unused.
But (hash1 ^ hash2) * 8443 is all over the map. It overflows if (hash1 ^ hash2) is as low as 8. It uses all 16 bits even for very small input numbers. There's much less clustering of hash values across the overall range, even if the input numbers are in a relatively small range.
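A quick sanity check you can run (this harness is mine, not from the original answer; it just replays the 16-bit truncation in Java):
int max59 = 0, max8443 = 0;
for (int x = 0; x <= 255; x++) {                    // all byte-sized (hash1 ^ hash2) values
    max59 = Math.max(max59, (x * 59) & 0xFFFF);     // truncate to 16 bits
    max8443 = Math.max(max8443, (x * 8443) & 0xFFFF);
}
System.out.println(max59);   // 15045: barely a quarter of the 0..65535 range
System.out.println(max8443); // close to 65535: the products wrap and cover the range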
Assuming that the overflow is not a problem (JVM doing an automatic cast) is it better to do a bitshift instead of a cast?
Most likely not. The JVM should translate into an efficient implementation on the host processor anyway. Integer multiplication should be implemented in hardware. And if not, the JVM is responsible for translating the operation into something reasonable for the CPU. It's very likely that the case of integer multiplication is highly optimized already. If integer multiplication is done more quickly on a given CPU as shift-and-add, the JVM should implement it that way. But it's less likely that the folks writing the JVM would care to watch for cases where multiple shift-and-add operations could have been combined into a single integer multiply.
I imagine the performance of the hashCode function varies significantly based on its complexity. Doesn't the size of the prime multiplier affect the performance?
No. The operations are the same when done in hardware regardless of the size, number of bits set, etc. It's probably a couple of clock cycles. It would vary depending on the specific CPU, but should be a constant-time operation regardless of the input values.
Is it better/smarter/faster to use multiple primes in a custom hashCode function instead of a single multiplier? If not, is there some other advantage?
Only if it reduces the possibility of collisions, and this depends on the numbers you're using. If your hash code depends on A and B and they're in the same range, you might consider using different primes or shifting one of the input values to reduce overlap between the bits. Since you're depending on their individual hash codes, and not their values directly, it's reasonable to assume that their hash codes provide good distribution, etc.
One factor that comes to mind is whether you want the hash code for (x, y) to be different from (y, x). If your hash function treats A and B in the same way, then hash(x, y) = hash(y, x). If that's what you want, then by all means use the same multiplier. If not, using a different multiplier would make sense.
How about something like long lhash = prime * (hash1 ^ hash2); and then using (int)((lhash >> 32) ^ lhash)? That's something I saw in another question here on SO, but it wasn't really explained why it was a good idea to do it like that.
Interesting question. In Java, longs are 64-bit and ints are 32-bit. So this generates a hash using twice as many bits as desired, and then derives the result from the high and low bits combined.
If multiplying a number n by a prime p, and the lowermost k bits of n are all zeros, then the lowermost k bits of the product n * p will also be all zeros. This is fairly easy to see -- if you're multiplying, say, n = 0011 0000 and p = 0011 1011, then the product can be expressed as the sum of two shift operations. Or,
00110000 * p = 00100000 * p + 00010000 * p
             = (p << 5) + (p << 4)
Taking p = 59 and using unsigned 8-bit ints and 16-bit longs, here are some examples.
64: 0011 1011 * 0100 0000 = [ 0000 1110 ] 1100 0000 (192)
128: 0011 1011 * 1000 0000 = [ 0001 1101 ] 1000 0000 (128)
192: 0011 1011 * 1100 0000 = [ 0010 1100 ] 0100 0000 (64)
By just dropping the high bits of the result, the range of the resulting hash value is limited when the low bits of the non-prime multiplicand are all zeros. Whether that's an issue in a specific context is, well, context-specific. But for a general hash function it's a good idea to avoid limiting the range of output values even when there are patterns in the input numbers. And in security applications, it's even more critical to avoid anything that would let someone make inferences about the original value based on patterns in the output. Just taking the low bits reveals the exact values of some of the original bits. If we make the assumption that the operation involved multiplying an input number with a large prime, then we know that the original number had as many zeros at the right as the hash output (because the prime's rightmost bit was 1).
By XORing the high bits with the low bits, there's less consistency in the output. And more importantly, it's much harder to make guesses about the input values based on this information. Based on how XOR works, it could mean the original low bit was 0 and the high bit was 1, or the original low bit was 1 and the high bit was 0.
64: 0011 1011 * 0100 0000 = 0000 1110 1100 0000 => 1100 1110 (206)
128: 0011 1011 * 1000 0000 = 0001 1101 1000 0000 => 1001 1101 (157)
192: 0011 1011 * 1100 0000 = 0010 1100 0100 0000 => 0110 1100 (108)
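In Java, the fold from the question looks like this (a small sketch; 524287 is just one choice of large prime, and the unsigned shift >>> avoids dragging the sign bit down into the result):
long lhash = 524287L * (hash1 ^ hash2);    // 64-bit product of the 32-bit hashes
int hash = (int) ((lhash >>> 32) ^ lhash); // XOR the high half into the low half
This is essentially the same reduction Long.hashCode() uses to squeeze 64 bits into 32.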
Overflow is not a problem. Hashes are constrained to a narrow value set anyway.
The first hash function you posted isn't very good. Doing return (prime * hash1) ^ hash2; instead would reduce the number of collisions in most cases.
Multiplying by a single-word int is generally very fast, and the difference between multiplying by different numbers is negligible. Plus, the execution time is dwarfed by everything else in the function anyway.
Using different prime multipliers for each part may reduce the risk of collisions.
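For reference, a common way to get a similar effect with one repeated prime is the recipe popularized by Effective Java (field names here are illustrative, matching the earlier example):
@Override
public int hashCode() {
    int result = 17;                     // arbitrary non-zero seed
    result = 31 * result + a;            // int field
    result = 31 * result + b.hashCode(); // object field (assumed non-null)
    result = 31 * result + (c ? 1 : 0);  // boolean field
    return result;
}
Because each field is folded in at a different depth, the fields end up multiplied by different powers of 31, which buys much of the separation you would get from hand-picking distinct primes.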

Which (Java) code, library, or algorithm is best at providing similarities between bit patterns?

I am after code, a library routine, or an algorithm that scores how close two different bit or boolean patterns are. Naturally, if they are equal then the score should be 1, while if one is all true and the other all false then the score should be 0.
Bit pattern example
The bit patterns that I will be testing are usually not exactly equal, but sometimes they are very similar.
0001 1111 0000
0000 1111 1100
0000 1110 0000
1110 0000 1111
In the above examples, 1 & 2 and 1 & 3 are pretty close; if I were to score them, the similarities would be something like 96% and 95%. On the other hand, 1 & 4 would definitely get a much lower score, maybe 25%.
Note that bit patterns may be of different lengths but scoring should still be possible.
001100
000011110000
The above two patterns would be considered identical.
001100
00110000
The above two patterns would be considered close but not identical, because once "scaled" #2 is different from #1.
If the bit patterns are all the same length, just use the exclusive-or (^) operator and count how many zeroes remain.
(xor produces a zero if the two corresponding bits are the same, and a one otherwise).
If they're of different lengths, treat the bit pattern as if it were a string and use something like the Levenshtein distance algorithm.
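A minimal sketch of that idea, treating the patterns as strings of '0' and '1' characters (the standard two-row dynamic-programming formulation):
static int levenshtein(String s, String t) {
    int[] prev = new int[t.length() + 1];
    int[] curr = new int[t.length() + 1];
    for (int j = 0; j <= t.length(); j++) prev[j] = j; // distance from the empty prefix
    for (int i = 1; i <= s.length(); i++) {
        curr[0] = i;
        for (int j = 1; j <= t.length(); j++) {
            int cost = (s.charAt(i - 1) == t.charAt(j - 1)) ? 0 : 1;
            curr[j] = Math.min(Math.min(curr[j - 1] + 1,  // insertion
                                        prev[j] + 1),     // deletion
                               prev[j - 1] + cost);       // substitution
        }
        int[] tmp = prev; prev = curr; curr = tmp;
    }
    return prev[t.length()];
}
A distance of 0 means identical strings; dividing by the longer length gives a rough dissimilarity fraction.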
I've been playing around with fast ways to count the number of differing bits in the bit-wise XOR comparison. Here's one way:
int num1 = 0b0001_1111_0000; // pattern 1 from the question
int num2 = 0b0000_1111_1100; // pattern 2 from the question
int diff = num1 ^ num2;
int score;
// != 0 (not > 0) so a set sign bit is handled, and >>> so the shift
// doesn't drag the sign bit along
for (score = 0; diff != 0; diff >>>= 1)
    score += diff & 1;
A score of zero means an exact match (assuming patterns of the same length).
public static int bitwiseEditDistance(int a, int b) {
    return Integer.bitCount(a ^ b);
}
Integer.bitCount is an obscure little bit of the core libraries.
Returns the number of one-bits in the two's complement binary representation of the specified int value. This function is sometimes referred to as the population count.
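If you want a similarity score in [0, 1] like the question asks for, you can turn the popcount into a fraction (a sketch that divides by the pattern width, so that agreeing leading zeros beyond the pattern don't inflate the score):
static double similarity(int a, int b, int widthInBits) {
    int differing = Integer.bitCount(a ^ b);        // bits that disagree
    return 1.0 - (double) differing / widthInBits;  // 1.0 = identical
}
For the question's 12-bit patterns, similarity(0b000111110000, 0b000011111100, 12) is 0.75.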

How to assign the largest n bit unsigned integer to a BigInteger in Java

I have a scenario where I'm working with large integers (e.g. 160 bit), and am trying to create the biggest possible unsigned integer that can be represented with an n bit number at run time. The exact value of n isn't known until the program has begun executing and read the value from a configuration file. So for example, n might be 160, or 128, or 192, etcetera...
Initially what I was thinking was something like:
BigInteger.valueOf((long)Math.pow(2, n));
but then I realized the conversion to long that takes place sort of defeats the purpose, given that long doesn't have enough bits to store the result in the first place. Any suggestions?
On the largest n-bit unsigned number
Let's first take a look at what this number is, mathematically.
In an unsigned binary representation, the largest n-bit number would have all bits set to 1. Let's take a look at some examples:
1(2) = 1 = 2^1 - 1
11(2) = 3 = 2^2 - 1
111(2) = 7 = 2^3 - 1
...
11...1(2) = 2^n - 1   (n ones)
Note that this is analogous in decimal too. The largest 3-digit number is:
10^3 - 1 = 1000 - 1 = 999
Thus, a subproblem of finding the largest n-bit unsigned number is computing 2^n.
On computing powers of 2
Modern digital computers can compute powers of two efficiently, due to the following pattern:
2^0 = 1(2)
2^1 = 10(2)
2^2 = 100(2)
2^3 = 1000(2)
...
2^n = 10...0(2)   (n zeros)
That is, 2^n is simply a number having bit n set to 1 and everything else set to 0 (remember that bits are numbered with zero-based indexing).
Solution
Putting the above together, we get this simple solution using BigInteger for our problem:
final int N = 5;
BigInteger twoToN = BigInteger.ZERO.setBit(N);
BigInteger maxNbits = twoToN.subtract(BigInteger.ONE);
System.out.println(maxNbits); // 31
If we were using long instead, then we can write something like this:
// for the 64-bit signed long version, N < 64
System.out.println(
    (1L << N) - 1
); // 31
There is no "set bit n" operation defined for long, so traditionally bit shifting is used instead. In fact, a BigInteger analog of this shifting technique is also possible:
System.out.println(
    BigInteger.ONE.shiftLeft(N).subtract(BigInteger.ONE)
); // 31
See also
Wikipedia/Binary numeral system
Bit Twiddling Hacks
Additional BigInteger tips
BigInteger does have a pow method to compute a non-negative power of any arbitrary number. If you're working in a modular ring, there are also modPow and modInverse.
You can individually setBit, flipBit or just testBit. You can get the overall bitCount, perform bitwise and with another BigInteger, and shiftLeft/shiftRight, etc.
As bonus, you can also compute the gcd or check if the number isProbablePrime.
ALWAYS remember that BigInteger, like String, is immutable. You can't invoke a method on an instance, and expect that instance to be modified. Instead, always assign the result returned by the method to your variables.
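A quick illustration of that gotcha:
BigInteger x = BigInteger.ONE;
x.add(BigInteger.ONE);     // returned result is discarded; x is still 1
x = x.add(BigInteger.ONE); // x is now 2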
Just to clarify, you want the largest n-bit number (i.e., the one with all n bits set). If so, the following will do that for you:
BigInteger largestNBitInteger = BigInteger.ZERO.setBit(n).subtract(BigInteger.ONE);
This is mathematically equivalent to 2^n - 1. Your question shows how to compute 2^n, which is actually the smallest (n+1)-bit number. You can of course do that with:
BigInteger smallestNPlusOneBitInteger = BigInteger.ZERO.setBit(n);
I think there is a pow method directly in BigInteger. You can use it for your purpose.
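For instance, a minimal sketch of that approach:
BigInteger largestNBit = BigInteger.valueOf(2).pow(n).subtract(BigInteger.ONE);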
The quickest way I can think of doing this is by using the constructor for BigInteger that takes a byte[].
BigInteger(byte[] val) constructs the BigInteger object from an array of bytes. You are, however, dealing with bits, and so creating a byte[] that might consist of {127, 255, 255, 255, 255} for a 39-bit integer representing 2^39 - 1 might be a little tedious.
You could also use the constructor BigInteger(String val, int radix), which might make what's going on in your code more readily apparent, if you don't mind a performance hit for parsing a String. Then you could generate a string like val = "111111111111111111111111111111111111111" and call BigInteger myInt = new BigInteger(val, 2);, resulting in the same 39-bit integer.
The first option will require some thinking about how to represent your number: that particular constructor expects a two's-complement, big-endian representation of the number. The second will likely be marginally slower, but much clearer.
EDIT: Corrected numbers. I thought you meant represent 2^n, and didn't correctly read the largest value n bits could store.
