Exact conversion from float to int

I want to convert float value to int value or throw an exception if this conversion is not exact.
I've found the following suggestion: use Math.round to convert and then use == to check whether those values are equal. If they're equal, then conversion is exact, otherwise it is not.
But I've found an example which does not work. Here's code demonstrating this example:
String s = "2147483648";
float f = Float.parseFloat(s);
System.out.printf("f=%f\n", f);
int i = Math.round(f);
System.out.printf("i=%d\n", i);
System.out.printf("(f == i)=%s\n", (f == i));
It outputs:
f=2147483648.000000
i=2147483647
(f == i)=true
I understand that 2147483648 does not fit into the int range, but I'm surprised that == returns true for those values. Is there a better way to compare a float and an int? I guess it's possible to convert both values to strings, but that would be extremely slow for such a primitive function.

floats are rather inexact concepts. They are also mostly pointless unless you're running on hardware that is by now rather old, or interacting specifically with systems and/or protocols that work in floats or have 'use a float' hardcoded in their spec. That may be the case here, but if it isn't, stop using float and start using double: unless you have a fairly large float[], there is essentially no memory or performance difference; floats are just less accurate.
Your algorithm cannot fail when using int vs double - all ints are perfectly representable as double.
Let's first explain your code snippet
The underlying error here is the notion of 'silent casting' and how java took some intentional liberties there.
In computer systems in general, you can only compare like with like. It's easy to put in exact terms of bits and machine code what it means to determine whether a == b is true or false if a and b are of the exact same type. It is not at all clear when a and b are different things. Same thing applies to pretty much any operator; a + b, if both are e.g. an int, is a clear and easily understood operation. But if a is a char and b is, say, a double, that's not clear at all.
Hence, in java, all binary operators that involve different types are, at the basic level, illegal. There is no bytecode to directly compare a float and a double, for example, or to add a string to an int.
However, there is syntax sugar: When you write a == b where a and b are different types, and java determines that one of two types is 'a subset' of the other, then java will simply silently convert the 'smaller' type to the 'larger' type, so that the operation can then succeed. For example:
int x = 5;
long y = 5;
System.out.println(x == y);
This works - because java realizes that converting an int to a long value is not ever going to fail, so it doesn't bother you with explicitly specifying that you intended the code to do this. In JLS terms, this is called a widening conversion. In contrast, any attempt to convert a 'larger' type to a 'smaller' type isn't legal, you have to explicitly cast:
long x = 5;
int y = x; // does not compile
int y = (int) x; // but this does.
The point is simply this: When you write the reverse of the above (int x = 5; long y = x;), the code is identical, it's just that compiler silently injects the (long) cast for you, on the basis that no loss will occur. The same thing happens here:
int x = 5;
long y = 10;
long z = x + y;
That compiles because javac adds some syntax sugar for you, specifically, that is compiled as if it says: long z = ((long) x) + y;. The 'type' of the expression x + y there is long.
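One way to observe that promotion is to box the result of the addition and look at its runtime class. This is only a small sketch; the class name PromotionDemo and its helper method are made up for illustration.

```java
final class PromotionDemo {
    // Returns the runtime class of the boxed result of int + long.
    static Class<?> typeOfIntPlusLong(int x, long y) {
        Object boxed = x + y; // x is promoted to long before the addition
        return boxed.getClass();
    }

    public static void main(String[] args) {
        // The expression x + y has type long, so it boxes to java.lang.Long.
        System.out.println(typeOfIntPlusLong(5, 10L).getSimpleName());
    }
}
```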
Here's the key trick: Java considers converting an int to a float, as well as an int or long to a double - a widening conversion.
As in, javac will just assume it can do that safely without any loss and therefore will not require the programmer to acknowledge the conversion by manually adding the cast. However, int->float, as well as long->double, are not actually entirely safe.
floats can represent every integral value between -2^24 and +2^24, and doubles can represent every integral value between -2^53 and +2^53. But int can represent every integral value between -2^31 and +2^31-1, and long every integral value between -2^63 and +2^63-1. That means at the edges (very large negative/positive numbers), integral values exist that are representable in ints but not in floats, or in longs but not in doubles (all ints are representable in double, fortunately; int -> double conversion is entirely safe). But java doesn't 'acknowledge' this, which means silent widening conversions can nevertheless toss out data (introduce rounding) silently.
That is what happens here: (f == i) is syntax sugared into (f == ((float) i)) and the conversion from int to float introduces the rounding.
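The effect can be reproduced in isolation. The following is a minimal sketch, with illustrative names (WideningDemo and its two helpers are not from the question), that compares the same pair of values once after widening the int to float and once after widening both sides to double, which holds every int and every float exactly.

```java
final class WideningDemo {
    // What `f == i` is compiled as: i is converted to float, which may round.
    static boolean equalAfterFloatWidening(float f, int i) {
        return f == (float) i;
    }

    // Both float and int convert to double without loss, so this comparison is exact.
    static boolean equalAfterDoubleWidening(float f, int i) {
        return (double) f == (double) i;
    }

    public static void main(String[] args) {
        float f = 2147483648f;     // 2^31, exactly representable as a float
        int i = Integer.MAX_VALUE; // 2^31 - 1, not representable as a float
        System.out.println(equalAfterFloatWidening(f, i));  // true: i rounds up to 2^31
        System.out.println(equalAfterDoubleWidening(f, i)); // false
        // 2^24 + 1 is the first positive int that a float cannot hold.
        System.out.println((float) 16777217 == 16777216f);  // true
    }
}
```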
The solution
Mostly, when using doubles and floats and nevertheless wishing for exact numbers, you've already messed up. These concepts fundamentally just aren't exact, and this exactness cannot be sideloaded in by attempting to account for error bands, as the errors introduced by the rounding behaviour of float and double cannot be tracked (not easily, at any rate). As a consequence, you should not be using float/double. Either find an atomic unit and represent values in terms of it using int/long, or use BigDecimal. (Example: to write bookkeeping software, do not store finance amounts as a double. Do store them as 'cents' (or satoshis or yen or pennies or whatever the atomic unit is in that currency) in a long, or use BigDecimal if you really know what you are doing.)
I want an answer anyway
If you're absolutely positive that using float (or even double) here is acceptable and you still want exactness, we have a few solutions.
Option 1 is to employ the power of BigDecimal:
new BigDecimal(someDouble).intValueExact()
This works, is 100% reliable (float to double conversion is exact, so it cannot turn a non-exact value into an exact one), and throws an ArithmeticException when the value has a fractional part or does not fit in an int. It's also very slow.
An alternative is to employ our knowledge of how the IEEE floating point standard works.
A real simple answer is simply to run your algorithm as you wrote it, but to add an additional check: if the value your int gets is below -2^24 or above +2^24 then it probably isn't correct. However, there is still a smattering of numbers beyond -2^24 and +2^24 that are perfectly representable in both float and int, just no longer every number at that point. If you want an algorithm that will accept those exact numbers as well, then it gets much more complicated. My advice is not to delve into that cesspool: if you have a process where you end up with a float that is anywhere near such extremes, and you want to turn it into an int but only if that is possible without loss, you've arrived at a crazy question and you need to rewire the parts that got you there instead!
If you truly must have this, I suggest using the new BigDecimal(...).intValueExact() trick instead of trying to numbercrunch the float.
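For reference, here is a hedged sketch of both routes. The class and method names are invented for this example. The second route is not something the answer above proposes: it relies on the facts that float -> double is exact and that double holds every int exactly, so the range and integrality checks can be done in double.

```java
final class ExactFloatToInt {
    // Slow but simple: BigDecimal holds the float's exact value.
    // Throws ArithmeticException if the value has a fractional part or is out of
    // int range, and NumberFormatException for NaN or infinity.
    static int viaBigDecimal(float f) {
        return new java.math.BigDecimal(f).intValueExact();
    }

    // True if f is a whole number inside the int range (false for NaN and infinity).
    static boolean isExactInt(float f) {
        double d = f; // exact widening
        return d >= Integer.MIN_VALUE && d <= Integer.MAX_VALUE && d == Math.rint(d);
    }

    // Faster alternative: check in double, then cast.
    static int viaDouble(float f) {
        if (!isExactInt(f)) {
            throw new ArithmeticException("not exactly an int: " + f);
        }
        return (int) f;
    }
}
```

With this, viaDouble(2147483648f) throws instead of silently producing 2147483647, while large values that happen to be exact, such as -2147483648f, are still accepted.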

Related

Why is Java's division broken?

I am an experienced php developer just starting to learn Java. I am following some Lynda courses at the moment and I'm still really early stages. I'm writing sample programs that ask for user input and do simple calculation and stuff.
Yesterday I came across this situation:
double result = 1 / 2;
With my caveman brain I would think result == 0.5, but no, not in Java. Apparently 1 / 2 == 0.0. Yes, I know that if I change one of the operands to a double the result would also be a double.
This scares me actually. I can't help but think that this is very broken. It is very naive to think that an integer division results in an integer. I think it is even rarely the case.
But, as Java is very widely used and searching for 'why is java's division broken?' doesn't yield any results, I am probably wrong.
My questions are:
Why does division behave like this?
Where else can I expect to find such magic/voodoo/unexpected behaviour?

Java is a strongly typed language so you should be aware of the types of the values in expressions. If not...
1 is an int (as is 2), so 1/2 is the integer division of 1 by 2, and the result is 0 as an int. Then the result is converted to the corresponding double value, 0.0.
Integer division is different than float division, as in math (natural numbers division is different than real numbers division).
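A short sketch of the two evaluation orders, with a hypothetical class name: in the first method the division happens on ints and only the finished result is widened, in the second the cast makes the division itself a double division.

```java
final class DivisionDemo {
    // int / int is evaluated first (truncating toward zero), then widened to double.
    static double intDivision(int a, int b) {
        return a / b;
    }

    // Casting one operand promotes the whole division to double arithmetic.
    static double floatingDivision(int a, int b) {
        return (double) a / b;
    }

    public static void main(String[] args) {
        System.out.println(intDivision(1, 2));      // 0.0
        System.out.println(floatingDivision(1, 2)); // 0.5
    }
}
```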

You are thinking like a PHP developer; PHP is a dynamically typed language. This means that types are determined at run-time, so when a division cannot logically produce a whole number, a double (or float) result is implied by the division operation.
Java, C, C++, C# and many other languages are strongly typed languages, so when an integer is divided by an integer you get an integer back: 100/50 gives 2, and 100/45 also gives 2, because 100/45 is actually 2.2222...; truncate the fractional part to get a whole number (integer division) and you get 2.
In a strongly typed language, if you want the result you expect, you need to state the types you want (explicitly or through promotion), which is why making one of the operands of your division a double or float results in floating point division (which gives back fractions).
So in Java, you could do one of the following to get a fractional number:
double result = 1.0 / 2;
double result = 1f / 2;
double result = (float)1 / 2;
Going from a loosely typed, dynamic language to a strongly typed, static language can be jarring, but there's no need to be scared. Just understand that you have to take extra care with validation beyond input, you also have to validate types.
Going from PHP to Java, you should know you can not do something like this:
$result = "2.0";
$result = "1.0" / $result;
echo $result * 3;
In PHP, this would produce the output 1.5 (since (1/2)*3 == 1.5), but in Java,
String result = "2.0";
result = "1.0" / result;
System.out.println(result * 3);
This will result in an error because you cannot divide a string (it's not a number).
Hope that can help.

I'm by no means a professional on this, but I think it's because of how the operators are defined to do integer arithmetic. Java uses integer division in order to compute the result because it sees that both are integers. It takes as inputs to this "division" method two ints, and the division operator is overloaded, and performs this integer division. If this were not the case, then Java would have to perform a cast in the overloaded method to a double each time, which is in essence useless if you can perform the cast prior anyways.

If you try it with c++, you will see the result is the same.
The reason is that the value has to be calculated before it is assigned to the variable. The numbers you typed (1 and 2) are integers, so they are stored as integers and the division is done on integers. After that the result is cast to double, which gives 0.0.

Why does division behave like this?
Because the language specification defines it that way.
Where else can I expect to find such magic/voodoo/unexpected behaviour?
Since you're basically calling "magic/voodoo" something which is perfectly defined in the language specification, the answer is "everywhere".
So the question is actually why this design decision was made in Java. From my point of view, int division resulting in int is a perfectly sound design decision for a strongly typed language. Pure int arithmetic is used very often, so if an int division resulted in a float or double, you'd need a lot of rounding, which would not be good.

package demo;
public class ChocolatesPurchased
{
public static void main(String args[])
{
float p = 3;
float cost = 2.5f;
p *= cost;
System.out.println(p);
}
}

Int can be incremented by a double value

This code seems to work in Java, violating everything I thought I knew about the language:
int x = 0;
x += 7.4;
x now has the value 7. Of course, one can't just write int x = 7.4, so this behavior seems strange and inconsistent to me.
Why did the developers of Java choose such a behavior?
The question that mine was marked as a duplicate of was actually answering the "what happens" part, but not my main question: what the rationale is.

The operators for numbers do all kinds of casting, which in this case converts the double 7.4 to the int 7 by truncating it.
What you have here is a Compound Assignment Operator.
So what really gets executed is
x = (int)(x + 7.4)
Since x is an int and 7.4 is a double, x gets converted to double via Binary Numeric Promotion, so you get 7.4 as an intermediate result.
The result (a double) is then cast and therefore subject to a Narrowing Primitive Conversion, which rounds it toward zero to 7.
Regarding the new question: Why was it done this way?
Well you can argue long if implicit conversions are a good or bad thing. Java went some kind of middle road with some conversions between primitives, their boxed types and Strings.
The += operator then has rather simple and straightforward semantics. It really only looks strange if you consider it an 'increment by' operator, instead of what it really is: a shorthand for a combination of operator and assignment.
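The equivalence can be checked directly. In this sketch (class and method names are illustrative only), the compound form and the spelled-out cast give the same result, and the negative case shows that the narrowing step truncates toward zero rather than rounding to nearest.

```java
final class CompoundAssignmentDemo {
    // x += delta compiles as x = (int) (x + delta): the cast is implicit.
    static int addCompound(int x, double delta) {
        x += delta;
        return x;
    }

    // The same computation with the cast written out.
    static int addExplicit(int x, double delta) {
        return (int) (x + delta);
    }

    public static void main(String[] args) {
        System.out.println(addCompound(0, 7.4));  // 7
        System.out.println(addCompound(0, 7.9));  // 7, not 8
        System.out.println(addCompound(0, -7.9)); // -7, truncation toward zero
    }
}
```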

I read about this some time back.
What actually gets executed is
x = (int)(x + 7.4)

No, it's not inconsistent. It uses round-to-nearest mode.
https://docs.oracle.com/javase/specs/jls/se7/html/jls-5.html
A widening conversion of an int or a long value to float, or of a long value to double, may result in loss of precision - that is, the result may lose some of the least significant bits of the value. In this case, the resulting floating-point value will be a correctly rounded version of the integer value, using IEEE 754 round-to-nearest mode (§4.2.4).

In my opinion x = 7 due to basic conversion

Why is math.pow not natively able to deal with ints? (floor/ceil, too)

I know that in Java (and probably other languages), Math.pow is defined on doubles and returns a double. I'm wondering why on earth the folks who wrote Java didn't also write an int-returning pow(int, int) method, which seems to this mathematician-turned-novice-programmer like a forehead-slapping (though obviously easily fixable) omission. I can't help but think that there's some behind-the-scenes reason based on the intricacies of CS that I just don't know, because otherwise... huh?
On a similar topic, ceil and floor by definition return integers, so how come they don't return ints?
Thanks to all for helping me understand this. It's totally minor, but has been bugging me for years.

java.lang.Math is just a port of what the C math library does.
For C, I think it comes down to the fact that CPUs have special instructions to do Math.pow for floating point numbers (but not for integers).
Of course, the language could still add an int implementation. BigInteger has one, in fact. It makes sense there, too, because pow tends to result in rather big numbers.
ceil and floor by definition return integers, so how come they don't return ints
Floating point numbers can represent integers outside of the range of int. So if you take a double argument that is too big to fit into an int, there is no good way for floor to deal with it.
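If an int is really what you need, the range check has to be done by the caller. A minimal sketch of such a wrapper follows; FloorDemo, fitsInInt and floorToInt are hypothetical names, not part of java.lang.Math.

```java
final class FloorDemo {
    // True if d lies inside the int range; false for NaN and infinities.
    static boolean fitsInInt(double d) {
        return d >= Integer.MIN_VALUE && d <= Integer.MAX_VALUE;
    }

    // Floors d and converts to int, throwing instead of silently clamping.
    static int floorToInt(double d) {
        double floored = Math.floor(d);
        if (!fitsInInt(floored)) {
            throw new ArithmeticException("floor(" + d + ") does not fit in an int");
        }
        return (int) floored;
    }

    public static void main(String[] args) {
        System.out.println(floorToInt(3.7));  // 3
        System.out.println(floorToInt(-3.2)); // -4
        System.out.println(fitsInInt(Math.floor(1e10))); // false: 1e10 is a valid double but no int
    }
}
```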

From a mathematical perspective, you're going to overflow your integer if it's larger than 2^31-1, and overflow your long if it's larger than 2^63-1. It doesn't take much to overflow it, either.
Doubles are nice in that they can represent numbers from ~10^-308 to ~10^308 with 53 bits of precision. There may be some fringe conversion issues (such as the next full integer in a double may not exactly be representable), but by and large you're going to get a much larger range of numbers than you would if you strictly dealt with integers or longs.
On a similar topic, ceil and floor by definition return integers, so how come they don't return ints?
For the same reason outlined above - overflow. If I have an integral value that's larger than what I can represent in a long, I'd have to use something that could represent it. A similar thing occurs when I have an integral value that's smaller than what I can represent in a long.

Optimal implementations of integer pow() and floating-point pow() are very different, and C's math library was probably developed around the time when floating-point coprocessors were a consideration. An optimal floating-point implementation shifts the numbers closer to 1 (to force quicker convergence of the power series) and then shifts the result back. For an integer power, a more accurate result can be had in O(log(p)) time by doing something like this:
// p is a non-negative integer power set somewhere above, n is the number to raise to power p
int result = 1;
while (p != 0) {
if ((p & 1) != 0) { // Java needs a boolean here; the lowest bit of p selects this factor
result *= n;
}
n = n * n; // square the base for the next bit of p
p = p >> 1;
}
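The loop above silently wraps around on overflow. As a hedged variant, the same square-and-multiply idea can be written with Math.multiplyExact (available since Java 8), which throws an ArithmeticException instead. The class name IntPow and the method powExact are made up for this sketch.

```java
final class IntPow {
    // Computes base^exp exactly in long arithmetic, or throws ArithmeticException on overflow.
    static long powExact(long base, int exp) {
        if (exp < 0) {
            throw new IllegalArgumentException("negative exponent: " + exp);
        }
        long result = 1;
        while (exp != 0) {
            if ((exp & 1) != 0) {
                result = Math.multiplyExact(result, base);
            }
            exp >>= 1;
            if (exp != 0) {
                // Only square while more bits remain, so the last squaring cannot overflow needlessly.
                base = Math.multiplyExact(base, base);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(powExact(2, 10)); // 1024
        System.out.println(powExact(-2, 3)); // -8
    }
}
```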

Because all ints can be upcast to a double without loss and the pow function on a double is no less efficient than that on an int.

The reason lies behind the implementation of Math.pow() (JNI of default implementation). The CPU has an exponentiation module which works with doubles as input and output. Why should Java convert that for you when you have much better control over this yourself?
For floor and ceil the reasons are the same, but note that:
(int) Math.floor(d) == (int) d; // d > 0
(int) Math.ceil(d) == -(int)(-d); // d < 0
For most cases (no warranty around or beyond Integer.MAX_VALUE or Integer.MIN_VALUE).
Java leaves you with
(int) Math.pow(a,b)
because the result of Math.pow may even be NaN or Infinity depending on input.
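What the cast does with such results is fixed by the language: NaN becomes 0, and values beyond the int range (including the infinities) are clamped to Integer.MIN_VALUE or Integer.MAX_VALUE. A small sketch with an illustrative class name:

```java
final class NarrowingDemo {
    // Plain narrowing cast from double to int; never throws.
    static int toInt(double d) {
        return (int) d;
    }

    public static void main(String[] args) {
        System.out.println(toInt(Math.pow(2, 10)));            // 1024
        System.out.println(toInt(Math.pow(2, 40)));            // 2147483647 (clamped)
        System.out.println(toInt(Math.pow(-8.0, 0.5)));        // 0 (the result of pow is NaN)
        System.out.println(toInt(Double.NEGATIVE_INFINITY));   // -2147483648
    }
}
```

Because none of these cases throws, any check for NaN or out-of-range results has to happen before the cast.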

Best way to avoid number polarity reversal if multiplying two big numbers in Java

My question is related to this
How can I check if multiplying two numbers in Java will cause an overflow?
In my application, x and y are calculated on the fly and somewhere in my formula I have to multiply x and y.
int x=64371;
int y=64635;
System.out.println((x*y));
I get wrong output as -134347711
I can quickly fix the above by changing the variables x and y from type int to long and get the correct answer for the above case. But there is no guarantee that x and y won't grow beyond the max capacity of long as well.
Question
Why do I get a negative number here, even though I am not storing the final result in any variable? (for curiosity's sake)
Since, I won't know the value of x and y in advance, is there any quicker way to avoid this overflow. Maybe by dividing all x and y by a certain large constant for entire run of the application or should I take log of x and y before multiplying them? (actual question)
EDIT:
Clarification
The application runs on a big data set, which takes hours to complete. It would be nicer to have a solution which is not too slow.
Since the final result is used for comparison (they just need to be somewhat proportional to the original result) and it is acceptable to have +-5% error in the final value if that gives huge performance gain.

If you know that the numbers are likely to be large, use BigInteger instead. This is guaranteed not to overflow, and then you can either check whether the result is too large to fit into an int or long, or you can just use the BigInteger value directly.
BigInteger is an arbitrary-precision class, so it's going to be slower than using a direct primitive value (which can probably be stored in a processor register), so figure out whether you're realistically going to be overflowing a long (an int times an int will always fit in a long), and choose BigInteger if your domain really requires it.

You get a negative number because of an integer overflow: using two's complement representation, Java interprets any integer with the most significant bit set to 1 as negative.
There are very clever methods involving bit manipulation for detecting situations when an addition or subtraction would result in an overflow or an underflow. If you do not know how big your results are going to be, it is best to switch to BigInteger. Your code would look very different, though, because Java lacks operator overloading facilities that would make mathematical operations on BigInteger objects look familiar. The code will be somewhat slower, too. However, you will be guaranteed against overflows and underflows.
EDIT :
it is acceptable to have +-5% error in the final value if that gives huge performance gain.
+-5% error is a huge allowance for error! If this is indeed acceptable in your system, then using double or even float could work. These types are imprecise, but their range is enormously larger than that of an int, and they do not overflow so easily. You have to be extremely careful, though, because floating-point data types are inherently inexact. You need to always keep in mind the way the data is represented to avoid common precision problems.

Why do I get a negative number here, even though I am not storing the final result in any variable? (for curiosity's sake)
x and y are int types. When you multiply them, they are put into a piece of memory temporarily. The type of that is determined by the types of the originals. int*int will always yield an int. Even if it overflows. if you cast one of them to a long, then it will create a long for the multiplication, and you will not get an overflow.
Since, I won't know the value of x and y, is there any quicker way to
avoid this overflow. Maybe by dividing all x and y by a certain large
constant for entire run of the application or should I take log of x
and y before multiplying them? (actual question)
If x and y are positive then you can check
if(x*y<0)
{
//overflow
}
else
{
//do something with x*y
}
Unfortunately this is not fool-proof. You may overrun right into positive numbers again. for example: System.out.println(Integer.MAX_VALUE * 3); will output: 2147483645.
However, this technique will always work for adding 2 integers.
As others have said, BigInteger is sure not to overflow.
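For two ints there is a cheaper reliable check than BigInteger: do the multiplication in long, where it always fits, and see whether the result survives a cast back to int. The sketch below uses invented names and contrasts that with the sign test, which misses the Integer.MAX_VALUE * 3 case. Math.multiplyExact(int, int) (Java 8+) performs the same detection and throws an ArithmeticException.

```java
final class OverflowDemo {
    // True if x * y does not fit in an int. An int times an int always fits in a long.
    static boolean multiplyOverflows(int x, int y) {
        long wide = (long) x * y;
        return wide != (int) wide;
    }

    // The sign test from the answer above, meaningful for positive inputs only.
    static boolean signTestSaysOverflow(int x, int y) {
        return x * y < 0;
    }

    public static void main(String[] args) {
        System.out.println(multiplyOverflows(64371, 64635));                // true
        System.out.println(signTestSaysOverflow(Integer.MAX_VALUE, 3));     // false, although it overflowed
        System.out.println(multiplyOverflows(Integer.MAX_VALUE, 3));        // true
    }
}
```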

The negative value is just (64371 * 64635) - 2^32. Java does not perform a widening primitive conversion at run time.

Multiplication of ints always results in an int, even if it's not stored in a variable. Your product is 4160619585, which requires unsigned 32-bit (which Java does not have), or a larger word size (or BigInteger, as someone seems to have mentioned already).
You could add logs instead, but the moment you try to exp the result, you would get a number that won't round correctly into a signed 32-bit.

Since both multiplicands are int, doing the multiplication using long via casting would avoid an overflow in your specific case:
System.out.println(x * (long) y);
You don't want to use logarithms because transcendental functions are slow and floating point arithmetic is imprecise - the result is likely to not be equal to the correct integer answer.

Can every float be expressed exactly as a double?

Can every possible value of a float variable be represented exactly in a double variable?
In other words, for all possible values X will the following be successful:
float f1 = X;
double d = f1;
float f2 = (float)d;
if(f1 == f2)
System.out.println("Success!");
else
System.out.println("Failure!");
My suspicion is that there is no exception, or if there is it is only for an edge case (like +/- infinity or NaN).
Edit: Original wording of question was confusing (stated two ways, one which would be answered "no" the other would be answered "yes" for the same answer). I've reworded it so that it matches the question title.

Yes.
Proof by enumeration of all possible cases:
public class TestDoubleFloat {
public static void main(String[] args) {
for (long i = Integer.MIN_VALUE; i <= Integer.MAX_VALUE; i++) {
float f1 = Float.intBitsToFloat((int) i);
double d = (double) f1;
float f2 = (float) d;
if (f1 != f2) {
if (Float.isNaN(f1) && Float.isNaN(f2)) {
continue; // ok, NaN
}
throw new AssertionError("oops: " + f1 + " != " + f2);
}
}
}
}
This finishes in 12 seconds on my machine. 32 bits are small.

In theory, there is no such value, so "yes", every float should be representable as a double. Converting from a float to a double amounts to widening the exponent field and padding the fraction with zero bits -- they are stored using the same format, just with different sized fields.

Yes, floats are a subset of doubles. Both floats and doubles have the form (sign * a * 2^b). The difference between floats and doubles is the number of bits in a & b. Since doubles have more bits available, assigning a float value to a double effectively means inserting extra 0 bits.

As everyone has already said, "no". But that's actually a "yes" to the question itself, i.e. every float can be exactly expressed as a double. Confusing. :)

If I'm reading the language specification correctly (and as everyone else is confirming), there is no such value.
That is, each claims to hold only IEEE 754 standard values, so casts between the two should incur no change except in the memory used.
(clarification: There would be no change as long as the value was small enough to be held in a float; obviously if the value was too many bits to be held in a float to begin with, casting from double to float would result in a loss of precision.)

@KenG: This code:
float a = 0.1F
println "a=${a}"
double d = a
println "d=${d}"
fails not because 0.1f can't be exactly represented. The question was "is there a float value that cannot be represented as a double", which this code doesn't prove. Although 0.1f can't be stored exactly, the value that a is given (which isn't 0.1f exactly) can be stored as a double (which also won't be 0.1f exactly). Assuming an Intel FPU, the bit pattern for a is:
0 01111011 10011001100110011001101
and the bit pattern for d is:
0 01111111011 100110011001100110011010 (followed by lots more zeros)
which has the same sign, exponent (-4 in both cases) and the same fractional part (separated by spaces above). The difference in the output is due to the position of the second non-zero digit in the number (the first is the 1 after the point) which can only be represented with a double. The code that outputs the string format stores intermediate values in memory and is specific to floats and doubles (i.e. there is a function double-to-string and another float-to-string). If the to-string function was optimised to use the FPU stack to store the intermediate results of the to-string process, the output would be the same for float and double since the FPU uses the same, larger format (80bits) for both float and double.
There are no float values that can't be stored identically in a double, i.e. the set of float values is a subset of the set of double values.
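The bit patterns quoted above can be inspected from Java with Float.floatToIntBits and Double.doubleToLongBits. The following is a sketch with made-up helper names; the zero-padding check is only claimed for normal (non-subnormal, non-NaN) floats such as 0.1f.

```java
final class FloatBitsDemo {
    // Unbiased exponent of a float: 8-bit field, bias 127.
    static int exponentOfFloat(float f) {
        return ((Float.floatToIntBits(f) >>> 23) & 0xFF) - 127;
    }

    // Unbiased exponent of a double: 11-bit field, bias 1023.
    static int exponentOfDouble(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF) - 1023;
    }

    // True if the double's 52 fraction bits are the float's 23 fraction bits plus 29 zero bits.
    static boolean fractionIsZeroPadded(float f) {
        long floatFraction = Float.floatToIntBits(f) & 0x7FFFFF;
        long doubleFraction = Double.doubleToLongBits((double) f) & 0xFFFFFFFFFFFFFL;
        return doubleFraction == (floatFraction << 29);
    }

    public static void main(String[] args) {
        float a = 0.1f;
        System.out.println(exponentOfFloat(a));      // -4
        System.out.println(exponentOfDouble(a));     // -4
        System.out.println(fractionIsZeroPadded(a)); // true
        System.out.println((double) a == 0.1);       // false: a holds the float nearest to 0.1, not 0.1 itself
    }
}
```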

Snark: NaNs will compare differently after (or indeed before) conversion.
This does not, however, invalidate the answers already given.

I took the code you listed and decided to try it in C++ since I thought it might execute a little faster and it is significantly easier to do unsafe casting. :-D
I found out that for valid numbers, the conversion works and you get the exact bitwise representation after the cast. However, for non-numbers, e.g. 1.#QNAN0, etc., the result will use a simplified representation of the non-number rather than the exact bits of the source. For example:
**** FAILURE **** 2140188725 | 1.#QNAN0 -- 0xa0000000 0x7ffa1606
I cast an unsigned int to float then to double and back to float. The number 2140188725 (0x7F90B035) results in a NAN and converting to double and back is still a NAN but not the exact same NAN.
Here is the simple C++ code:
typedef unsigned int uint;
for (uint i = 0; i < 0xFFFFFFFF; ++i)
{
float f1 = *(float *)&i;
double d = f1;
float f2 = (float)d;
if(f1 != f2)
printf("**** FAILURE **** %u | %f -- 0x%08x 0x%08x\n", i, f1, f1, f2);
if ((i % 1000000) == 0)
printf("Iteration: %d\n", i);
}

The answer to the first question is yes, the answer to the 'in other words', however is no. If you change the test in the code to be if (!(f1 != f2)) the answer to the second question becomes yes -- it will print 'Success' for all float values.
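In Java, one way to phrase the round-trip test so that NaN does not count as a failure is Float.compare, which treats NaN as equal to itself (and distinguishes -0.0f from 0.0f). This is only a sketch with hypothetical names, not the code from the question.

```java
final class RoundTripDemo {
    // True if float -> double -> float gives back the same value, counting NaN as equal to NaN.
    static boolean survivesRoundTrip(float f1) {
        double d = f1;
        float f2 = (float) d;
        return Float.compare(f1, f2) == 0;
    }

    // The plain == operator, for comparison.
    static boolean equalByOperator(float a, float b) {
        return a == b;
    }

    public static void main(String[] args) {
        System.out.println(equalByOperator(Float.NaN, Float.NaN)); // false
        System.out.println(survivesRoundTrip(Float.NaN));          // true
        System.out.println(survivesRoundTrip(-0.0f));              // true
        System.out.println(survivesRoundTrip(Float.MIN_VALUE));    // true (smallest subnormal)
    }
}
```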

In theory every normal single can have the exponent and mantissa padded to create a double and then remove the padding and you return to the original single.
Going from theory to reality is where you will have problems. I don't know if you were interested in theory or implementation. If it is implementation then you can rapidly get into trouble.
IEEE is a horrible format, my understanding it was intentionally designed to be so tough that nobody could meet it and allow the market to catch up to intel (this was a while back) allowing for more competition. If that is true it failed, either way we are stuck with this dreadful spec. Something like the TI format is far superior for the real world in so many ways. I have no connection to either company or any of these formats.
Thanks to this spec there are very few if any fpus that actually meet it (in hardware or even in hardware plus the operating system), and those that do often fail on the next generation. (google: TestFloat). The problems these days tend to lie in the int to float and float to int and not single to double and double to single as you have specified above. Of course what operation is the fpu going to perform to do that conversion? Add 0? Multiply by 1? Depends on the fpu and the compiler.
The problem with IEEE related to your question above is that there is more than one way a number, not every number but many numbers can be represented. If I wanted to break your code I would start with minus zero in the hope that one of the two operations would convert it to a plus zero. Then I would try denormals. And it should fail with a signaling nan, but you called that out as a known exception.
The problem is that equal sign, here is rule number one about floating point, never use an equal sign. Equals is a bit comparison not a value comparison, if you have two values represented in different ways (plus zero and minus zero for example) the bit comparison will fail even though its the same number. Greater than and less than are done in the fpu, equals is done with the integer alu.
I realize that you probably used the equal to explain the problem and not necessarily the code you wanted to succeed or fail.

If a floating-point type is viewed as representing a precise value, then as other posters have noted, every float value is representable as a double, but only a few values of double can be represented by float. On the other hand, if one recognizes that floating-point values are approximations, one will realize the real situation is reversed. If one uses a very precise instrument to measure something which is 3.437mm, one may correctly describe its size as 3.4mm. If one uses a ruler to measure the object as 3.4mm, it would be incorrect to describe its size as 3.400mm.
Even bigger problems exist at the top of the range. There is a float value that represents: "computed value exceeded 2^127 by an unknown amount", but there's no double value that indicates such a thing. Casting an "infinity" from single to double will yield a value "computed value exceeded 2^1023 by an unknown amount" which is off by a factor of over a googol.
