Here is what I tried:
public class LongToDoubleTest {
    public static void main(String... args) {
        System.out.println(Long.MAX_VALUE);
        System.out.println(Long.MAX_VALUE / 2);
        System.out.println(Math.floor(Long.MAX_VALUE / 2));
        System.out.println(new Double(Math.floor(Long.MAX_VALUE / 2)).longValue());
    }
}
Here is the output:
9223372036854775807
4611686018427387903
4.6116860184273879E18
4611686018427387904
I was initially trying to figure out whether it is possible to keep half of Long.MAX_VALUE in a double without losing data, so I ran the test with all of those lines except the last one. It appeared that I was right and only the trailing 3 was missing. Then, just to confirm it, I added the last line, and instead of a 3 a 4 appeared. So my question is: where does that 4 come from, and why is it 4 and not 3? Because 4 is actually an incorrect value here.
P.S. My knowledge of IEEE 754 is very poor, so maybe the behaviour I found is absolutely correct, but 4 is obviously the wrong value here.
You need to understand that not every long can be exactly represented as a double - after all, there are 2^64 long values, and at most that many double values (although lots of those are reserved for "not a number" values etc.). Given that there are also double values which clearly aren't long values (0.5, or any other non-integer, for a start), that means there can't possibly be a double value for every long value.
That means if you start with a long value that can't be represented, convert it to a double and then back to a long, it's entirely reasonable to get back a different number.
The absolute difference between adjacent double values increases as the magnitude of the numbers gets larger. So when the numbers are very small, the difference between two numbers is really tiny (very very small indeed) - but when the numbers get bigger - e.g. above the range of int - the gap between numbers becomes greater... greater even than 1. So adjacent double values near Long.MAX_VALUE can be quite a distance apart. That means several long values will map to the same nearest double.
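If you want to see those gaps directly, Math.ulp reports the distance from a double to the next representable one. This is just an illustrative sketch of my own, not part of the original answer:
public class UlpDemo {
    public static void main(String[] args) {
        // The gap between adjacent doubles grows with the magnitude of the value.
        System.out.println(Math.ulp(1.0));                            // 2.220446049250313E-16
        System.out.println(Math.ulp(1.0e9));                          // 1.1920928955078125E-7
        System.out.println(Math.ulp((double) (Long.MAX_VALUE / 2)));  // 1024.0
        System.out.println(Math.ulp((double) Long.MAX_VALUE));        // 2048.0
    }
}
Near Long.MAX_VALUE the representable doubles are roughly a thousand apart, so many different long values map to the same nearest double.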
The arithmetic here is completely predictable.
The Java double format uses one bit for sign and eleven bits for exponent. That leaves 52 bits to encode the significand (the fraction portion of a floating-point number).
For normal numbers, the significand has a leading 1 bit, followed by a binary point, followed by the 52 bits of the encoding.
When Long.MAX_VALUE/2, 4611686018427387903, is converted to double, it must be rounded to fit in these bits. 4611686018427387903 is 0x3fffffffffffffff. There are 62 significant bits there (two leading zeroes that are insignificant, then 62 bits). Since not all 62 bits fit in the 53 bits available, we must round them. The last nine bits, which we must eliminate by rounding, are 111111111 in binary. We must either round them down to zero (producing 0x3ffffffffffffe00) or up to 1000000000 in binary (which carries into the next higher bit and produces 0x4000000000000000). The latter change (adding 1) is smaller than the former change (subtracting 111111111 in binary). We want a smaller error, so we choose the latter and round up. Thus, we round 0x3fffffffffffffff up to 0x4000000000000000. This is 2^62, which is 4611686018427387904.
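That rounding is easy to confirm on a JVM; this is just a checking sketch of my own, not part of the answer above:
public class RoundingCheck {
    public static void main(String[] args) {
        long half = Long.MAX_VALUE / 2;                     // 0x3fffffffffffffff
        double d = (double) half;                           // rounded to the nearest double
        System.out.println(Long.toHexString(half));         // 3fffffffffffffff
        System.out.println(Long.toHexString((long) d));     // 4000000000000000
        System.out.println((long) d == (1L << 62));         // true: the nearest double is exactly 2^62
    }
}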
Related
I am working on an android application where for a part of the app, I have 2 floating point values and I cannot have them be exactly the same because this is causing a bug in one of my screens. Those numbers are being sent from a server and are out of my control (e.g. I cannot force them to be different).
The app is written in Kotlin, but I assume that this issue is similar (if not exactly the same) for Java, as Kotlin uses the JVM behind the scenes.
I thought of a "creative" way of solving this without changing my logic too much, by subtracting Float.MIN_VALUE from one of them, making them almost, but not exactly the same. What I actually need to happen is for if(a == b) to fail (where b is actually a - Float.MIN_VALUE).
But to my surprise, when the code runs, if(a == b) returns true. When I opened the "evaluate" window in Android Studio here is what I found out:
Let me reiterate that currentPayment is a Float, so there shouldn't be any auto-conversions or rounding going on (like when dividing Float by an Int for example).
Let me also point out that I can guarantee that currentPayment is not Float.MAX_VALUE or -Float.MAX_VALUE (so the result of the operation is within the bounds of Float).
According to the docs and this answer, Float.MIN_VALUE is the smallest positive non-zero value of Float and has a value of 1.4E-45, which is confirmed here:
I also saw in another post (which I cannot find again for some reason), that this can also be thought of as the maximum precision of Float, which makes sense.
Since currentPayment is a Float, I would expect it to be able to hold any floating point value within the bounds of Float to its maximum precision (i.e. Float.MIN_VALUE).
Therefore I would expect currentPayment to NOT equal currentPayment - Float.MIN_VALUE, but alas that is not the case.
Can anyone explain this please?
Since currentPayment is a Float, I would expect it to be able to hold any floating point value within the bounds of Float to its maximum precision (i.e. Float.MIN_VALUE).
This is a wrong assumption. Float is called "float" because it has a floating point, and therefore floating precision: how much precision you get depends on how big the number you're storing is. The smallest possible Float value is far smaller than the precision of almost any other representable number, so adding or subtracting it has no effect on them. At the high end, the gap between adjacent Float values is much greater than the integer 1. If you subtract 999,000,000 from Float.MAX_VALUE, it will still return Float.MAX_VALUE, because the precision is so coarse at the highest end.
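To make that concrete, here is a minimal Java sketch (the same arithmetic applies to Kotlin's Float, since both use IEEE 754 binary32); the numbers are my own examples, not the original poster's values:
public class FloatPrecisionDemo {
    public static void main(String[] args) {
        float a = 1000f;
        // Float.MIN_VALUE (1.4E-45) is far below the gap between 1000f and its neighbours,
        // so the subtraction rounds straight back to the same float:
        System.out.println(a - Float.MIN_VALUE == a);                           // true
        // At the top of the range the gap between adjacent floats is around 2E31,
        // so even 999,000,000 disappears:
        System.out.println(Float.MAX_VALUE - 999_000_000f == Float.MAX_VALUE);  // true
        // Math.ulp shows the gap at any magnitude:
        System.out.println(Math.ulp(1000f));                                    // about 6.1E-5
        System.out.println(Math.ulp(Float.MAX_VALUE));                          // about 2.0E31
    }
}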
Also, since floating point numbers are not stored in base-10, they are inappropriate for storing currency amounts: most decimal fractions cannot be represented exactly. (I mention that because your variable name has the word "payment" in it, which is a red flag.)
You should either use BigDecimal, Long, or Int to represent currency, so your currency amounts and arithmetic will be exact.
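As a small illustration of the difference (my own example, not from the post above), compare plain double arithmetic with BigDecimal built from strings:
import java.math.BigDecimal;

public class MoneyDemo {
    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);                           // 0.30000000000000004

        // Construct BigDecimal from strings so the decimal digits are kept exactly.
        BigDecimal sum = new BigDecimal("0.10").add(new BigDecimal("0.20"));
        System.out.println(sum);                                 // 0.30

        // Alternatively, keep amounts as a long count of cents and use integer arithmetic.
        long cents = 10 + 20;
        System.out.println(cents + " cents");                    // 30 cents
    }
}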
Edit:
Here's an analogy to help understand it, since it is hard to contemplate binary numbers. Floats are 32 bits in Java and Kotlin, but imagine we have a special kind of computer that can store a floating point number in base-10. Each bit on this computer is not just 0 or 1, but can be anything from 0 to 9. A Float on this computer can have 4 digits and a decimal place, but the decimal place is floating, so it can be placed anywhere relative to the four digits. So a Float on this computer is always five bits--four of the bits are the digits, and the fifth bit tells you where the decimal place goes.
In this imaginary computer's Float, the smallest possible number that can be represented is .0001 and the largest possible number is 9999. You can't represent 9999.5 or even 1000.5 because there aren't enough digits available. There's no fixed amount of precision--the precision is determined by where the decimal place is in the current number. Precision is better for numbers with a decimal place farther to the left.
For the number storage format to be able to have a fixed precision, we would have to fix the decimal point in one place for all numbers. We would have to choose a precision. Suppose we chose a precision of 0.001. Our fifth bit that told us where the decimal place goes in the floating point can now just be used for a fifth digit. Now we know the precision is always 0.001, but the largest possible number we can represent is 99.999 and the smallest possible number is 0.001, a much smaller possible range than with floating point. This limitation is the reason floating points are used instead.
What would be the most efficient way to grab the, say, 207th decimal place of a number? Would it just be x * Math.pow(10,207) % 10?
How's this for Python:
int(x * 10**n) % 10
What you want is impossible.
The only things in Java that work with Math.pow and basic operators are the primitives. The only floating point primitives are float and double. These are IEEE754 floating point numbers; doubles are 64 bits and floats are 32 bits.
A simple principle applies: if you have 64 bits, then you can only represent 2^64 different numbers (it's actually a little less). So, of all the numbers in existence, only about 18446744073709551616 of them actually exist as far as the computer is concerned for doubles. All other numbers do not exist.
So what happens if a mathematical operation (say, 0.1 + 0.2) ends up being a number that doesn't exist? Well, java (this is predicated by the IEEE754 standard; most languages and chips do it this way) will return you the nearest number amongst all the 18446744073709551616 numbers that do exist.
The problem with wanting the 207th digit is that obviously, given that only 18446744073709551616 numbers exist, none of those 18446744073709551616 numbers have that kind of precision. Asking for the 207th digit is therefore completely random. It says nothing about the input number... whatsoever.
Let me repeat that: There are no double values that have a significant 207th digit AT ALL.
If you want 'perfect' representation, with no rounding whatsoever, you want BigDecimal, but note that demanding perfection is tricky. Imagine that in basic decimal math (computers are binary, but let's stick with decimal, as we're all much more familiar with it, what with our 10 fingers and all) I ask you to only give me perfect answers, and then I ask you to divide 1 by 3.
BigDecimal won't let you do that either, so operations you run on BigDecimals without telling BigDecimal in what ways it is allowed to be imprecise lead to exceptions.
If you've set it all up exactly how you wanted it, and you really have a BigDecimal with a 207th digit after the decimal point, you can use BigDecimal's scale handling, or simply shift the value by a power of ten, to get what you want.
Note that BigDecimal is not a primitive and therefore does not support the +, %, etc. operators.
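For example, a rough sketch of that BigDecimal route (the helper name digitAfterPoint and the choice of 1/7 are my own, and you must build the BigDecimal with enough digits in the first place):
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.MathContext;

public class BigDecimalDigit {
    // Shift the decimal point n places to the right, drop the fraction, keep the last digit.
    static int digitAfterPoint(BigDecimal x, int n) {
        return x.movePointRight(n).toBigInteger().mod(BigInteger.TEN).intValue();
    }

    public static void main(String[] args) {
        // 1/7 computed to 250 significant digits, comfortably more than the 207 we need.
        BigDecimal oneSeventh = BigDecimal.ONE.divide(new BigDecimal(7), new MathContext(250));
        System.out.println(digitAfterPoint(oneSeventh, 207));   // 2: the cycle 142857 repeats, and 207 % 6 == 3
    }
}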
Special note: there is no "this will handle all situations" answer here, as an arbitrary value such as 207 could take the calculation way outside the bounds of the possible precision of the variable types involved. My answer as such will only work within the bounds of variable-type precision, for which 207 is really not possible...
To get a specific digit an arbitrary number of places (like 207) after the decimal point... if you just multiply by a power of 10 and then take mod 10, the answer (in Java) is still a floating point type... not a single digit...
To get a specific digit an arbitrary number (n) of places after the decimal point, without converting to string:
Math.floor(x*Math.pow(10,n)) % 10;
For example, to get the 4th digit after the decimal point of 2.987654321:
x*Math.pow(10, 4) = 29876.54321
Math.floor(29876.54321) = 29876
29876 % 10 = 6
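The same idea in Java, as a runnable sketch (my own wording; it is still subject to the double-precision limits discussed above, so n has to stay small):
public class NthDigit {
    // nth digit after the decimal point, limited by double precision.
    static int digitAfterPoint(double x, int n) {
        return (int) (Math.floor(x * Math.pow(10, n)) % 10);
    }

    public static void main(String[] args) {
        System.out.println(digitAfterPoint(2.987654321, 4));   // 6
    }
}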
Both the int and float types are 4 bytes in Java.
Then how can int represent a range of just -2,147,483,648 to 2,147,483,647, while float reaches approximately ±3.40282347E+38F, when both have the same limited number of bytes?
According to my understanding both should have the same range, as they have the same number of bytes. Can someone explain to me how float can represent such a large range?
"Floating" point means that the number of digits for the fractional part of your number can change to represent your number "as best as possible" given the constraints dictated by its size.
Let's forget for the time being about the 4 bytes of the float datatype and assume that your "floating point" type can store up to 10 digits plus the negative symbol.
This means you can accurately represent numbers from :-9 999 999 999 to +9 999 999 999.
However, if you want one decimal, you can accurately represent numbers from -999 999 999.9 to +999 999 999.9. As you can see, the range has effectively changed.
Now, let's formalize the explanation a bit by talking about the significand and the exponent:
the significand contains your significant digits
the exponent represents the exponent of the 10 multiplier or, if it's easier, by how many positions you have to move the decimal point (with 0 being just before the first significant digit).
Let's say that your "floating point" data type can have up to 4 digits in its significand and up to 1 digit in its exponent as well, plus the minus symbol in both significand and exponent.
You will be able to represent numbers from -0.9999 * 10^9 = -999 900 000 to +0.9999 * 10^9 = +999 900 000. As you can see, while the numbers are pretty large, you can't accurately represent most large numbers, as you only have 4 digits to use for your representation. This loss in precision is compensated by the ability to represent very small numbers, so for example you can represent 0.9999 * 10^-9 = 0.000 000 000 999 9.
This explains why the range is so large despite the size being only 4 bytes, as stated in your question.
To complete your knowledge on the matter, bring the above concepts to binary (your typical float uses 1 bit for the sign of the significand, 8 bits for the exponent, and 23 bits for the significand).
Wikipedia is a good starting point. The major takeaway for programming purposes, usually, is to understand how many decimal digits you can store in your datatype (your "precision"), as that will determine which specific decimal format fits your purposes best.
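To see that trade-off in actual Java (an example of my own, not from the answer above): float reaches magnitudes far beyond int, but above 2^24 it can no longer tell every integer apart.
public class FloatVsInt {
    public static void main(String[] args) {
        System.out.println(Float.MAX_VALUE);                 // 3.4028235E38, far beyond int's range
        System.out.println(Integer.MAX_VALUE);               // 2147483647

        int n = 2_000_000_011;
        float f = n;                                         // rounded to the nearest representable float
        System.out.println((int) f);                         // 2000000000: the last digits are gone
        System.out.println(16_777_216f == 16_777_217f);      // true: 2^24 and 2^24 + 1 collapse to one float
    }
}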
See the following link for more information:
https://en.wikipedia.org/wiki/Floating-point_arithmetic#IEEE_754:_floating_point_in_modern_computers
Please note that understanding the concept of floating point numbers on a binary system is extremely important in information technology, as even the simplest computations are heavily affected by it.
Floating points as represented on a computer (binary) are for example the reason why writing things like:
public class MyClass {
    public static void main(String args[]) {
        double x = 0.1f;
        double y = 0.2f;
        double z = 0.3f;
        if (x + y == z) {
            System.out.println("something");
        }
        else {
            System.out.println("something else");
        }
    }
}
will counter-intuitively output "something else", but if you start playing with the numbers, or change the variables to type float, it will print "something" as you would expect.
So be aware: you will need to understand the concept fully.
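A condensed check of the claim above (my own sketch, same logic as the class shown earlier):
public class CompareDemo {
    public static void main(String[] args) {
        // Float literals widened to double, as in the snippet above:
        System.out.println((double) 0.1f + (double) 0.2f == (double) 0.3f);  // false -> "something else"
        // Doing everything in float, where the rounding happens to line up:
        System.out.println(0.1f + 0.2f == 0.3f);                             // true -> "something"
        // And with plain double literals the classic example also fails:
        System.out.println(0.1 + 0.2 == 0.3);                                // false (0.30000000000000004)
    }
}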
How do you explain floating point inaccuracy to fresh programmers and laymen who still think computers are infinitely wise and accurate?
Do you have a favourite example or anecdote which seems to get the idea across much better than a precise, but dry, explanation?
How is this taught in Computer Science classes?
There are basically two major pitfalls people stumble in with floating-point numbers.
The problem of scale. Each FP number has an exponent which determines the overall "scale" of the number, so you can represent either really small values or really large ones, though the number of digits you can devote to that is limited. Adding two numbers of different scale will sometimes result in the smaller one being "eaten", since there is no way to fit it into the larger scale.
PS> $a = 1; $b = 0.0000000000000000000000001
PS> Write-Host a=$a b=$b
a=1 b=1E-25
PS> $a + $b
1
As an analogy for this case you could picture a large swimming pool and a teaspoon of water. Both are of very different sizes, but individually you can easily grasp how much they roughly are. Pouring the teaspoon into the swimming pool, however, will leave you still with roughly a swimming pool full of water.
(If the people learning this have trouble with exponential notation, one can also use the values 1 and 100000000000000000000 or so.)
Then there is the problem of binary vs. decimal representation. A number like 0.1 can't be represented exactly with a limited amount of binary digits. Some languages mask this, though:
PS> "{0:N50}" -f 0.1
0.10000000000000000000000000000000000000000000000000
But you can “amplify” the representation error by repeatedly adding the numbers together:
PS> $sum = 0; for ($i = 0; $i -lt 100; $i++) { $sum += 0.1 }; $sum
9,99999999999998
I can't think of a nice analogy to properly explain this, though. It's basically the same reason you can represent 1/3 only approximately in decimal: to get the exact value you would need to repeat the 3 indefinitely at the end of the decimal fraction.
Similarly, binary fractions are good for representing halves, quarters, eighths, etc. but things like a tenth will yield an infinitely repeating stream of binary digits.
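You can see exactly which binary fractions are clean and which are not by asking BigDecimal for the exact value of a double (a small Java sketch of my own, since the examples above use PowerShell):
import java.math.BigDecimal;

public class ExactValueDemo {
    public static void main(String[] args) {
        // Halves, quarters, eighths... are sums of powers of two, so they are stored exactly:
        System.out.println(new BigDecimal(0.25));   // 0.25
        // A tenth is not, so the stored double is only the nearest approximation:
        System.out.println(new BigDecimal(0.1));
        // prints 0.1000000000000000055511151231257827021181583404541015625
    }
}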
Then there is another problem, though most people don't stumble into that, unless they're doing huge amounts of numerical stuff. But then, those already know about the problem. Since many floating-point numbers are merely approximations of the exact value this means that for a given approximation f of a real number r there can be infinitely many more real numbers r1, r2, ... which map to exactly the same approximation. Those numbers lie in a certain interval. Let's say that r_min is the minimum possible value of r that results in f and r_max the maximum possible value of r for which this holds, then you got an interval [r_min, r_max] where any number in that interval can be your actual number r.
Now, if you perform calculations on that number—adding, subtracting, multiplying, etc.—you lose precision. Every number is just an approximation, therefore you're actually performing calculations with intervals. The result is an interval too and the approximation error only ever gets larger, thereby widening the interval. You may get back a single number from that calculation. But that's merely one number from the interval of possible results, taking into account precision of your original operands and the precision loss due to the calculation.
That sort of thing is called Interval arithmetic and at least for me it was part of our math course at the university.
Show them that the base-10 system suffers from exactly the same problem.
Try to represent 1/3 as a decimal representation in base 10. You won't be able to do it exactly.
So if you write "0.3333", you will have a reasonably exact representation for many use cases.
But if you move that back to a fraction, you will get "3333/10000", which is not the same as "1/3".
Other fractions, such as 1/2 can easily be represented by a finite decimal representation in base-10: "0.5"
Now base-2 and base-10 suffer from essentially the same problem: both have some numbers that they can't represent exactly.
While base-10 has no problem representing 1/10 as "0.1" in base-2 you'd need an infinite representation starting with "0.000110011..".
How's this for an explanation to the layman. One way computers represent numbers is by counting discrete units. These are digital computers. For whole numbers, those without a fractional part, modern digital computers count powers of two: 1, 2, 4, 8, ... (place value, binary digits, and so on). For fractions, digital computers count inverse powers of two: 1/2, 1/4, 1/8, ... The problem is that many numbers can't be represented by a sum of a finite number of those inverse powers. Using more place values (more bits) will increase the precision of the representation of those 'problem' numbers, but never get them exactly, because the computer only has a limited number of bits and some numbers would need infinitely many.
Snooze...
OK, you want to measure the volume of water in a container, and you only have 3 measuring cups: full cup, half cup, and quarter cup. After counting the last full cup, let's say there is one third of a cup remaining. Yet you can't measure that because it doesn't exactly fill any combination of available cups. It doesn't fill the half cup, and the overflow from the quarter cup is too small to fill anything. So you have an error - the difference between 1/3 and 1/4. This error is compounded when you combine it with errors from other measurements.
In python:
>>> 1.0 / 10
0.10000000000000001
Explain how some fractions cannot be represented precisely in binary. Just like some fractions (like 1/3) cannot be represented precisely in base 10.
Another example, in C
printf (" %.20f \n", 3.6);
incredibly gives
3.60000000000000008882
Here is my simple understanding.
Problem:
The value 0.45 cannot be accurately represented by a float and is rounded up to 0.450000018. Why is that?
Answer:
An int value of 45 is represented by the binary value 101101.
In order to make the value 0.45, it would be exact if you could take 45 x 10^-2 (= 45 / 10^2).
But that's impossible, because you must use base 2 instead of base 10.
So the closest to 10^2 = 100 would be 128 = 2^7. The total number of bits you need is 9 : 6 for the value 45 (101101) + 3 bits for the value 7 (111).
Then the value 45 x 2^-7 = 0.3515625. Now you have a serious inaccuracy problem. 0.3515625 is not nearly close to 0.45.
How do we improve this inaccuracy? Well we could change the value 45 and 7 to something else.
How about 460 x 2^-10 = 0.44921875. You are now using 9 bits for 460 and 4 bits for 10. Then it’s a bit closer but still not that close. However if your initial desired value was 0.44921875 then you would get an exact match with no approximation.
So the formula for your value would be X = A x 2^B. Where A and B are integer values positive or negative.
Obviously, the higher these numbers can be, the higher your accuracy becomes; however, as you know, the number of bits to represent the values A and B is limited. For float you have a total of 32 bits; Double has 64 and Decimal has 128.
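If you want to see the actual A and B for a given float on the JVM, you can pull them out of the bit pattern. This is a sketch of my own (and note that, at least with the JVM's round-to-nearest conversion, the float nearest to 0.45 lands just below it rather than at 0.450000018):
public class FloatDecompose {
    public static void main(String[] args) {
        int bits = Float.floatToIntBits(0.45f);
        int biasedExp = (bits >>> 23) & 0xFF;          // 8-bit biased exponent
        int fraction  = bits & 0x7FFFFF;               // 23-bit stored fraction

        // For a normal, positive float: value = A * 2^B with A = 2^23 + fraction and B = biasedExp - 127 - 23.
        long A = (1 << 23) + fraction;
        int  B = biasedExp - 127 - 23;

        System.out.println(A + " x 2^" + B);           // 15099494 x 2^-25
        System.out.println(Math.scalb((double) A, B)); // about 0.449999988, just below 0.45
    }
}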
A cute piece of numerical weirdness may be observed if one converts 9999999.4999999999 to a float and back to a double. The result is reported as 10000000, even though that value is obviously closer to 9999999, and even though 9999999.499999999 correctly rounds to 9999999.
I need to store an exact audio position in a database, namely SQLite. I could store the frame position (sample offset / channels) as an integer, but this would cause extra data maintenance in case of certain file conversions.
So I'm thinking about storing the position as an 8 byte real value in seconds, that is a double, and so as a REAL in SQLite. That makes the database structure more consistent.
But, given a maximum samplerate of 192kHz, is the double precision sufficient so that I can always recover the exact frame position when multiplying the value by the samplerate?
Is there a certain maximum position above which an error may occur? What is this maximum position?
PS: this is about SQLite REAL, but also about the C and Java double type which may hold the position value at various stages.
Update:
Since the discussions now focus on the risks related to conversion and rounding, here's the C method that I'm planning to use:
// Given these types:
int samplerate;
long long framepos;
double position;
// First compute the position in seconds from the framepos:
position = (double) framepos / samplerate;
// Now store the position in an SQLite REAL column, and retrieve it later
// Then compute the framepos back from position, with rounding:
framepos = position * samplerate + 0.5;
Is this safe and symmetrical?
A double has 53 bits' worth of precision (52 explicitly stored significand bits plus an implied leading bit). Depending on the exponent part, some of these bits will represent whole numbers (seconds in your case), the others fractions of a second.
At 192 kHz, a minimum of 18 bits is required to get the sub-second part precise enough (more if rounding is not optimal). That leaves 35 bits for the seconds, which will span just over a thousand years.
So even if you need an extra bit or two for the sub-second part to guard against rounding, and even if SQL loses a bit or two of precision converting it to decimal and back here and there, you aren't anywhere near losing sample precision with your double precision number. Make sure your rounding works correctly - C truncates toward zero when converting to an integer, so even an infinitesimally small conversion error could throw you off by 1.
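As a sanity check of that reasoning, here is a small Java sketch of my own, mirroring the C fragment in the question, that round-trips some large frame positions at 192 kHz:
public class RoundTripCheck {
    public static void main(String[] args) {
        int samplerate = 192_000;
        long[] frames = {
            0L,
            1L,
            192_000L * 3600 * 24,                  // one day of audio
            192_000L * 3600 * 24 * 365 * 100       // a century of frames, still far below 2^53
        };
        for (long framepos : frames) {
            double position = (double) framepos / samplerate;     // seconds, as stored in the REAL column
            long back = (long) (position * samplerate + 0.5);     // recover the frame with rounding
            System.out.println(framepos + " -> " + back + (back == framepos ? "  OK" : "  MISMATCH"));
        }
    }
}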
I would store it as a (64-bit) integer representing microseconds (about 2**20 ticks per second). This avoids floating point hardware/software, is readily understood by all, and gives you a range of 0..2**44 seconds, which is over half a million years.
As an alternative, use a readable fixed precision decimal representation (20 digits should be enough). Right-justified with leading zeros. The cost of conversion is negligible compared to DB accesses anyway.
One advantage of these options is that any database will trivially know how to order them, which is not necessarily obvious for floating point values.
As the answer by Matthias Wandel explains, there's probably nothing to worry about. OTOH by using integers you would get fixed precision regardless of the magnitude which might be useful.
Say, use a 64-bit integer, and store the time as microseconds. That gives you an equivalent sampling precision of 1 MHz and a range of almost 300000 years (if my quick calculation is correct).
Edit: Even when taking into account the need for the timestamp * sample_rate to fit into a 64-bit integer, you still have a range of 1.5 years (2**63/1e6/3600/24/365/192e3), assuming a max sample rate of 192kHz.