How to change the precision of a float conversion? - java

I am trying to convert a float value to a 32-bit unsigned long value and I am facing a loss of value.
long v = (long) f;
Here, when f is 4294967295 ((2^32) - 1), the conversion to long returns 4294967296 instead of 4294967295, because float is only precise to about 7 decimal digits. I need precision to 9 decimal digits. Is there any way to achieve this?

Quote from Java Puzzlers: Traps, Pitfalls, and Corner Cases book:
Floating-point operations return the floating-point value that is closest to their
exact mathematical result. Once the distance between adjacent floating-point values
is greater than 2, adding 1 to a floating-point value will have no effect,
because the half-way point between values won’t be reached. For the float type,
the least magnitude beyond which adding 1 will have no effect is 2^25, or
33,554,432; for the double type, it is 2^54, or approximately 1.8 × 10^16.
So basically, if you want to represent big numbers, float is a bad idea. Above 2^25 it cannot even represent every other integer, and it gets worse the bigger the number gets.
The best option for you would be to use BigDecimal instead.
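To see the difference, here is a rough sketch using the value from the question (longValueExact throws if any fractional part would be lost):
float f = 4294967295f;                            // (2^32) - 1 rounds up to 2^32 as a float
System.out.println((long) f);                     // 4294967296
double d = 4294967295d;                           // a double's 53 significand bits hold 2^32 - 1 exactly
System.out.println((long) d);                     // 4294967295
java.math.BigDecimal bd = new java.math.BigDecimal("4294967295");
System.out.println(bd.longValueExact());          // 4294967295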

Related

Precision loss with java.lang.Double

Say I have 2 double values. One of them is very large and one of them is very small.
double x = 99....9; // I don't know the possible max and min values,
double y = 0.00..1; // so just assume these values are near max and min.
If I add those values together, do I lose precision?
In other words, does the max possible double value increase if I assign an int value to it? And does the min possible double value decrease if I choose a small integer part?
double z = x + y; // Real result is something like 999999999999999.00000000000001
double values are not evenly distributed over all numbers. double uses a floating point representation, which means you have a fixed number of bits for the exponent and a fixed number of bits to represent the actual digits (the mantissa).
So in your example, using a large and a small value together results in dropping the smaller value, since it cannot be expressed alongside the larger exponent.
The solution to not dropping precision is using a number format that has a potentially growing precision like BigDecimal - which is not limited to a fixed number of bits.
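As a rough illustration (the magnitudes here are picked purely for the demonstration):
double x = 1e16;                                   // the gap between adjacent doubles here is 2
double y = 0.0001;
System.out.println(x + y == x);                    // true: y is completely absorbed
java.math.BigDecimal bx = new java.math.BigDecimal("1e16");
java.math.BigDecimal by = new java.math.BigDecimal("0.0001");
System.out.println(bx.add(by));                    // 10000000000000000.0001, nothing dropped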
Suppose we use a decimal floating point arithmetic with a precision of three decimal digits and (roughly) the same features as typical binary floating point arithmetic. Say you have 123.0 and 4.56. These numbers are represented by a mantissa (0 <= m < 1) and an exponent: 0.123*10^3 and 0.456*10^1, which I'll write as <.123e3> and <.456e1>. Adding two such numbers isn't immediately possible unless the exponents are equal, which is why the addition proceeds as follows:
  <.123e3>        <.123e3>
+ <.456e1>  -->   <.004e3>
  --------        --------
                  <.127e3>
You see that the necessary alignment of the decimal digits according to a common exponent produces a loss of precision. In the extreme case, the entire addend could be shifted into nothingness. (Think of summing an infinite series where the terms get smaller and smaller but would still contribute considerably to the sum being computed.)
Other sources of imprecision result from differences between binary and decimal fractions, where an exact fraction in one base cannot be represented without error using the other one.
So, in short, addition and subtraction between numbers from rather different orders of magnitude are bound to cause a loss of precision.
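A small experiment makes the point (purely illustrative): once a float accumulator grows large enough, further small terms stop contributing at all:
float sum = 0f;
for (int i = 0; i < 20_000_000; i++) {
    sum += 1f;                 // each term is exactly 1
}
System.out.println(sum);       // 1.6777216E7: the sum stalls at 2^24, the last ~3.2 million additions are lost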
If you write a double literal whose value is too big or too small to represent, the compiler will give an error. Try this:
double d1 = 1e-1000; // compile-time error: nonzero literal too small to represent as a double
double d2 = 1e+1000; // compile-time error: literal too large to represent as a double

Determine smallest floating point type that can hold a string value

I'm working on a method that translates a string into an appropriate Number type, depending upon the format of the number. If the number appears to be a floating point value, then I need to return the smallest type I can use without sacrificing precision (Float, Double or BigDecimal).
Based on How many significant digits have floats and doubles in java? (and other resources), I've learned that Float values have 23 bits for the mantissa. Based on this, I used the following method to return the bit length for a given value:
private static int getBitLengthOfSignificand(String integerPart, String fractionalPart) {
    return new BigInteger(integerPart + fractionalPart).bitLength();
}
If the result of this test is below 24, I return a Float. If below 53 I return a Double, otherwise a BigDecimal.
However, I'm confused by the result when I consider Float.MAX_VALUE, which is 3.4028235E38. The bit length of the significand is 26 according to my method (where integerPart = 3 and fractionalPart = 4028235). This triggers my method to return a Double, when clearly a Float would suffice.
Can someone highlight the flaw in my thinking or implementation? Another idea I had was to convert the string to a BigDecimal and scale down using floatValue() and doubleValue(), testing for overflow (which is represented by infinite values). But that loses precision, so isn't appropriate for me.
The significand is stored in binary, and you can think of it as a number in its decimal representation only if you don't let it confuse you.
The exponent is a binary exponent that does not represent a multiplication by a power of ten but by a power of two. For this reason, the E38 in the number you used as example is only a convenience: the real significand is in binary and should be multiplied by a power of two to obtain the actual number. Powers of two and powers of ten aren't the same, so “3.4028235” is not the real significand.
The real significand of Float.MAX_VALUE is, in hexadecimal notation, 0x1.fffffe, and its associated exponent is 127, meaning that Float.MAX_VALUE is actually 0x1.fffffe * 2^127.
Looking at the decimal representation to choose a binary floating-point type to put the value in, as you are trying to do, doesn't work. For one thing, the number of decimal digits that one is sure to recover from a float is different from the number of decimal digits one may need to write to distinguish a float from its neighbors (6 and 9 respectively). You chose to write “3.4028235E38”, but you could have written 3.40282E38, which, for your algorithm, looks easier to represent, when it really isn't. When people write that “3.4028235E38” is the largest finite value of the float type, they mean that if you round this decimal number to float, you will arrive at the largest float. If you parse “3.4028235E38” as a double-precision number, it won't even be equal to Float.MAX_VALUE.
To put it differently: another way to write Float.MAX_VALUE is 3.4028234663852885981170418348451692544E38. It is still representable as a float (it represents the exact same value as 3.4028235E38). It looks like it has many digits because these are decimal digits that appear for a decimal exponent, when in fact the number is represented internally with a binary exponent.
(By the way, your approach does not check that the exponent is in range to represent a number in the chosen type, which is another condition for a type to be able to represent the number from a string.)
I would work in terms of the difference between the actual value and the nearest float. BigDecimal can store any finite length decimal fraction exactly and do arithmetic on it:
1. Convert the String to the nearest float x. If x is infinite but the value has a finite double representation, use that.
2. Convert the String exactly to a BigDecimal y.
3. If y is zero, use float, which can represent zero exactly.
4. If not, convert the float x to a BigDecimal z.
5. Calculate, in BigDecimal to a reasonable number of decimal places, the absolute value of (y - z) / z. That is the relative rounding error due to using float. If it is small enough for your purposes, less than some value you pick, use float. If not, use double.
If you literally want no sacrifice in precision, it is much simpler. Convert to both float and double. Compare them for equality. The comparison will be done in double. If they compare equal, go with the float. If not, go with the double.
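A minimal sketch of that simpler check (the method name is hypothetical; falling back to BigDecimal would use the relative-error test above):
static Number narrowestFloatingType(String s) {
    float f = Float.parseFloat(s);
    double d = Double.parseDouble(s);
    // The comparison promotes f to double; equality means the float lost nothing relative to the double.
    if ((double) f == d) {
        return f;   // autoboxed to Float
    }
    return d;       // autoboxed to Double
}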

How does java round an integer when stored in a floating point

It's a classic problem: your legacy code uses a floating point type when it should really be using an integer. But it's too expensive to change every instance of that variable (or several) in the code. So you need to write your own rounding function that takes a bunch of parameters to improve accuracy and convert to an integer.
So the basic question is: how do floating point numbers round when they are made in Java? The classic example is 0.1, which is often quoted as rounding to 0.0999999999998 (or something like that). But does a floating point number always round down to the next value it can represent when given an integer in Java? Does it round down its internal mantissa, effectively rounding down its absolute value? Or does it just pick the value with the smallest error between the integer and the new float?
Also, is the behavior different when calling Float.parseFloat(String) when the String is an integer like "1234567890"? And is the behavior the same when the String is a floating point value with more precision than the Float can store?
Please note that where I say floating point or reference Float, I use it interchangeably with Double. Same with integer and long.
How do floating point numbers round when they are made in Java?
Java truncates (rounds towards zero) when you use the construct (int) d where d has type double or float. If you need to round to the nearest integer, you can use the line below:
int a = (int) Math.round(d); // Math.round(double) returns a long, hence the cast
The classic example is 0.1, which is often quoted as rounding to 0.0999999999998 (or something like that).
The issue you allude to does not exist with integers, which are exactly representable as double (for those between -2^53 and 2^53). If the number you are rounding comes from previous computations that should have produced an integer but may not have because of floating-point rounding errors, then (int) Math.round(d) is likely the solution you should use. It means that you will get the correct integer as long as the cumulative error is not above 0.5.
your legacy code uses a floating point when it should really be using an integer. But it's too expensive to change every instance of that variable (or several) in the code.
If the computations producing the double d are only computations +, -, * with other integers, producing intermediate results between -2^53 and 2^53, then d automatically contains an integer (it is exact because the floating-point computations involved are exact, and it is an integer because the exact result is an integer), and you can convert it with the simpler (int) d. On the other hand, if division or non-integer operands are involved, then you should not lightly change the type of d, because it would change the results of these computations.
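A small illustration of the difference (4.35 * 100 is a well-known case whose double result lands just below 435):
double exact = 7.0 * 3 - 1;                     // only +, -, * over integer values: exactly 20.0
System.out.println((int) exact);                // 20, the plain cast is safe
double inexact = 4.35 * 100;                    // 4.35 has no exact binary representation: 434.99999999999994
System.out.println((int) inexact);              // 434, truncation loses the intended integer
System.out.println((int) Math.round(inexact));  // 435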
Also, is the behavior different when calling Float.parseFloat(String) when the String is an integer like "1234567890"?
This will produce a float whose value is the nearest representable single-precision value to the rational 1234567890. This happens to be 1234567936.0f.
And is the behavior the same when the String is a floating point value with more precision than the Float can store?
Technically, “0.1” is more precision than Float can store. Also, technically, the previous example 1234567890 is also more precision than Float can store. The behavior is the same: Float.parseFloat("0.1") produces the nearest float to the rational number 0.1.
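A quick check of both cases (the printed values follow from the round-to-nearest behavior described above):
float f = Float.parseFloat("1234567890");
System.out.println((long) f);                    // 1234567936, the nearest float to 1234567890
float g = Float.parseFloat("0.1");               // the nearest float to 0.1, not exactly one tenth
System.out.println(new java.math.BigDecimal(g)); // 0.100000001490116119384765625, the value actually stored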

Why when I assign an int to a float is it not the same value?

When I assign from an int to a float, I thought float allows more precision and so would not lose or change the value assigned, but what I am seeing is something quite different. What is going on here?
for (int i = 63000000; i < 63005515; i++) {
    int a = i;
    float f = 0;
    f = a;
    System.out.print(java.text.NumberFormat.getInstance().format(a) + " : ");
    System.out.println(java.text.NumberFormat.getInstance().format(f));
}
Some of the output:
...
63,005,504 : 63,005,504
63,005,505 : 63,005,504
63,005,506 : 63,005,504
63,005,507 : 63,005,508
63,005,508 : 63,005,508
Thanks!
A float has the same number of bits as an int -- 32 bits. But it allows for a greater range of values, far greater than the range of int values. However, the precision is fixed at 24 bits (23 "mantissa" bits, plus 1 implicit leading bit). At values around 63,000,000, the distance between consecutive float values is greater than 1. You can verify this with the Math.ulp method, which gives the difference between two consecutive values.
The code
System.out.println(Math.ulp(63000000.0f));
prints
4.0
You can use double values for a far greater (yet still limited) precision:
System.out.println(Math.ulp(63000000.0));
prints
7.450580596923828E-9
However, you can just use ints here, because your values, at about 63 million, are still well below the maximum possible int value, which is about 2 billion.
A float in Java is an IEEE 754 floating point number; even though it can represent values from ±1.40129846432481707e-45 to ±3.40282346638528860e+38, it has only 6 or 7 significant decimal digits.
A simple solution would be to use a double, which has at least 15 significant digits and can cover all the values of an int without any issue.
However, if it is accuracy you're looking for, stay away from native floating point representations and go for classes like BigInteger and BigDecimal.
No, they are not necessarily the same value. An int and a float are each 32 bits, but in a float some of those bits are used for the floating point part of the number, so there are fewer whole numbers that can be represented in a float than in an int. Depending on what your application is doing with these numbers, you may not care about these differences, or maybe you want to look at using something like BigDecimal.
Floats don't allow more precision; floats allow a wider range of numbers.
We've got 2^32 possible values for integers in the range of (approximately) -2 * 10^9 to 2 * 10^9. Floats are also 32-bit, so the number of possible values is at most the same as for integers.
Out of these 32 bits, some are reserved for the mantissa and the rest for the exponent. The resulting number represented by the float is then calculated (for simplicity I'll use base 10) as mantissa * 10^exponent.
Obviously, the maximum precision is limited by the number of bits assigned to the mantissa. So you can represent some integers exactly as ints, but they won't fit into the mantissa, so the least significant bits are thrown away, as in your case.
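If you want to see those fields for one of the values from the question, here is a small sketch using Float.floatToIntBits (illustrative only):
int bits = Float.floatToIntBits(63005505f);        // the literal itself already rounds to 63005504
int sign     = bits >>> 31;
int exponent = ((bits >>> 23) & 0xFF) - 127;       // the stored exponent is biased by 127
int mantissa = bits & 0x7FFFFF;                    // 23 stored bits, plus an implicit leading 1
System.out.println(sign + " " + exponent + " " + Integer.toBinaryString(mantissa));
System.out.println((long) 63005505f);              // 63005504: the lowest bits were rounded away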
Float has a greater range of values but lower precision.
Int has a lower range of values but represents every whole number in that range exactly.
Int always steps in increments of exactly 1, while at the magnitude in your example a float can only step in increments of 4.
So if you are dealing with numbers this large and don't care about being off by a few units, use float; but if you need the last digit to be exact, you need to use int.

Convert string to float without round off java

I want to convert longitude and latitude values that I get as strings from my database. The string is correct, and when I try to convert it into a double, it is also correct. However, when I convert the double or the string value (I have tried both) into a float value, the last decimal gets rounded off.
The value of the string or double is 59.858139
The conversion to float is 59.85814
I've tried everything, and this is one desperate example :)
private float ConvertToFloat(double d)
{
    float f = 00.000000f;
    f = (float) d;
    return f;
}
You are aware that doubles have more precision than floats and that floats round off, right? This is expected behaviour. There is no sense in casting a double to a float in this case.
Here's something to get you thinking in the right direction...
Double.doubleToRawLongBits(double value);
Float.intBitsToFloat(int bits);
A double's bit pattern doesn't fit into an int; it needs a long. A double is literally twice the size of a float, so even mediating the bits through strings won't do any good here.
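For example, a short sketch that just dumps the raw bit patterns (purely illustrative; it doesn't fix the rounding itself):
double d = 59.858139;
float  f = (float) d;
System.out.println(Long.toHexString(Double.doubleToRawLongBits(d)));  // 64-bit pattern of the double
System.out.println(Integer.toHexString(Float.floatToRawIntBits(f)));  // 32-bit pattern of the float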
1. float has only 24 bits of precision, which is insufficient to hold the number of digits in your latitude and longitude.
2. The rounding off is due to the size of the number, so use double if you require floating point, or use BigDecimal (see the sketch below).
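For the coordinate in this question the difference is easy to see (a quick sketch):
String s = "59.858139";
System.out.println(Float.parseFloat(s));           // 59.85814   (only ~7 significant digits survive)
System.out.println(Double.parseDouble(s));         // 59.858139  (double keeps all the digits shown)
System.out.println(new java.math.BigDecimal(s));   // 59.858139  exactly, as a decimal value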
We are starting with your decimal number 59.858139
Convert that number to binary: 111011.11011011101011101111111101011100011011000001000110100001000100...
I.e. the number is an infinite fraction in binary. It is not possible to represent it exactly. (In the same way that it is not possible to represent 1/3 exactly with decimal numbers)
Rewrite the number to some form of binary scientific notation:
10 ^ 101 * 1.1101111011011101011101111111101011100011011000001000110100001000100...
Remember that this is still in binary, so the 10 ^ 101 corresponds to 2 ^ 5 in decimal notation.
Now... A float value can store 23 bits in the mantissa. If we round it up using "round to nearest" rounding mode, we get:
10 ^ 101 * 1.11011110110111010111100
Which is equal to:
111011.110110111010111100
That is all the precision that can fit into the float data type. Now convert that back to decimal:
59.8581390380859375
Seems pretty close to 59.858139 actually... But that is just luck. What happens if we convert the second closest float value to binary instead?
111011.110110111010111011 = 59.858135223388671875
So basically the resolution is approximately 0.000004.
So all we can really know from the float value is that the number is something like: 59.858139 ± 0.000002
It could just as well be 59.858137 or 59.858141.
Since the last digit is rather uncertain, I am guessing that the printing code is smart enough to understand that the last digit falls outside the precision of a float value, and hence, the value is rounded to 59.85814.
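You can confirm those numbers directly in Java (a quick check):
float f = 59.858139f;
System.out.println(new java.math.BigDecimal(f));   // 59.8581390380859375, the value actually stored
System.out.println(Math.ulp(f));                   // ~3.8E-6, the resolution discussed above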
By the way, if you (like me) are too lazy to convert between binary and decimal fractions by hand, you can use an online converter. If you want to read more about the details of the floating point system, the Wikipedia page on floating point representation is a great resource.
