Java and the size of a double - java

I am currently learning Java, and my book says a double can range from ~4.94E-324 to ~1.798E+308.
I have some questions to that: How can a double then be zero (0)?
Like double d = 0;
Why is that possible when a double doesn't have a range including zero? And why are the negative numbers missing? Am I forgetting something here? Thanks for your help!
Sincerely,
Maxi

Doubles and floats in most programming languages are more complicated than that. You can find a more technical explanation by looking up the IEEE 754 standard for floating point numbers.
Basically, floating point numbers (in Java, the variable types float and double) are effectively stored in scientific notation, with a sign, mantissa, and exponent. (The base for the exponent is always 2.) How this is converted into binary format is a bit complicated, but the important part to know is that the numbers are effectively stored as +/- mantissa * 2^exponent.
The mantissa and exponent, however, both have fixed ranges. What your textbook is talking about is the range of values that are possible based on the range of the exponent. That is, how large or small can values be if you choose the largest or smallest possible exponent, ignoring the sign of the value. Zero is also ignored for this case, because in scientific notation, zero is a trivial case which does not illustrate the available range of exponents.
Doubles have about 15-16 digits worth of precision, that is, you can represent numbers with a mantissa of length 15-16 digits, regardless of the exponent. Regardless of the mantissa, you can represent numbers ranging from about 10^-324 to about 10^308. And regardless of the mantissa and exponent, you can represent both positive and negative values.
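As a minimal sketch of the point above (the class and method names are made up for illustration), the following prints the two magnitudes the textbook quotes and shows that zero and negative values are ordinary doubles as well:

```java
class DoubleRangeDemo {
    /** Returns true if d is an ordinary finite double (neither NaN nor infinite). */
    static boolean isFiniteDouble(double d) {
        return !Double.isNaN(d) && !Double.isInfinite(d);
    }

    public static void main(String[] args) {
        double zero = 0;                       // zero is stored exactly
        double lowest = -Double.MAX_VALUE;     // sign flipped, same magnitude

        System.out.println(Double.MIN_VALUE);  // 4.9E-324, smallest positive magnitude
        System.out.println(Double.MAX_VALUE);  // 1.7976931348623157E308, largest finite magnitude
        System.out.println(zero);              // 0.0
        System.out.println(lowest);            // -1.7976931348623157E308
        System.out.println(isFiniteDouble(lowest)); // true
    }
}
```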

I think you're interpreting the range in a mathematical sense. What your book means by range is how small of a finite number and how large of a finite number Double can produce, both for negative and positive values. So basically how close it can get to 0 with finite values and how close it can get to infinite with finite values. The actual range in a mathematical sense of Double is something like -1.7*10^308 to 1.7*10^308.
The Double class has members that contain its minimum and maximum values: take a look at Double.MIN_VALUE and Double.MAX_VALUE. Really, the mathematical range ([-Double.MAX_VALUE, Double.MAX_VALUE]) is a byproduct of the range in your book, which is a result of how much precision a Double can hold.

Related

I am getting the answer 0.0 of this java code.why I am getting this? [duplicate]

Can anyone shed some light on why Double.MIN_VALUE is not actually the minimum value that Doubles can take? It is a positive value, and a Double can of course be negative.
I understand why it's a useful number, but it seems a very unintuitive name, especially when compared to Integer.MIN_VALUE. Calling it Double.SMALLEST_POSITIVE or MIN_INCREMENT or similar would have clearer semantics.
Also, what is the minimum value that Doubles can take? Is it -Double.MAX_VALUE? The docs don't seem to say.
The IEEE 754 format has one bit reserved for the sign and the remaining bits representing the magnitude. This means that it is "symmetrical" around the origin (as opposed to the Integer values, which have one more negative value). Thus the minimum value is simply the same as the maximum value, with the sign bit flipped, so yes, -Double.MAX_VALUE is the lowest actual number you can represent with a double.
I suppose the Double.MAX_VALUE should be seen as maximum magnitude, in which case it actually makes sense to simply write -Double.MAX_VALUE. It also explains why Double.MIN_VALUE is the least positive value (since that represents the least possible magnitude).
But sure, I agree that the naming is a bit misleading. Being used to the meaning Integer.MIN_VALUE, I too was a bit surprised when I read that Double.MIN_VALUE was the smallest absolute value that could be represented. Perhaps they thought it was superfluous to have a constant representing the least possible value as it is simply a - away from MAX_VALUE :-)
(Note, there is also Double.NEGATIVE_INFINITY, but I'm disregarding it here, as it is to be seen as a "special case" and does not in fact represent any actual number.)
Here is a good text on the subject.
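A small sketch of the symmetry argument, assuming Java 8 or later for Math.nextDown (the class and method names are illustrative only): setting the sign bit of Double.MAX_VALUE yields the lowest finite double, and the next value below it is already negative infinity.

```java
class LowestDoubleDemo {
    /** The most negative finite double: MAX_VALUE with its sign flipped. */
    static double lowestFinite() {
        return -Double.MAX_VALUE;
    }

    /** Builds the same value by setting the sign bit of MAX_VALUE directly. */
    static double maxWithSignBitSet() {
        long bits = Double.doubleToLongBits(Double.MAX_VALUE);
        return Double.longBitsToDouble(bits | 0x8000000000000000L);
    }

    public static void main(String[] args) {
        System.out.println(maxWithSignBitSet() == lowestFinite()); // true
        System.out.println(lowestFinite() < Double.MIN_VALUE);     // true
        System.out.println(Math.nextDown(lowestFinite()));         // -Infinity
    }
}
```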
These constants have nothing to do with sign. This makes more sense if you consider a double as a composite of three parts: Sign, Exponent and Mantissa.
Double.MIN_VALUE is actually the smallest value the mantissa can assume when the exponent is at its minimum value, before a flush to zero occurs. Likewise, MAX_VALUE can be understood as the largest value the mantissa can assume when the exponent is at its maximum value, before a flush to infinity occurs.
More descriptive names for these two would be Smallest Absolute value (add non-zero for verbosity) and Largest Absolute value (add non-infinity for verbosity).
Check out the IEEE 754 (1985) standard for details. There is a revised (2008) version, but that only introduces more formats which aren't even supported by Java (strictly speaking, Java even lacks support for some mandatory features of IEEE 754-1985, like many other high-level languages).
I assume the confusing names can be traced back to C, which defined FLT_MIN as the smallest positive number.
Like in Java, where you have to use -Double.MAX_VALUE, you have to use -FLT_MAX to get the smallest float in C.
The minimum value for a double is Double.NEGATIVE_INFINITY; that's why Double.MIN_VALUE isn't really the minimum for a Double.
As doubles are floating-point numbers, the two extremes of interest are the biggest number (with lower precision) and the closest number to 0 (with great precision).
If you really want a minimal value for a double that isn't infinity then you can use -Double.MAX_VALUE.
Because with floating point numbers, the precision is what is important as there's no exact range.
/**
* A constant holding the smallest positive nonzero value of type
* <code>double</code>, 2<sup>-1074</sup>. It is equal to the
* hexadecimal floating-point literal
* <code>0x0.0000000000001P-1022</code> and also equal to
* <code>Double.longBitsToDouble(0x1L)</code>.
*/
But I agree that it should probably have been named something better :)
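The equalities stated in that Javadoc comment can be checked directly. This is only a sketch with invented class and method names; Math.scalb is used here to build 2^-1074 exactly.

```java
class MinValueDemo {
    /** The double whose raw bit pattern is 0x1: all zero except the lowest fraction bit. */
    static double fromBits() {
        return Double.longBitsToDouble(0x1L);
    }

    /** 2^-1074 computed by exact scaling of 1.0. */
    static double twoToMinus1074() {
        return Math.scalb(1.0, -1074);
    }

    public static void main(String[] args) {
        System.out.println(fromBits() == Double.MIN_VALUE);                // true
        System.out.println(twoToMinus1074() == Double.MIN_VALUE);          // true
        System.out.println(0x0.0000000000001P-1022 == Double.MIN_VALUE);   // true
        System.out.println(Double.MIN_VALUE / 2);                          // 0.0, nothing smaller exists
    }
}
```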
As it says in the documents,
Double.MIN_VALUE is a constant holding the smallest POSITIVE nonzero value of type double, 2^(-1074).
The trick here is we are talking about a floating point number representation. The double data type is a double-precision 64-bit IEEE 754 floating point. Floating points represent numbers from 1,000,000,000,000 to 0.0000000000000001 with ease, and while maximizing precision (the number of digits) at both ends of the scale. (For more refer this)
The mantissa, always a positive number, holds the significant digits of the floating-point number. The exponent indicates the positive or negative power of the radix that the mantissa and sign should be multiplied by. The four components are combined as follows to get the floating-point value: sign * mantissa * radix ^ exponent.
Think of MIN_VALUE as the minimum value that the mantissa can represent: the minimum value of a floating-point representation is the minimum magnitude that can be represented with it. (A better name could have avoided this confusion, though.)
123 > 10 > 1 > 0.12 > 0.012 > 0.0000123 > 0.000000001 > 0.0000000000000001
Below is just FYI.
Double-precision floating-point can represent 2,098 powers of two, from 2^-1074 through 2^1023. Denormalized powers of two are those from 2^-1074 through 2^-1023; normalized powers of two are those from 2^-1022 through 2^1023. Refer this and this.
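The count of 2,098 can be reproduced with a short loop. This is a sketch with made-up names: it starts at Double.MIN_VALUE and doubles until the value overflows to infinity, relying on the fact that multiplying a power of two by 2 is exact as long as the result is finite.

```java
class PowersOfTwoDemo {
    /** Counts representable powers of two by doubling from MIN_VALUE until overflow. */
    static int countPowersOfTwo() {
        int count = 0;
        for (double p = Double.MIN_VALUE; !Double.isInfinite(p); p *= 2) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countPowersOfTwo());                          // 2098
        // Smallest normalized value is 2^-1022; anything below it is subnormal.
        System.out.println(Double.MIN_NORMAL == Math.scalb(1.0, -1022)); // true
        // getExponent reports MIN_EXPONENT - 1 (that is, -1023) for subnormal inputs.
        System.out.println(Math.getExponent(Double.MIN_VALUE));          // -1023
        System.out.println(Math.getExponent(Double.MAX_VALUE));          // 1023
    }
}
```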

Floating and Double types range in java

I am a beginner in Java and I am just reading one of the PDFs for beginners like me. In my book I found this:
So for example the floating point numbers may be within the range of
1.4E-45 to 3.4028235E+38
So according to my math, that number can be very small (near the zero) or quite large, but it CAN NOT be a negative number.
Am I correct?
The book states the MIN_VALUE and MAX_VALUE for the floating point types. This range describes available precision but it is certainly not the case that all values must fall between MIN_VALUE and MAX_VALUE as you can easily confirm by assigning zero or a negative number to a float variable.
Floating point values (float and double) can be one of the following:
NaN (not a number)
negative infinity
a negative number between -MAX_VALUE and -MIN_VALUE
negative zero
positive zero
a positive number between MIN_VALUE and MAX_VALUE
positive infinity
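The categories listed above can be told apart in code. The helper below is a hypothetical illustration, not a standard library method; note that 0.0f == -0.0f is true, so the sign of a zero has to be read from the bit pattern.

```java
class FloatKindsDemo {
    /** Names the category a float value falls into. */
    static String classify(float f) {
        if (Float.isNaN(f)) {
            return "NaN";
        }
        if (Float.isInfinite(f)) {
            return f > 0 ? "positive infinity" : "negative infinity";
        }
        if (f == 0.0f) {
            // The sign bit is the top bit, so the raw int is negative for -0.0f.
            return Float.floatToIntBits(f) < 0 ? "negative zero" : "positive zero";
        }
        return f > 0 ? "positive number" : "negative number";
    }

    public static void main(String[] args) {
        float[] samples = {
            Float.NaN, Float.NEGATIVE_INFINITY, -Float.MAX_VALUE, -Float.MIN_VALUE,
            -0.0f, 0.0f, Float.MIN_VALUE, Float.MAX_VALUE, Float.POSITIVE_INFINITY
        };
        for (float f : samples) {
            System.out.println(f + " -> " + classify(f));
        }
    }
}
```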
Float range is approximately ±3.40282347E+38F
(6-7 significant decimal digits)
Java implements IEEE 754 standard.
Refer below links
https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
http://cs-fundamentals.com/java-programming/java-primitive-data-types.php
Here you are considering the minimum value of float.
The minimum value of float (MIN_VALUE) is the smallest positive value a float can represent, not the mathematical minimum of the type.
Integers can hold negative values, and integers can be cast to float,
so a float must be able to hold negative values too.
Range of float (32 bits): -3.4E+38 to +3.4E+38

Are there limits of precision on the memory capacity of double?

I'm reading an introductory book on Java and ran into something I don't quite understand. In covering variable types, the author states "the word double stands for numbers between
This struck me as odd, since as written, it would include all numbers on the real number line between the two aforementioned limits.
From my understanding, double is a primitive data type assigned 64 bits of memory. In turn, it's clear to me that 5.9 is a perfectly fine double, or float for that matter. However, I'm not sure how the series
i.e., 5.9, 5.99, 5.999, 5.9999, ... would fit in memory as k approaches infinity.
Is my intuition correct that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Is my intuition that not all real numbers between the author's two limits would be appropriately held in memory as a double?
Yes, you are right.
Even the MOST obvious "doubles" cannot be stored correctly. For instance, 0.1 is "1/10". Have you ever divided by ten in a base-2 system? The result is an infinitely repeating fraction (comparable to 1/3 in the base-10 system).
(This fact btw was responsible for the Patriot-Bug: http://sydney.edu.au/engineering/it/~alum/patriot_bug.html)
And therefore even some obvious easy maths will go wrong on java:
Take the compiler of your choice and try this:
System.out.println(0.8 == (0.1 + 0.7));
Whoops, it will output false.
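To see where the mismatch comes from, one can print the exact binary values that the literals are rounded to. This is a sketch with invented names; new BigDecimal(double) exposes the exact stored value, while new BigDecimal(String) keeps the decimal digits as written.

```java
import java.math.BigDecimal;

class RoundoffDemo {
    /** The comparison from the answer above, evaluated in double arithmetic. */
    static boolean sumEqualsLiteral() {
        return 0.8 == (0.1 + 0.7);
    }

    /** The same comparison carried out in exact decimal arithmetic. */
    static boolean exactSumEquals() {
        BigDecimal sum = new BigDecimal("0.1").add(new BigDecimal("0.7"));
        return sum.compareTo(new BigDecimal("0.8")) == 0;
    }

    public static void main(String[] args) {
        System.out.println(sumEqualsLiteral());      // false
        System.out.println(exactSumEquals());        // true
        // Exact values actually stored for the literals (long digit strings).
        System.out.println(new BigDecimal(0.1));
        System.out.println(new BigDecimal(0.7));
        System.out.println(new BigDecimal(0.1 + 0.7));
        System.out.println(new BigDecimal(0.8));
    }
}
```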
Indeed. In one sentence, ints are exact, while floats and doubles are stored using scientific notation notated in bits. This means that there will be a roundoff error, as scientific notation goes.
As per wikipedia:
Sign bit: 1 bit
Exponent width: 11 bits
Significand precision: 53 bits (52 explicitly stored)
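An interesting note: the exponent does not have a sign bit of its own. It is stored as an unsigned 11-bit value, and a bias of 1023 is subtracted from it to get the actual (possibly negative) exponent.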
To read more:
Wikipedia - Double precision floating point format
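The three fields can be pulled out of a double with Double.doubleToLongBits. The sketch below uses hypothetical helper names; the exponent field is read as an unsigned 11-bit number from which 1023 is subtracted for normalized values.

```java
class DoubleBitsDemo {
    /** Sign bit: 0 for positive, 1 for negative. */
    static int sign(double d) {
        return (int) (Double.doubleToLongBits(d) >>> 63);
    }

    /** The 11-bit exponent field exactly as stored (biased by 1023). */
    static int biasedExponent(double d) {
        return (int) ((Double.doubleToLongBits(d) >>> 52) & 0x7FF);
    }

    /** The 52 explicitly stored fraction bits. */
    static long fraction(double d) {
        return Double.doubleToLongBits(d) & 0xFFFFFFFFFFFFFL;
    }

    public static void main(String[] args) {
        double[] samples = {1.0, -2.0, 0.5, 1.5};
        for (double d : samples) {
            System.out.println(d + ": sign=" + sign(d)
                    + " exponent=" + (biasedExponent(d) - 1023)
                    + " fraction=0x" + Long.toHexString(fraction(d)));
        }
    }
}
```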
The double data type is a double-precision 64-bit (8 bytes) IEEE 754 floating point. The format consists of 1 bit for the sign, 11 bits for the exponent, and the remaining 52 bits for the fraction part of the significand. With the 52 fraction bits appearing in the memory format, the total precision is 53 bits (approximately 16 decimal digits, 53 log10(2) ≈ 15.955). The 11-bit width of the exponent allows the representation of numbers between 10^−308 and 10^308 with full 15–17 decimal digits of precision. Doubles and floats are not exactly real numbers. There are infinitely many real numbers in any range, but there is only a finite number of bits to represent them, so not all numbers can be represented.
For higher and better precision, you can use BigDecimal class found in the java.math package.
http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html
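As a rough illustration of the difference (the names below are invented for this sketch), a decimal with twenty nines after the point collapses to 6.0 when parsed as a double, while BigDecimal keeps every digit:

```java
import java.math.BigDecimal;

class ManyNinesDemo {
    /** Parses the text as a double, rounding to the nearest representable value. */
    static double asDouble(String text) {
        return Double.parseDouble(text);
    }

    /** Parses the text as an arbitrary-precision decimal, keeping all digits. */
    static BigDecimal asBigDecimal(String text) {
        return new BigDecimal(text);
    }

    public static void main(String[] args) {
        String text = "5.99999999999999999999";   // 5. followed by twenty nines
        System.out.println(asDouble(text) == 6.0); // true: the nearest double is 6.0
        System.out.println(asBigDecimal(text));    // 5.99999999999999999999
    }
}
```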

Precision loss with java.lang.Double

Say I have 2 double values. One of them is very large and one of them is very small.
double x = 99....9; // I don't know the possible max and min values,
double y = 0,00..1; // so just assume these values are near max and min.
If I add those values together, do I lose precision?
In other words, does the max possible double value increase if I assign an int value to it? And does the min possible double value decrease if I choose a small integer part?
double z = x + y; // Real result is something like 999999999999999.00000000000001
double values are not evenly distributed over all numbers. double uses the floating point representation of the number which means you have a fixed amount of bits used for the exponent and a fixed amount of bits used to represent the actual "numbers"/mantissa.
So in your example using a large and a small value would result in dropping the smaller value since it can not be expressed using the larger exponent.
The solution to not dropping precision is using a number format that has a potentially growing precision like BigDecimal - which is not limited to a fixed number of bits.
I'm using a decimal floating point arithmetic with a precision of three decimal digits and (roughly) with the same features as the typical binary floating point arithmetic. Say you have 123.0 and 4.56. These numbers are represented by a mantissa (0<=m<1) and an exponent: 0.123*10^3 and 0.456*10^1, which I'll write as <.123e3> and <.456e1>. Adding two such numbers isn't immediately possible unless the exponents are equal, and that's why the addition proceeds according to:
<.123e3>    <.123e3>
<.456e1>    <.004e3>
            --------
            <.127e3>
You see that the necessary alignment of the decimal digits according to a common exponent produces a loss of precision. In the extreme case, the entire addend could be shifted into nothingness. (Think of summing an infinite series where the terms get smaller and smaller but would still contribute considerably to the sum being computed.)
Other sources of imprecision result from differences between binary and decimal fractions, where an exact fraction in one base cannot be represented without error using the other one.
So, in short, addition and subtraction between numbers from rather different orders of magnitude are bound to cause a loss of precision.
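The same effect shows up with binary doubles. In this sketch (class and method names are illustrative), adding 1.0 to 1e17 changes nothing, because neighbouring doubles at that magnitude are 16 apart, whereas BigDecimal keeps the small addend:

```java
import java.math.BigDecimal;

class AbsorptionDemo {
    /** True if adding small to big leaves big unchanged in double arithmetic. */
    static boolean isAbsorbed(double big, double small) {
        return big + small == big;
    }

    /** The same sum in exact decimal arithmetic. */
    static BigDecimal exactSum(String big, String small) {
        return new BigDecimal(big).add(new BigDecimal(small));
    }

    public static void main(String[] args) {
        System.out.println(Math.ulp(1e17));          // 16.0, the gap between neighbouring doubles
        System.out.println(isAbsorbed(1e17, 1.0));   // true
        System.out.println(exactSum("1e17", "1"));   // 100000000000000001
    }
}
```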
If you try to assign a literal that is too large or too small to a double, the compiler will report an error.
Try this:
double d1 = 1e-1000;
double d2 = 1e+1000;
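That compile-time check applies to literals in source code. As far as I know, the same out-of-range text parsed at runtime does not fail; it quietly underflows to zero or overflows to infinity, as this sketch (with made-up names) shows:

```java
class OutOfRangeParseDemo {
    /** Parses decimal text at runtime; out-of-range input does not throw. */
    static double parse(String text) {
        return Double.parseDouble(text);
    }

    public static void main(String[] args) {
        System.out.println(parse("1e-1000"));  // 0.0
        System.out.println(parse("1e+1000"));  // Infinity
        System.out.println(parse("-1e+1000")); // -Infinity
    }
}
```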

Determine smallest floating point type that can hold a string value

I'm working on a method that translates a string into an appropriate Number type, depending upon the format of the number. If the number appears to be a floating point value, then I need to return the smallest type I can use without sacrificing precision (Float, Double or BigDecimal).
Based on "How many significant digits have floats and doubles in java?" (and other resources), I've learned that Float values have 23 bits for the mantissa. Based on this, I used the following method to return the bit length for a given value:
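import java.math.BigInteger;

// Bit length of the decimal digits read as one integer (the exponent is ignored).
private static int getBitLengthOfSignificand(String integerPart,
        String fractionalPart) {
    return new BigInteger(integerPart + fractionalPart).bitLength();
}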
If the result of this test is below 24, I return a Float. If below 53 I return a Double, otherwise a BigDecimal.
However, I'm confused by the result when I consider Float.MAX_VALUE, which is 3.4028235E38. The bit length of the significand is 26 according to my method (where integerPart = 3 and fractionalPart = 4028235). This triggers my method to return a Double, when clearly Float would suffice.
Can someone highlight the flaw in my thinking or implementation? Another idea I had was to convert the string to a BigDecimal and scale down using floatValue() and doubleValue(), testing for overflow (which is represented by infinite values). But that loses precision, so isn't appropriate for me.
The significand is stored in binary, and you can think of it as a number in its decimal representation only if you don't let it confuse you.
The exponent is a binary exponent that does not represent a multiplication by a power of ten but by a power of two. For this reason, the E38 in the number you used as example is only a convenience: the real significand is in binary and should be multiplied by a power of two to obtain the actual number. Powers of two and powers of ten aren't the same, so “3.4028235” is not the real significand.
The real significand of Float.MAX_VALUE is, in hexadecimal notation, 0x1.fffffe, and its associated exponent is 127, meaning that Float.MAX_VALUE is actually 0x1.fffffe * 2^127.
Looking at the decimal representation to choose a binary floating-point type to put the value in, as you are trying to do, doesn't work. For one thing, the number of decimal digits that one is sure to recover from a float is different from the number of decimal digits one may need to write to distinguish a float from its neighbors (6 and 9 respectively). You chose to write “3.4028235E38”, but you could have written 3.40282E38, which to your algorithm looks easier to represent when it really isn't. When people write that “3.4028235E38” is the largest finite value of the float type, they mean that if you round this decimal number to float, you will arrive at the largest float. If you parse “3.4028235E38” as a double-precision number, it won't even be equal to Float.MAX_VALUE.
To put it differently: another way to write Float.MAX_VALUE is 3.4028234663852885981170418348451692544E38. It is still representable as a float (it represents the exact same value as 3.4028235E38). It looks like it has many digits because these are decimal digits that appear for a decimal exponent, when in fact the number is represented internally with a binary exponent.
(By the way, your approach does not check that the exponent is in range to represent a number in the chosen type, which is another condition for a type to be able to represent the number from a string.)
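These claims about Float.MAX_VALUE can be checked with standard library calls. The class and method names in this sketch are invented for illustration:

```java
import java.math.BigDecimal;

class FloatMaxDemo {
    /** Hexadecimal form: binary significand and power-of-two exponent. */
    static String hex() {
        return Float.toHexString(Float.MAX_VALUE);
    }

    /** The exact decimal expansion of the stored binary value. */
    static String exactDecimal() {
        return new BigDecimal(Float.MAX_VALUE).toPlainString();
    }

    public static void main(String[] args) {
        System.out.println(hex());          // 0x1.fffffep127
        System.out.println(exactDecimal()); // 340282346638528859811704183484516925440
        // Rounded to float, the short decimal form lands on MAX_VALUE...
        System.out.println(Float.parseFloat("3.4028235E38") == Float.MAX_VALUE);   // true
        // ...but parsed as a double it is a different number.
        System.out.println(Double.parseDouble("3.4028235E38") == Float.MAX_VALUE); // false
    }
}
```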
I would work in terms of the difference between the actual value and the nearest float. BigDecimal can store any finite length decimal fraction exactly and do arithmetic on it:
Convert the String to the nearest float x. If x is infinite, but the value has a finite double representation use that.
Convert the String exactly to BigDecimal y.
If y is zero, use float, which can represent zero exactly.
If not, convert the float x to BigDecimal, z.
Calculate, in BigDecimal to a reasonable number of decimal places, the absolute value of (y-z)/z. That is the relative rounding error due to using float. If it is small enough for your purposes, less than some value you pick, use float. If not, use double.
If you literally want no sacrifice in precision, it is much simpler. Convert to both float and double. Compare them for equality. The comparison will be done in double. If they compare equal, go with the float. If not, go with the double.
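One possible sketch of the "no sacrifice in precision" idea follows. It is a stricter variant than the float-versus-double comparison described above, not that answer's exact method: each candidate is compared against the exact decimal value held in a BigDecimal, with BigDecimal itself as the fallback. The method name smallestExact is hypothetical.

```java
import java.math.BigDecimal;

class SmallestTypeDemo {
    /**
     * Returns a Float if the decimal text is exactly representable as a float,
     * else a Double if exactly representable as a double, else a BigDecimal.
     */
    static Number smallestExact(String text) {
        BigDecimal exact = new BigDecimal(text);

        float f = exact.floatValue();
        // An infinite result means overflow; new BigDecimal(...) would reject it.
        if (!Float.isInfinite(f) && new BigDecimal(f).compareTo(exact) == 0) {
            return Float.valueOf(f);
        }

        double d = exact.doubleValue();
        if (!Double.isInfinite(d) && new BigDecimal(d).compareTo(exact) == 0) {
            return Double.valueOf(d);
        }
        return exact;
    }

    public static void main(String[] args) {
        String[] samples = {"0.5", "16777217", "0.1", "1e400"};
        for (String s : samples) {
            System.out.println(s + " -> " + smallestExact(s).getClass().getSimpleName());
        }
    }
}
```

With this strict definition, most decimal fractions such as 0.1 end up as BigDecimal, since neither binary type holds them exactly; a tolerance-based check like the relative-error steps above is the alternative when that is too strict.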
