min value of float in java is positive why? [duplicate] - java

This question already has answers here:
Why is Double.MIN_VALUE not negative
(6 answers)
Closed 6 years ago.
When we use the MIN_VALUE constant of any of the primitive wrapper types in Java, it gives us the minimum value possible for that type.
BUT
in the case of float and double it returns the minimum positive value, even though float and double can hold negative values as well.

MIN_VALUE tells you how precise a float can get, but not the mathematical minimum it can represent. They should have named it better...
The negative of MAX_VALUE is the mathematical minimum value for float (and the same goes for double).
The reason you can rely on this has to do with how the numbers are represented in binary:
Java floats and doubles use a sign bit to represent negative/positive values, as opposed to the two's complement representation used for integers. This means the positive and negative ranges have the same magnitude, i.e. -(positive max) = negative max. Therefore you don't need a separate constant for the mathematical minimum of a float: you can just negate the positive max constant to get the negative bound. For integers, you need two different constants for max and min because of the way they are represented in binary, i.e. -max != min.
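A quick sketch using only the standard constants to illustrate the point (the printed values are what a typical JVM shows):
// Smallest positive float, not the most negative one
System.out.println(Float.MIN_VALUE);                              // 1.4E-45
// The actual lower bound of the float range
System.out.println(-Float.MAX_VALUE);                             // -3.4028235E38
// Integers are asymmetric, so negating the max does not give the min
System.out.println(-Integer.MAX_VALUE == Integer.MIN_VALUE);      // false
System.out.println(-Integer.MAX_VALUE == Integer.MIN_VALUE + 1);  // true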
For more info http://people.uncw.edu/tompkinsj/133/numbers/Reals.htm

MIN_VALUE should have been named something like EPSILON: it's the smallest positive value a float can represent.
Because a float uses the sign-magnitude encoding, the lowest value a float can represent is -MAX_VALUE.

A possible explanation could be that Java just used the same naming convention as C++, which again inherited the names from C.
Java was influenced by C++, which shares the same confusing naming pattern. In C++, the analogue of Float.MIN_VALUE is std::numeric_limits<T>::min(), which is defined as:
Minimum finite value.
For floating types with denormalization (variable number of exponent bits): minimum positive normalized value.
In C++ that is a potential source of bugs in template code, so later in C++11, they added std::numeric_limits<T>::lowest(), which is defined as:
Minimum finite value. (since C++11)
For integral types: the same as min().
For floating-point types: implementation-dependent; generally, the negative of max().
But C++ was not the first language. It all goes back to C, which defines FLT_MIN as the minimal floating point value.
So, why did C choose to define the minimums of floating point numbers and integers inconsistently?
Not sure, but it could have to do with symmetry (see this answer). For floats, you can use -FLT_MAX (or -Float.MAX_VALUE). For integers, negating the maximum value to get the minimum is not portable, and in fact it is wrong on all modern two's complement architectures, where -INT_MAX == INT_MIN + 1 holds.

min value of float in java is positive why?
Why they chose to name the constants like that is unanswerable (by us) because nobody on SO was in the room when the decision was made.
Besides, knowing the answer to the question is not going to help, because the values of Float.MIN_VALUE and Double.MIN_VALUE won't be changed, no matter how "wrong" they might be. (It would break any existing code that uses these constants, and the Java designers only do that when there is no other viable alternative. Leaving it alone is clearly a viable alternative.)
I suppose, the answer (i.e. the real reason for the decision) might be relevant to people developing brand new programming languages. However, they are going to have to make up their own minds anyway. FWIW, I wouldn't have designed it this way, but that's not relevant.

Related

How accurate is "double-precision floating-point format"?

Let's say, using java, I type
double number;
If I need to use very big or very small values, how accurate can they be?
I tried to read how doubles and floats work, but I don't really get it.
For my term project in intro to programming, I might need to use different numbers with big ranges of value (many orders of magnitude).
Let's say I create a while loop,
while (number[i-1] - number[i] > ERROR) {
//does stuff
}
Does the limitation of ERROR depend on the size of number[i]? If so, how can I determine how small ERROR can be in order to quit the loop?
I know my teacher explained it at some point, but I can't seem to find it in my notes.
Does the limitation of ERROR depend on the size of number[i]?
Yes.
If so, how can I determine how small ERROR can be in order to quit the loop?
You can get the "next largest" double using Math.nextUp (or the "next smallest" using Math.nextDown), e.g.
double nextLargest = Math.nextUp(number[i-1]);
double difference = nextLargest - number[i-1];
As Radiodef points out, you can also get the difference directly using Math.ulp:
double difference = Math.ulp(number[i-1]);
(though I don't think there's a single method that directly gives you the distance to the "next smallest" value)
If you don't tell us what you want to use it for, then we cannot answer anything more than what is standard knowledge: a double in Java has about 16 significant decimal digits, and the smallest possible positive value is 4.9 x 10^-324. That's in all likelihood far higher precision than you will need.
The epsilon value (what you call "ERROR") in your question varies depending on your calculations, so there is no standard answer for it, but if you are using doubles for simple stuff as opposed to highly demanding scientific stuff, just use something like 1 x 10^-9 and you will be fine.
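As a sketch of how such a stopping test might look for the OP's loop (converged and tolerance are illustrative names, number[] and i come from the question, and 1e-9 is just an example value to tune):
// Relative-tolerance convergence test: the change must be small compared to the values themselves
static boolean converged(double previous, double current) {
    double tolerance = 1e-9;  // illustrative choice, not a library constant
    double scale = Math.max(Math.abs(previous), Math.abs(current));
    return Math.abs(previous - current) <= tolerance * scale;
}
// usage in the loop: while (!converged(number[i - 1], number[i])) { /* does stuff */ }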
Both the float and double primitive types are limited in terms of the amount of data they can store. However, if you want to know the maximum values of the two types, then run the code below with your favourite IDE.
System.out.println(Float.MAX_VALUE);
System.out.println(Double.MAX_VALUE);
The double data type is a double-precision 64-bit IEEE 754 floating point (roughly 15 to 17 significant decimal digits).
The float data type is a single-precision 32-bit IEEE 754 floating point (roughly 6 to 9 significant decimal digits).
After running the code above, if you're not satisfied with their ranges then I would recommend using BigDecimal, as that type doesn't have a fixed limit (your RAM is effectively the limit).

Why does new BigDecimal("0.015").compareTo(new BigDecimal(0.015)) return -1? [duplicate]

This question already has answers here:
Why are floating point numbers inaccurate?
(5 answers)
BigDecimal compareTo not working as expected
(1 answer)
Closed 7 years ago.
Why does new BigDecimal("0.015").compareTo(new BigDecimal(0.015)) return -1?
If I expect those two to be equal, is there an alternative way to compare them?
Due to the imprecise nature of floating point arithmetic, they're not exactly equal
System.out.println(new BigDecimal(0.015));
displays
0.01499999999999999944488848768742172978818416595458984375
To expand on the answer from @Reimeus, the various constructors for BigDecimal accept different types of input. The floating point constructor takes a double as input, and due to the way floats/doubles are stored, it can only receive exactly those values that are sums of powers of 2.
So, for example, 2⁻², or 0.25, can be represented exactly. 0.875 is (2⁻¹ + 2⁻² + 2⁻³), so it can also be represented exactly. So long as the number can be represented by a sum of powers of two, where the upper and lower power differ by no more than 53, the number can be represented exactly. The vast majority of numbers don't fit this pattern!
In particular, 0.015 is neither a power of two nor such a sum of powers of two, and so its representation is not exact.
The string constructor on the other hand does store it accurately, by using a different format internally to store the number. Hence, when you compare the two, they compare as being different.
A double cannot exactly represent the value 0.015. The closest value it can represent in its 64 binary bits is 0.01499999999999999944488848768742172978818416595458984375. The constructor new BigDecimal(double) is designed to preserve the precise value of the double argument, which can never be exactly 0.015. Hence the result of your comparison.
However, if you display that double value, for example by:
System.out.println(0.01499999999999999944488848768742172978818416595458984375);
it outputs 0.015 – which hints at a workaround. Converting a double to a String chooses the shortest decimal representation needed to distinguish it from other possible double values.
Thus, if you create a BigDecimal from the double's String representation, it will have a value more as you expect. This comparison is true:
new BigDecimal(Double.toString(0.015)).equals(new BigDecimal("0.015"))
In fact, the method BigDecimal.valueOf(double) exists for exactly this purpose, so you can shorten the above to:
BigDecimal.valueOf(0.015).equals(new BigDecimal("0.015"))
You should use the new BigDecimal(double) constructor only if your purpose is to preserve the precise binary value of the argument. Otherwise, call BigDecimal.valueOf(double), whose documentation says:
This is generally the preferred way to convert a double (or float) into a BigDecimal.
Or, use a String if you can and avoid the subtleties of double entirely.
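A small sketch pulling the three construction routes together (the printed digits are determined by IEEE 754 doubles, so they should be the same on any JVM):
import java.math.BigDecimal;

BigDecimal fromString = new BigDecimal("0.015");    // exactly 0.015
BigDecimal fromDouble = new BigDecimal(0.015);      // 0.01499999999999999944488848768742172978818416595458984375
BigDecimal viaValueOf = BigDecimal.valueOf(0.015);  // goes through Double.toString, so 0.015

System.out.println(fromString.compareTo(fromDouble) == 0);  // false
System.out.println(fromString.compareTo(viaValueOf) == 0);  // true
System.out.println(fromString.equals(viaValueOf));          // true (same value and scale)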
What actually happens here is this:
0.015 is a primitive double literal. That means that as soon as you write it, it is already no longer exactly 0.015, but rather 0.0149.... The compiler stores that binary approximation in the bytecode.
BigDecimal is constructed to store exactly whatever is given to it. In this case, 0.0149...
BigDecimal is also able to parse Strings into exact representations. In this case "0.015" is parsed into exactly 0.015. Even though double cannot represent that number, BigDecimal can.
Finally, when you compare them, you can see that they are not equal. Which makes sense.
Whenever using BigDecimal, be cautious about the type you start from: String, int and long will remain exact, while float and double carry the usual precision caveat.

can someone please explain why a double is called a double in Java? [duplicate]

I'm extremely new to Java and just wanted to confirm what Double is? Is it similar to Float or Int? Any help would be appreciated. I also sometimes see the uppercase Double and other times the lower case double. If someone could clarify what this means that'd be great!
Double is a wrapper class,
The Double class wraps a value of the primitive type double in an object. An object of type Double contains a single field whose type is double.
In addition, this class provides several methods for converting a double to a String and a String to a double, as well as other constants and methods useful when dealing with a double.
The double data type,
The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is 4.94065645841246544e-324d to 1.79769313486231570e+308d (positive or negative). For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
Check each data type and its range here: Java's Primitive Data Types.
Important note: if you're thinking of using double for precise values, think again before doing so. See Java Traps: double.
In a comment on #paxdiablo's answer, you asked:
"So basically, is it better to use Double than Float?"
That is a complicated question. I will deal with it in two parts
Deciding between double versus float
On the one hand, a double occupies 8 bytes versus 4 bytes for a float. If you have many of them, this may be significant, though it may also have no impact. (Consider the case where the values are in fields or local variables on a 64-bit machine, and the JVM aligns them on 64-bit boundaries.) Additionally, floating point arithmetic with double values is typically slower than with float values ... though once again this is hardware dependent.
On the other hand, a double can represent larger (and smaller) numbers than a float and can represent them with more than twice the precision. For the details, refer to Wikipedia.
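A tiny illustration of the precision gap (the printed digits come from Java's shortest round-trip formatting of float and double):
float f = 1.0f / 3.0f;
double d = 1.0 / 3.0;
System.out.println(f);  // 0.33333334  (about 7 significant digits)
System.out.println(d);  // 0.3333333333333333  (about 16 significant digits)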
The tricky question is knowing whether you actually need the extra range and precision of a double. In some cases it is obvious that you need it. In others it is not so obvious. For instance if you are doing calculations such as inverting a matrix or calculating a standard deviation, the extra precision may be critical. On the other hand, in some cases not even double is going to give you enough precision. (And beware of the trap of expecting float and double to give you an exact representation. They won't and they can't!)
There is a branch of mathematics called Numerical Analysis that deals with the effects of rounding error, etc in practical numerical calculations. It used to be a standard part of computer science courses ... back in the 1970's.
Deciding between Double versus Float
For the Double versus Float case, the issues of precision and range are the same as for double versus float, but the relative performance measures will be slightly different.
A Double (on a 32 bit machine) typically takes 16 bytes + 4 bytes for the reference, compared with 12 + 4 bytes for a Float. Compare this to 8 bytes versus 4 bytes for the double versus float case. So the ratio is 5 to 4 versus 2 to 1.
Arithmetic involving Double and Float typically involves dereferencing the pointer and creating a new object to hold the result (depending on the circumstances). These extra overheads also affect the ratios in favor of the Double case.
Correctness
Having said all that, the most important thing is correctness, and this typically means getting the most accurate answer. And even if accuracy is not critical, it is usually not wrong to be "too accurate". So, the simple "rule of thumb" is to use double in preference to float, UNLESS there is an overriding performance requirement, AND you have solid evidence that using float will make a difference with respect to that requirement.
A double is an IEEE754 double-precision floating point number, similar to a float but with a larger range and precision.
IEEE754 single precision numbers have 32 bits (1 sign, 8 exponent and 23 mantissa bits) while double precision numbers have 64 bits (1 sign, 11 exponent and 52 mantissa bits).
A Double in Java is the class version of the double basic type - you can use doubles but, if you want to do something with them that requires them to be an object (such as put them in a collection), you'll need to box them up in a Double object.
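For example, a minimal sketch of the boxing involved when doubles go into a collection (the list here is purely illustrative):
import java.util.ArrayList;
import java.util.List;

double primitive = 3.14;    // the primitive type, lowercase d
Double boxed = primitive;   // autoboxed into a Double object
List<Double> values = new ArrayList<>();
values.add(primitive);      // autoboxing happens here too
System.out.println(boxed.equals(values.get(0)));  // true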

Java double epsilon

I currently need an epsilon of type double (constants from Java's libraries are preferred over my own implementations/definitions).
As far as I can see, Double has MIN_VALUE and MAX_VALUE as static members.
Why is there no EPSILON?
What would an epsilon<double> be?
Are there any differences to a std::numeric_limits< double >::epsilon()?
Epsilon: The difference between 1 and the smallest value greater than 1 that is representable for the data type.
I'm presuming you mean epsilon in the sense of the error in the value. I.e this.
If so then in Java it's referred to as ULP (unit in last place). You can find it by using the java.lang.Math package and the Math.ulp() method. See javadocs here.
The value isn't stored as a static member because it will be different depending on the double you are concerned with.
EDIT: By the OP's definition of epsilon now in the question, the ULP of a double of value 1.0 is 2.220446049250313E-16 expressed as a double. (I.e. the return value of Math.ulp(1.0).)
By the edit of the question, explaining what is meant by EPSILON, the question is now clear, but it might be good to point out the following:
I believe that the original question was triggered by the fact that in C there is a constant DBL_EPSILON, defined in the standard header file float.h, which captures what the question refers to. The same standard header file contains definitions of constants DBL_MIN and DBL_MAX, which clearly correspond to Double.MIN_VALUE and Double.MAX_VALUE, respectively, in Java. Therefore it would be natural to assume that Java, by analogy, should also contain a definition of something like Double.EPSILON with the same meaning as DBL_EPSILON in C. Strangely, however, it does not. Even more strangely, C# does contain a definition double.EPSILON, but it has a different meaning, namely the one that is covered in C by the constant DBL_MIN and in Java by Double.MIN_VALUE. Certainly a situation that can lead to some confusion, as it makes the term EPSILON ambiguous.
Without using the Math package:
Double.longBitsToDouble(971L << 52)
That's 2^-52 (971 = 1023, the double exponent bias, minus 52; the shift by 52 is because the mantissa occupies the low 52 bits).
It's a little quicker than Math.ulp(1.0).
Also, if you need this to compare double values, there's a really helpful article: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
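A quick sanity check of the two routes, assuming the goal is the epsilon at 1.0 (both expressions should print the same value):
double viaBits = Double.longBitsToDouble(971L << 52);  // 2^-52 built directly from the bit pattern
double viaUlp = Math.ulp(1.0);                         // distance from 1.0 to the next larger double
System.out.println(viaBits);            // 2.220446049250313E-16
System.out.println(viaUlp);             // 2.220446049250313E-16
System.out.println(viaBits == viaUlp);  // true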
double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Looking up IEEE 754 you'll find the precision of epsilon...
http://en.wikipedia.org/wiki/IEEE_floating_point
binary64:
base (b) = 2
precision (p) = 53
machine epsilon (rounding) = b^-(p-1) / 2 = 2^-53 ≈ 1.11e-16
machine epsilon (interval) = b^-(p-1) = 2^-52 ≈ 2.22e-16

Float vs Double

Is there ever a case where a comparison (equals()) between two floating point values would return false if you compare them as DOUBLE but return true if you compare them as FLOAT?
I'm writing some procedure, as part of my group project, to compare two numeric values of any given types. There are 4 types I'd have to deal with altogether: double, float, int and long. So I'd like to group double and float into one function, that is, I'd just cast any float to double and do the comparison.
Would this lead to any incorrect results?
Thanks.
If you're converting doubles to floats and the difference between them is beyond the precision of the float type, you can run into trouble.
For example, say you have the two double values:
9.876543210
9.876543211
and that the precision of a float was only six decimal digits. That would mean that both float values would be 9.87654, hence equal, even though the double values themselves are not equal.
However, if you're talking about floats being cast to doubles, then identical floats should give you identical doubles. If the floats are different, the extra precision will ensure the doubles are distinct as well.
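A minimal runnable illustration of the same effect, using a 1e-9 difference that a double can resolve but a float cannot (values chosen for the demo, not taken from the answer above):
double d1 = 1.0;
double d2 = 1.0 + 1e-9;                        // distinguishable as doubles
System.out.println(d1 == d2);                  // false
System.out.println((float) d1 == (float) d2);  // true: the 1e-9 difference is below float precision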
As long as you are not mixing promoted floats and natively calculated doubles in your comparison you should be ok, but take care:
Comparing floats (or doubles) for equality is difficult - see this lengthy but excellent discussion.
Here are some highlights:
You can't use ==, because of problems with the limited precision of floating point formats
float(0.1) and double(0.1) are different values (0.100000001490116119384765625 and 0.1000000000000000055511151231257827021181583404541015625) respectively. In your case, this means that comparing two floats (by converting to double) will probably be ok, but be careful if you want to compare a float with a double.
It's common to use an epsilon or small value to make a comparison with (floats a and b are considered equal if |a - b| < epsilon). In C, float.h defines FLT_EPSILON for exactly this purpose. However, this type of comparison doesn't work where a and b are both very small, or both very large.
You can address this by using a scaled-relative-to-the-sizes-of-a-and-b epsilon, but this breaks down in some cases (like comparisons to zero).
You can compare the integer representations of the floating point numbers to find out how many representable floats there are between them. This is called the ULP difference, for "Units in Last Place" difference. (Java's Float.equals() compares the underlying bit patterns, which is the exact-equality special case of this idea.) It's generally good, but it also breaks down when comparing against zero.
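A sketch of what an ULP-style comparison might look like in Java (nearlyEqualUlps and maxUlps are illustrative names, not library API; zero, NaN and mixed signs need extra handling):
// True if a and b are within maxUlps representable floats of each other.
// Assumes both values are finite and have the same sign.
static boolean nearlyEqualUlps(float a, float b, int maxUlps) {
    int bitsA = Float.floatToIntBits(a);
    int bitsB = Float.floatToIntBits(b);
    return Math.abs(bitsA - bitsB) <= maxUlps;
}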
The article concludes:
Know what you’re doing
There is no silver bullet. You have to choose wisely.
If you are comparing against zero, then relative epsilons and ULPs based comparisons are usually meaningless. You’ll need to use an absolute epsilon, whose value might be some small multiple of FLT_EPSILON and the inputs to your calculation. Maybe.
If you are comparing against a non-zero number then relative epsilons or ULPs based comparisons are probably what you want. You’ll probably want some small multiple of FLT_EPSILON for your relative epsilon, or some small number of ULPs. An absolute epsilon could be used if you knew exactly what number you were comparing against.
If you are comparing two arbitrary numbers that could be zero or non-zero then you need the kitchen sink. Good luck and God speed.
So, to answer your question:
If you are downgrading doubles to floats, then you might lose precision, and incorrectly report two different doubles as equal (as paxdiablo points out.)
If you are upgrading identical floats to double, then the added precision won't be a problem unless you are comparing a float with a double (Say you'd got 1.234 in float, and you only had 4 decimal digits of accuracy, then the double 1.2345 MIGHT represent the same value as the float. In this case you'd probably be better to do the comparison at the precision of the float, or more generally, at the error level of the most inaccurate representation in the comparison).
If you know the number you'll be comparing with, you can follow the advice quoted above.
If you're comparing arbitrary numbers (which could be zero or non-zero), there's no way to compare them correctly in all cases - pick one comparison and know its limitations.
A couple of practical considerations (since this sounds like it's for an assignment):
The epsilon comparison mentioned by most is probably fine (but include a discussion of the limitations in the write up). If you're ever planning to compare doubles to floats, try to do it in float, but if not, try to do all comparisons in double. Even better, just use doubles everywhere.
If you want to totally ace the assignment, include a write-up of the issues when comparing floats and the rationale for why you chose any particular comparison method.
I don't understand why you're doing this at all. The == operator already caters for all possible types on both sides, with extensive rules on type coercion and widening which are already specified in the relevant language standards. All you have to do is use it.
I'm perhaps not answering the OP's question but rather responding to some more or less fuzzy advice which requires clarification.
Comparing two floating point values for equality is absolutely possible and can be done. Whether the type is single or double precision is often of less importance.
Having said that, the steps leading up to the comparison itself require great care and a thorough understanding of floating-point dos and don'ts, whys and why nots.
Consider the following C statements:
result = a * b / c;
result = (a * b) / c;
result = a * (b / c);
In most naive floating-point programming they are seen as "equivalent", i.e. producing the "same" result. In the real world of floating-point they may be. Or actually, the first two are equivalent (as the second simply makes C's evaluation rules explicit, i.e. operators of the same priority evaluate left to right). The third may or may not be equivalent to the first two.
Why is this?
"a * b / c" or "b / c * a" may raise the "inexact" exception, i.e. an intermediate or final result (or both) is not exactly representable in the floating point format. If this is the case the results will be more or less subtly different. This may or may not leave the end results amenable to an equality comparison. Being aware of this and single-stepping through operations one at a time - noting intermediate results - will allow the patient programmer to "beat the system", i.e. construct a quality floating-point comparison for practically any situation.
For everyone else, passing over equality comparisons for floating-point numbers is good, solid advice.
It's really a bit ironic, because most programmers know that integer math results in predictable truncations in various situations. When it comes to floating-point, almost everyone is more or less thunderstruck that results are not exact. Go figure.
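A concrete (if extreme) Java illustration of why the grouping matters, using intermediate overflow rather than subtle rounding (values chosen just for the demo):
double a = 1e308, b = 10.0, c = 10.0;
System.out.println((a * b) / c);  // Infinity: a * b overflows before the division
System.out.println(a * (b / c));  // 1.0E308: the division happens first, so nothing overflows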
You should be okay to make that cast as long as the equality test involves a delta.
For example: Math.abs((double) floatVal1 - (double) floatVal2) < 0.000001 should work.
Edit in response to the question change
No you would not. The above still stands.
For the comparison between a float f and a double d, you can calculate the difference between f and d. If abs(f - d) is less than some threshold, you can treat them as equal. The threshold can be either absolute or relative, depending on your application's requirements. There are some good solutions here. I hope that helps.
Would I ever get an incorrect result if I promote 2 floats to double and do a 64-bit comparison rather than a 32-bit comparison?
No.
If you start with two floats, which could be float variables (float x = foo();) or float constants (1.234234234f) then you can compare them directly, of course. If you convert them to double and then compare them then the results will be identical.
This works because double is a super-set of float. That is, every value that can be stored in a float can be stored in a double. The range of the exponent and mantissa are both increased. There are billions of values that can be stored in a double but not in a float, but there are zero values that can be stored in a float but not a double.
As discussed in my float comparison article it can be tricky to do a meaningful comparison between float or double values, because rounding errors may have crept in. But converting both numbers from float to double does not change this. All of the mentions of epsilons (which are often but not always needed) are completely orthogonal to the question.
On the other hand, comparing a float to a double is madness. 1.1 (a double) is not equal to 1.1f (a float) because 1.1 cannot be exactly represented in either.
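For instance, a two-line check you can run yourself:
System.out.println(1.1 == 1.1f);  // false: the float 1.1f widens to a double that differs from the double literal 1.1
System.out.println(0.5 == 0.5f);  // true: 0.5 is exactly representable in both types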
