Java double epsilon - java

I currently need an epsilon of type double (preferably a constant from Java's libraries rather than my own implementation/definition).
As far as I can see, Double has MIN_VALUE and MAX_VALUE as static members.
Why is there no EPSILON?
What would an epsilon<double> be?
Are there any differences to a std::numeric_limits< double >::epsilon()?
Epsilon: The difference between 1 and the smallest value greater than 1 that is representable for the data type.

I'm presuming you mean epsilon in the sense of the error in the value. I.e this.
If so, then in Java it's referred to as the ULP (unit in the last place). You can find it with the Math.ulp() method of the java.lang.Math class. See javadocs here.
The value isn't stored as a static member because it will be different depending on the double you are concerned with.
EDIT: By the OP's definition of epsilon now in the question, the ULP of a double of value 1.0 is 2.220446049250313E-16 expressed as a double. (I.e. the return value of Math.ulp(1.0).)
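To make the point about the ULP depending on the value concrete, here is a minimal sketch. The class name UlpDemo and the helper epsilonAtOne are made up for illustration; only Math.ulp and Math.nextUp come from the standard library.

```java
// Illustrative sketch: UlpDemo and epsilonAtOne are hypothetical names.
class UlpDemo {
    // Gap between 1.0 and the next larger double, i.e. the epsilon the question asks about.
    static double epsilonAtOne() {
        return Math.ulp(1.0);
    }

    public static void main(String[] args) {
        System.out.println(epsilonAtOne());     // ULP at 1.0
        System.out.println(Math.ulp(1.0e10));   // larger magnitude, larger ULP
        System.out.println(Math.ulp(1.0e-10));  // smaller magnitude, smaller ULP
        // Adding the ULP to 1.0 lands exactly on the next representable double.
        System.out.println(1.0 + epsilonAtOne() == Math.nextUp(1.0)); // true
    }
}
```

This is why a single static constant would only describe the spacing around 1.0, not around an arbitrary double.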

By the edit of the question, explaining what is meant by EPSILON, the question is now clear, but it might be good to point out the following:
I believe that the original question was triggered by the fact that in C there is a constant DBL_EPSILON, defined in the standard header file float.h, which captures what the question refers to. The same header also defines the constants DBL_MIN and DBL_MAX, which look like the counterparts of Double.MIN_VALUE and Double.MAX_VALUE in Java (strictly speaking, DBL_MIN is the smallest positive normalized value and so matches Double.MIN_NORMAL, whereas Double.MIN_VALUE is the smallest positive subnormal value). Therefore it would be natural to assume that Java, by analogy, should also contain something like Double.EPSILON with the same meaning as DBL_EPSILON in C. Strangely, however, it does not. Even more strangely, C# does define double.Epsilon, but with a different meaning: it is the smallest positive double, the value that Java calls Double.MIN_VALUE. Certainly a situation that can lead to some confusion, as it makes the term EPSILON ambiguous.

Without using the Math class:
Double.longBitsToDouble(971L << 52)
That's 2^-52 (971 = 1023 (the double exponent bias) - 52; the shift by 52 is needed because the mantissa occupies the lowest 52 bits).
It may be a little quicker than Math.ulp(1.0).
Also, if you need this to compare double values, there's a really helpful article: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-numbers-2012-edition/
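As a quick sanity check of the bit-pattern approach, the sketch below compares it against Math.ulp(1.0). The class and method names are hypothetical; the library calls (Double.longBitsToDouble, Double.doubleToLongBits, Math.ulp) are standard.

```java
// Illustrative sketch: EpsilonBits and epsilonFromBits are hypothetical names.
class EpsilonBits {
    // Biased exponent 971 (= 1023 - 52) placed above the 52 fraction bits;
    // the fraction bits stay zero, so the value is exactly 2^-52.
    static double epsilonFromBits() {
        return Double.longBitsToDouble(971L << 52);
    }

    public static void main(String[] args) {
        double eps = epsilonFromBits();
        System.out.println(eps == Math.ulp(1.0)); // true
        // Show the raw IEEE 754 bit pattern in hex.
        System.out.println(Long.toHexString(Double.doubleToLongBits(eps)));
    }
}
```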

double: The double data type is a double-precision 64-bit IEEE 754 floating point. Its range of values is beyond the scope of this discussion, but is specified in the Floating-Point Types, Formats, and Values section of the Java Language Specification. For decimal values, this data type is generally the default choice. As mentioned above, this data type should never be used for precise values, such as currency.
http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
Looking up IEEE 754, you'll find the precision and the epsilon...
http://en.wikipedia.org/wiki/IEEE_floating_point
binary64:
Base(b)=2
precision(p)=53
machineEpsilon(e), as the maximum relative rounding error: (b^-(p-1))/2 = 2^-53 = 1.11e-16
machineEpsilon(e), as the gap between 1 and the next larger value: b^-(p-1) = 2^-52 = 2.22e-16
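The 2^-52 figure can also be found empirically. The following is a rough sketch (the class name MachineEpsilon and method compute are invented for this example) that halves a candidate until adding half of it to 1.0 no longer changes the result.

```java
// Illustrative sketch: MachineEpsilon and compute are hypothetical names.
class MachineEpsilon {
    // Halve eps while 1.0 + eps/2 is still distinguishable from 1.0.
    // The loop ends with eps equal to the gap between 1.0 and the next double.
    static double compute() {
        double eps = 1.0;
        while (1.0 + eps / 2.0 > 1.0) {
            eps /= 2.0;
        }
        return eps;
    }

    public static void main(String[] args) {
        double eps = compute();
        System.out.println(eps);                  // the 2^-52 definition
        System.out.println(eps / 2.0);            // the 2^-53 definition
        System.out.println(eps == Math.ulp(1.0)); // true
    }
}
```

Half of that result corresponds to the other definition listed above, 2^-53.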

Related

How to get the smallest next or previous possible double value supported by the architecture?

Lets say I have a double variable d. Is there a way to get the next or previous value that is supported by the CPU architecture.
As a trivial example, if the value was 10.1245125 and the precision of the architecture was fixed to 7 decimal places, then the next value would be 10.1245126 and the previous value would be 10.1245124.
Obviously on floating-point architectures this is not that simple. How would I be able to achieve this (in Java)?
Actually, an IEEE 754 floating-point architecture makes this easy: thanks to the standard, the function is called nextafter in nearly all languages that support it, and this uniformity allowed me to write an answer to your question with very little familiarity with Java:
The java.lang.Math.nextAfter(double start, double direction) returns the floating-point number adjacent to the first argument in the direction of the second argument.
Remember that -infinity and +infinity are floating-point values, and these values are convenient to give the direction (second argument). Do not make the common mistake of writing something like Math.nextAfter(x, x+1), which only works as long as 1 is greater than the ULP of x.
Anyone who writes the above probably means instead Math.nextAfter(x, Double.POSITIVE_INFINITY), which saves an addition and works for all values of x.
Math.nextUp and Math.nextDown can be used to get the next/previous value; they are equivalent to the methods proposed in the accepted answer, but more concise.
(this info was originally provided as a comment by #BjörnZurmaar)
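Here is a small sketch of both approaches, including the x+1 pitfall mentioned above. The class NeighborDemo and its helpers next and previous are hypothetical names; Math.nextAfter and Math.nextUp are standard, and Math.nextDown(double) assumes Java 8 or later.

```java
// Illustrative sketch: NeighborDemo, next and previous are hypothetical names.
class NeighborDemo {
    static double next(double x) {
        return Math.nextAfter(x, Double.POSITIVE_INFINITY);
    }

    static double previous(double x) {
        return Math.nextAfter(x, Double.NEGATIVE_INFINITY);
    }

    public static void main(String[] args) {
        double d = 10.1245125;
        System.out.println(previous(d));
        System.out.println(d);
        System.out.println(next(d));

        // Same neighbours through the more concise methods.
        System.out.println(next(d) == Math.nextUp(d));       // true
        System.out.println(previous(d) == Math.nextDown(d)); // true

        // Pitfall: at 1e17 the ULP is 16, so big + 1 rounds back to big
        // and nextAfter(big, big + 1) does not move at all.
        double big = 1.0e17;
        System.out.println(Math.nextAfter(big, big + 1) == big); // true
        System.out.println(next(big) > big);                     // true
    }
}
```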

Wrong Output Dollar Amount To Coins [duplicate]

double r = 11.631;
double theta = 21.4;
In the debugger, these are shown as 11.631000000000000 and 21.399999618530273.
How can I avoid this?
These accuracy problems are due to the internal representation of floating point numbers and there's not much you can do to avoid it.
By the way, printing these values at run-time often still leads to the correct results, at least using modern C++ compilers. For most operations, this isn't much of an issue.
I liked Joel's explanation, which deals with a similar binary floating point precision issue in Excel 2007:
See how there's a lot of 0110 0110 0110 there at the end? That's because 0.1 has no exact representation in binary... it's a repeating binary number. It's sort of like how 1/3 has no representation in decimal. 1/3 is 0.33333333 and you have to keep writing 3's forever. If you lose patience, you get something inexact.
So you can imagine how, in decimal, if you tried to do 3*1/3, and you didn't have time to write 3's forever, the result you would get would be 0.99999999, not 1, and people would get angry with you for being wrong.
If you have a value like:
double theta = 21.4;
And you want to do:
if (theta == 21.4)
{
}
You have to be a bit clever: you need to check whether the value of theta is really close to 21.4, not necessarily exactly that value.
if (fabs(theta - 21.4) <= 1e-6)
{
}
This is partly platform-specific - and we don't know what platform you're using.
It's also partly a case of knowing what you actually want to see. The debugger is showing you - to some extent, anyway - the precise value stored in your variable. In my article on binary floating point numbers in .NET, there's a C# class which lets you see the absolutely exact number stored in a double. The online version isn't working at the moment - I'll try to put one up on another site.
Given that the debugger sees the "actual" value, it's got to make a judgement call about what to display - it could show you the value rounded to a few decimal places, or a more precise value. Some debuggers do a better job than others at reading developers' minds, but it's a fundamental problem with binary floating point numbers.
Use the fixed-point decimal type if you want stability at the limits of precision. There are overheads, and you must explicitly cast if you wish to convert to floating point. If you do convert to floating point you will reintroduce the instabilities that seem to bother you.
Alternately you can get over it and learn to work with the limited precision of floating point arithmetic. For example you can use rounding to make values converge, or you can use epsilon comparisons to describe a tolerance. "Epsilon" is a constant you set up that defines a tolerance. For example, you may choose to regard two values as being equal if they are within 0.0001 of each other.
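A tolerance check of this kind might look like the following in Java. This is only a sketch: the class ToleranceCompare, the method nearlyEqual and the 0.0001 tolerance are illustrative choices, not a recommended default.

```java
// Illustrative sketch: ToleranceCompare and nearlyEqual are hypothetical names.
class ToleranceCompare {
    // Treat a and b as equal when they differ by no more than the given tolerance.
    static boolean nearlyEqual(double a, double b, double tolerance) {
        return Math.abs(a - b) <= tolerance;
    }

    public static void main(String[] args) {
        double sum = 0.1 + 0.2;
        System.out.println(sum == 0.3);                     // false
        System.out.println(nearlyEqual(sum, 0.3, 0.0001));  // true
        System.out.println(nearlyEqual(1.0, 1.01, 0.0001)); // false
    }
}
```

A fixed tolerance like this only makes sense when you know the rough magnitude of the values being compared.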
It occurs to me that you could use operator overloading to make epsilon comparisons transparent. That would be very cool.
For mantissa-exponent representations EPSILON must be computed to remain within the representable precision. For a number N, Epsilon = N / 10E+14
System.Double.Epsilon is the smallest representable positive value for the Double type. It is too small for our purpose. Read Microsoft's advice on equality testing
I've come across this before (on my blog) - I think the surprise tends to be that the 'irrational' numbers are different.
By 'irrational' here I'm just referring to the fact that they can't be accurately represented in this format. Real irrational numbers (like π - pi) can't be accurately represented at all.
Most people are familiar with 1/3 not working in decimal: 0.3333333333333...
The odd thing is that 1.1 doesn't work in floats. People expect decimal values to work in floating point numbers because of how they think of them:
1.1 is 11 x 10^-1
When actually they're in base-2
1.1 is 154811237190861 x 2^-47
You can't avoid it, you just have to get used to the fact that some floats are 'irrational', in the same way that 1/3 is.
One way you can avoid this is to use a library that uses an alternative method of representing decimal numbers, such as BCD
If you are using Java and you need accuracy, use the BigDecimal class for floating point calculations. It is slower but safer.
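For illustration, here is a short sketch of the difference. DecimalDemo and its add helper are made-up names; note that the BigDecimal values are built from strings, since constructing them from a double would carry the binary rounding error along.

```java
// Illustrative sketch: DecimalDemo and add are hypothetical names.
class DecimalDemo {
    // Add two decimal numbers given as text, without binary rounding.
    static java.math.BigDecimal add(String a, String b) {
        return new java.math.BigDecimal(a).add(new java.math.BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(0.1 + 0.2);        // 0.30000000000000004
        System.out.println(add("0.1", "0.2")); // 0.3
        // The double constructor exposes the exact binary value behind 0.1.
        System.out.println(new java.math.BigDecimal(0.1));
    }
}
```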
Seems to me that 21.399999618530273 is the single precision (float) representation of 21.4. Looks like the debugger is casting down from double to float somewhere.
You can't avoid this, as you're using floating point numbers with a fixed number of bytes. There's simply no isomorphism possible between the real numbers and their limited notation.
But most of the time you can simply ignore it. 21.4==21.4 would still be true because both sides are the same number with the same error. But 21.4f==21.4 may not be true because the errors for float and double are different.
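The same experiment is easy to run in Java (a sketch only; FloatVsDouble and widen are names invented here):

```java
// Illustrative sketch: FloatVsDouble and widen are hypothetical names.
class FloatVsDouble {
    // Widening a float to double keeps the float's value, rounding error included.
    static double widen(float f) {
        return f;
    }

    public static void main(String[] args) {
        System.out.println(21.4 == 21.4);  // true: same double on both sides
        System.out.println(21.4f == 21.4); // false: the float is widened first
        System.out.println(widen(21.4f));  // the float's value at double precision
    }
}
```

The last line prints a value close to, but not equal to, 21.4, which matches the debugger observation in the question.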
If you need fixed precision, perhaps you should try fixed point numbers. Or even integers. I for example often use int(1000*x) for passing to debug pager.
Dangers of computer arithmetic
If it bothers you, you can customize the way some values are displayed during debug. Use it with care :-)
Enhancing Debugging with the Debugger Display Attributes
Refer to General Decimal Arithmetic
Also take note when comparing floats, see this answer for more information.
According to the Java Language Specification:
"If at least one of the operands to a numerical operator is of type double, then the
operation is carried out using 64-bit floating-point arithmetic, and the result of the
numerical operator is a value of type double. If the other operand is not a double, it is
first widened (§5.1.5) to type double by numeric promotion (§5.6)."
Here is the Source

Float vs Double

Is there ever a case where a comparison (equals()) between two floating point values would return false if you compare them as DOUBLE but return true if you compare them as FLOAT?
I'm writing some procedure, as part of my group project, to compare two numeric values of any given types. There're 4 types I'd have to deal with altogether : double, float, int and long. So I'd like to group double and float into one function, that is, I'd just cast any float to double and do the comparison.
Would this lead to any incorrect results?
Thanks.
If you're converting doubles to floats and the difference between them is beyond the precision of the float type, you can run into trouble.
For example, say you have the two double values:
9.876543210
9.876543211
and that the precision of a float was only six decimal digits. That would mean that both float values would be 9.87654, hence equal, even though the double values themselves are not equal.
However, if you're talking about floats being cast to doubles, then identical floats should give you identical doubles. If the floats are different, the extra precision will ensure the doubles are distinct as well.
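Both directions can be tried out with the two values from the example. This is a sketch with invented names (NarrowingDemo, equalAsFloats, equalAsDoubles); Math.nextUp(float) is used only to obtain a float that is guaranteed to differ from its neighbour.

```java
// Illustrative sketch: NarrowingDemo and its helpers are hypothetical names.
class NarrowingDemo {
    // Narrow both doubles to float before comparing (loses precision).
    static boolean equalAsFloats(double a, double b) {
        return (float) a == (float) b;
    }

    // Widen both floats to double before comparing (loses nothing).
    static boolean equalAsDoubles(float a, float b) {
        return (double) a == (double) b;
    }

    public static void main(String[] args) {
        double x = 9.876543210;
        double y = 9.876543211;
        System.out.println(x == y);              // false
        System.out.println(equalAsFloats(x, y)); // true: both collapse to one float

        float f = 1.1f;
        float g = Math.nextUp(f);                 // adjacent, distinct float
        System.out.println(f == g);               // false
        System.out.println(equalAsDoubles(f, g)); // false: still distinct as doubles
        System.out.println(equalAsDoubles(f, f)); // true
    }
}
```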
As long as you are not mixing promoted floats and natively calculated doubles in your comparison you should be ok, but take care:
Comparing floats (or doubles) for equality is difficult - see this lengthy but excellent discussion.
Here are some highlights:
You can't use ==, because of problems with the limited precision of floating point formats
float(0.1) and double(0.1) are different values (0.100000001490116119384765625 and 0.1000000000000000055511151231257827021181583404541015625) respectively. In your case, this means that comparing two floats (by converting to double) will probably be ok, but be careful if you want to compare a float with a double.
It's common to use an epsilon, a small tolerance value, for the comparison (floats a and b are considered equal if |a - b| < epsilon). In C, float.h defines FLT_EPSILON, which is often used for this purpose. However, this type of comparison doesn't work where a and b are both very small, or both very large.
You can address this by using a scaled-relative-to-the-sizes-of-a-and-b epsilon, but this breaks down in some cases (like comparisons to zero).
You can compare the integer representations of the floating point numbers to find out how many representable floats there are between them. This is called the ULP difference, for "Units in Last Place" difference. (Java's Float.equals() also compares the integer bit patterns, via Float.floatToIntBits(), but only for exact equality.) It's generally good, but also breaks down when comparing against zero.
The article concludes:
Know what you’re doing
There is no silver bullet. You have to choose wisely.
If you are comparing against zero, then relative epsilons and ULPs based comparisons are usually meaningless. You’ll need to use an absolute epsilon, whose value might be some small multiple of FLT_EPSILON and the inputs to your calculation. Maybe.
If you are comparing against a non-zero number then relative epsilons or ULPs based comparisons are probably what you want. You’ll probably want some small multiple of FLT_EPSILON for your relative epsilon, or some small number of ULPs. An absolute epsilon could be used if you knew exactly what number you were comparing against.
If you are comparing two arbitrary numbers that could be zero or non-zero then you need the kitchen sink. Good luck and God speed.
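One possible shape for such a "kitchen sink" comparison in Java is sketched below. It is not the article's code: the class name, the method nearlyEqual and both tolerance parameters are assumptions for illustration, and sensible tolerance values depend entirely on the calculation that produced the inputs.

```java
// Illustrative sketch: KitchenSinkCompare, nearlyEqual, absTol and relTol are hypothetical.
class KitchenSinkCompare {
    static boolean nearlyEqual(double a, double b, double absTol, double relTol) {
        if (a == b) {
            return true; // exact matches, including equal infinities
        }
        if (Double.isInfinite(a) || Double.isInfinite(b)) {
            return false; // an infinity is never "close" to anything else
        }
        double diff = Math.abs(a - b); // NaN inputs make every check below false
        if (diff <= absTol) {
            return true; // absolute check, needed for values near zero
        }
        double largest = Math.max(Math.abs(a), Math.abs(b));
        return diff <= largest * relTol; // relative check for everything else
    }

    public static void main(String[] args) {
        double relTol = 4 * Math.ulp(1.0); // a small multiple of the double epsilon
        System.out.println(nearlyEqual(0.1 + 0.2, 0.3, 1e-12, relTol)); // true
        System.out.println(nearlyEqual(1e-20, 0.0, 1e-12, relTol));     // true, via absTol
        System.out.println(nearlyEqual(1.0, 1.1, 1e-12, relTol));       // false
    }
}
```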
So, to answer your question:
If you are downgrading doubles to floats, then you might lose precision, and incorrectly report two different doubles as equal (as paxdiablo points out.)
If you are upgrading identical floats to double, then the added precision won't be a problem unless you are comparing a float with a double (Say you'd got 1.234 in float, and you only had 4 decimal digits of accuracy, then the double 1.2345 MIGHT represent the same value as the float. In this case you'd probably be better to do the comparison at the precision of the float, or more generally, at the error level of the most inaccurate representation in the comparison).
If you know the number you'll be comparing with, you can follow the advice quoted above.
If you're comparing arbitrary numbers (which could be zero or non-zero), there's no way to compare them correctly in all cases - pick one comparison and know its limitations.
A couple of practical considerations (since this sounds like it's for an assignment):
The epsilon comparison mentioned by most is probably fine (but include a discussion of the limitations in the write up). If you're ever planning to compare doubles to floats, try to do it in float, but if not, try to do all comparisons in double. Even better, just use doubles everywhere.
If you want to totally ace the assignment, include a write-up of the issues when comparing floats and the rationale for why you chose any particular comparison method.
I don't understand why you're doing this at all. The == operator already caters for all possible types on both sides, with extensive rules on type coercion and widening which are already specified in the relevant language standards. All you have to do is use it.
I'm perhaps not answering the OP's question but rather responding to some more or less fuzzy advice which require clarifications.
Comparing two floating point values for equality is absolutely possible and can be done. Whether the type is single or double precision is often of less importance.
Having said that the steps leading up to the comparison itself require great care and a thorough understanding of floating-point dos and don'ts, whys and why nots.
Consider the following C statements:
result = a * b / c;
result = (a * b) / c;
result = a * (b / c);
In most naive floating-point programming they are seen as "equivalent", i.e. producing the "same" result. In the real world of floating-point they may be. Or actually, the first two are equivalent (as the second follows C evaluation rules, i.e. operators of the same priority are evaluated left to right). The third may or may not be equivalent to the first two.
Why is this?
"a * b / c" or "b / c * a" may cause the "inexact" exception i e an intermediate or the final result (or both) is (are) not exact(ly representable in floating point format). If this is the case the results will be more or less subtly different. This may or may not lead to the end results being amenable to an equality comparison. Being aware of this and single-stepping through operations one at a time - noting intermediate results - will allow the patient programmer to "beat the system" i e construct a quality floating-point comparison for practically any situation.
For everyone else, passing over the equality comparison for floating-point numbers is good, solid advice.
It's really a bit ironic because most programmers know that integer math results in predictable truncations in various situations. When it comes to floating-point almost everyone is more or less thunderstruck that results are not exact. Go figure.
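The effect of evaluation order is easy to reproduce. The sketch below uses Java rather than C, and the input values 0.1, 3.0 and 3.0 are simply one combination where the intermediate product is inexact; the class and method names are invented.

```java
// Illustrative sketch: OrderOfOperations and its methods are hypothetical names.
class OrderOfOperations {
    // Multiplies first, so the inexact product a * b is what gets divided.
    static double leftToRight(double a, double b, double c) {
        return a * b / c;
    }

    // Divides first; with b == c the quotient is exactly 1.0.
    static double divideFirst(double a, double b, double c) {
        return a * (b / c);
    }

    public static void main(String[] args) {
        double a = 0.1, b = 3.0, c = 3.0;
        System.out.println(leftToRight(a, b, c));
        System.out.println(divideFirst(a, b, c));
        System.out.println(leftToRight(a, b, c) == divideFirst(a, b, c)); // false
    }
}
```

The two results differ by a single ULP, which is enough to make == report them as unequal.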
You should be okay to make that cast as long as the equality test involves a delta.
For example: abs((double) floatVal1 - (double) floatVal2) < .000001 should work.
Edit in response to the question change
No you would not. The above still stands.
For the comparison between float f and double d, you can calculate the difference between f and d. If abs(f-d) is less than some threshold, you can consider them equal. The threshold can be either absolute or relative, depending on your application's requirements. There are some good solutions here. I hope this helps.
Would I ever get an incorrect result if I promote 2 floats to
double and do a 64bit comparison rather than a 32bit comparison?
No.
If you start with two floats, which could be float variables (float x = foo();) or float constants (1.234234234f) then you can compare them directly, of course. If you convert them to double and then compare them then the results will be identical.
This works because double is a super-set of float. That is, every value that can be stored in a float can be stored in a double. The range of the exponent and mantissa are both increased. There are billions of values that can be stored in a double but not in a float, but there are zero values that can be stored in a float but not a double.
As discussed in my float comparison article it can be tricky to do a meaningful comparison between float or double values, because rounding errors may have crept in. But converting both numbers from float to double does not change this. All of the mentions of epsilons (which are often but not always needed) are completely orthogonal to the question.
On the other hand, comparing a float to a double is madness. 1.1 (a double) is not equal to 1.1f (a float) because 1.1 cannot be exactly represented in either.

min value of float in java is positive why? [duplicate]

This question already has answers here:
Why is Double.MIN_VALUE in not negative
When we use the MIN_VALUE constant for one of the numeric primitive types in Java, it gives us the minimum possible value for that type.
BUT
in the case of float and double it returns the minimum positive value, even though float and double can also hold negative values.
MIN_VALUE tells you how precise a float can get, but not the mathematical minimum it can represent. They should have named it better...
Negative of MAX_VALUE is the mathematical minimum value for floats(same goes for double).
The reason you can assume this has to do with how numbers are represented in binary:
Java floats and doubles use a sign bit to represent negative/positive values, as opposed to the two's complement representation used for integers. This means their positive and negative values have the same magnitude, i.e. -(positive max) = negative max. Therefore you don't need to define an actual mathematical minimum constant for floats, because you can just use the positive max constant to figure out what the negative bound is. For integers, you need 2 different constants defining max/min because of the way they are represented in binary, i.e. -max != min.
For more info http://people.uncw.edu/tompkinsj/133/numbers/Reals.htm
MIN_VALUE should be named EPSILON; it's the smallest positive value a float can represent.
Because a float uses the sign-magnitude encoding, the lowest value a float can represent is -MAX_VALUE.
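A few lines of Java make the asymmetry visible. This is only a sketch; MinValueDemo and lowestFloat are names chosen for the example.

```java
// Illustrative sketch: MinValueDemo and lowestFloat are hypothetical names.
class MinValueDemo {
    // The most negative finite float is the negated maximum.
    static float lowestFloat() {
        return -Float.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println(Float.MIN_VALUE > 0); // true: smallest positive value
        System.out.println(lowestFloat());       // most negative finite float
        // One step below the lowest finite float is negative infinity.
        System.out.println(Math.nextAfter(lowestFloat(), Double.NEGATIVE_INFINITY));
        // Integers are not symmetric: negating MAX_VALUE misses MIN_VALUE by one.
        System.out.println(-Integer.MAX_VALUE == Integer.MIN_VALUE + 1); // true
    }
}
```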
A possible explanation could be that Java just used the same naming convention as C++, which again inherited the names from C.
Java was influenced by C++, which shares the same confusing naming pattern. In C++, the analogy of Float.MIN_VALUE is std::numeric_limits<T>::min(), which is defined as:
Minimum finite value.
For floating types with denormalization (variable number of exponent bits): minimum positive normalized value.
In C++ that is a potential source of bugs in template code, so later in C++11, they added std::numeric_limits<T>::lowest(), which is defined as:
Minimum finite value. (since C++11)
For integral types: the same as min().
For floating-point types: implementation-dependent; generally, the negative of max().
But C++ was not the first language. It all goes back to C, which defines FLT_MIN as the minimum normalized positive floating-point value.
So, why did C choose to define the minimums of floating point numbers and integers inconsistently?
Not sure, but it could have to do with symmetry (see this answer). For floats, you can use -FLT_MAX (or -Float.MAX_VALUE). For integers, negating the maximum value is not portable. In fact, it is generally wrong on all modern architectures (where -INT_MAX == INT_MIN + 1 should hold).
min value of float in java is positive why?
Why they chose to name the constants like that is unanswerable (by us) because nobody on SO was in the room when the decision was made.
Besides, knowing the answer to the question is not going to help, because the values of Float.MIN_VALUE and Double.MIN_VALUE won't be changed, no matter how "wrong" they might be. (It would break any existing code that uses these constants, and the Java designers only do that when there is no other viable alternative. Leaving it alone is clearly a viable alternative.)
I suppose, the answer (i.e. the real reason for the decision) might be relevant to people developing brand new programming languages. However, they are going to have to make up their own minds anyway. FWIW, I wouldn't have designed it this way, but that's not relevant.

Why does Math.round return a long but Math.floor return a double?

Why the inconsistency?
There is no inconsistency: the methods are simply designed to follow different specifications.
long round(double a)
Returns the closest long to the argument.
double floor(double a)
Returns the largest (closest to positive infinity) double value that is less than or equal to the argument and is equal to a mathematical integer.
Compare with double ceil(double a)
double rint(double a)
Returns the double value that is closest in value to the argument and is equal to a mathematical integer
So by design round rounds to a long and rint rounds to a double. This has always been the case since JDK 1.0.
Other methods were added in JDK 1.2 (e.g. toRadians, toDegrees); others were added in 1.5 (e.g. log10, ulp, signum, etc), and yet some more were added in 1.6 (e.g. copySign, getExponent, nextUp, etc) (look for the Since: metadata in the documentation); but round and rint have kept the signatures they have now since the beginning.
Arguably, perhaps instead of long round and double rint, it'd be more "consistent" to name them double round and long rlong, but this is argumentative. That said, if you insist on categorically calling this an "inconsistency", then the reason may be as unsatisfying as "because it's inevitable".
Here's a quote from Effective Java 2nd Edition, Item 40: Design method signatures carefully:
When in doubt, look to the Java library APIs for guidance. While there are plenty of inconsistencies -- inevitable, given the size and scope of these libraries -- there is also a fair amount of consensus.
Distantly related questions
Why does int num = Integer.getInteger("123") throw NullPointerException?
Most awkward/misleading method in Java Base API ?
Most Astonishing Violation of the Principle of Least Astonishment
floor would have been chosen to match the standard c routine in math.h (rint, mentioned in another answer, is also present in that library, and returns a double, as in java).
but round was not a standard function in c at that time (it's not mentioned in C89 - c identifiers and standards; c99 does define round and it returns a double, as you would expect). it's normal for language designers to "borrow" ideas, so maybe it comes from some other language? fortran 77 doesn't have a function of that name and i am not sure what else would have been used back then as a reference. perhaps vb - that does have Round but, unfortunately for this theory, it returns a double (php too). interestingly, perl deliberately avoids defining round.
[update: hmmm. looks like smalltalk returns integers. i don't know enough about smalltalk to know if that is correct and/or general, and the method is called rounded, but it might be the source. smalltalk did influence java in some ways (although more conceptually than in details).]
if it's not smalltalk, then we're left with the hypothesis that someone simply chose poorly (given the implicit conversions possible in java it seems to me that returning a double would have been more useful, since then it can be used both while converting types and when doing floating point calculations).
in other words: functions common to java and c tend to be consistent with the c library standard at the time; the rest seem to be arbitrary, but this particular wrinkle may have come from smalltalk.
I agree that it is odd that Math.round(double) returns long. If large double values are cast to long (which is what Math.round implicitly does), Long.MAX_VALUE is returned. An alternative is using Math.rint() in order to avoid that. However, Math.rint() has a somewhat strange rounding behavior: ties are settled by rounding to the even integer, i.e. 4.5 is rounded down to 4.0 but 5.5 is rounded up to 6.0. Another alternative is to use Math.floor(x+0.5). But be aware that 1.5 is rounded to 2 while -1.5 is rounded to -1, not -2. Yet another alternative is to use Math.round, but only if the number is in the range between Long.MIN_VALUE and Long.MAX_VALUE. Double precision floating point values outside this range are integers anyhow.
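The behaviours described above can be checked with a short sketch like this one (RoundingDemo and floorHalfUp are illustrative names, not library methods):

```java
// Illustrative sketch: RoundingDemo and floorHalfUp are hypothetical names.
class RoundingDemo {
    // The Math.floor(x + 0.5) alternative: ties go towards positive infinity.
    static double floorHalfUp(double x) {
        return Math.floor(x + 0.5);
    }

    public static void main(String[] args) {
        // Values beyond the long range saturate instead of overflowing.
        System.out.println(Math.round(1.0e300) == Long.MAX_VALUE); // true

        // rint settles ties by rounding to the even integer.
        System.out.println(Math.rint(4.5)); // 4.0
        System.out.println(Math.rint(5.5)); // 6.0

        System.out.println(floorHalfUp(1.5));  // 2.0
        System.out.println(floorHalfUp(-1.5)); // -1.0
    }
}
```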
Unfortunately, why Math.round() returns long is unknown. Somebody made that decision, and he probably never gave an interview to tell us why. My guess is, that Math.round was designed to provide a better way (i.e., with rounding) for converting doubles to longs.
Like everyone else here I also don't know the answer, but thought someone might find this useful. I noticed that if you want to round a double to an int without casting, you can use the two round implementations long round(double) and int round(float) together:
double d = something;
int i = Math.round(Math.round(d));
