Implement "tolerant" `equals` & `hashCode` for a class with floating point members - java

I have a class with a float field. For example:
public class MultipleFields {
final int count;
final float floatValue;
public MultipleFields(int count, float floatValue) {
this.count = count;
this.floatValue = floatValue;
}
}
I need to be able to compare instances by value. Now how do I properly implement equals & hashCode?
The usual way to implement equals and hashCode is to just consider all fields. E.g. Eclipse will generate the following equals:
public boolean equals(Object obj) {
// irrelevant type checks removed
....
MultipleFields other = (MultipleFields) obj;
if (count != other.count)
return false;
if (Float.floatToIntBits(floatValue) != Float.floatToIntBits(other.floatValue))
return false;
return true;
}
(and a similar hashCode, that essentially computes count* 31 + Float.floatToIntBits(floatValue)).
The problem with this is that my FP values are subject to rounding errors (they may come from user input, from a DB, etc.). So I need a "tolerant" comparison.
The common solution is to compare using an epsilon value (see e.g. Comparing IEEE floats and doubles for equality). However, I'm not quite certain how I can implement equals using this method and still have a hashCode that is consistent with equals.
My idea is to define the number of significant digits for comparison, then always round to that number of digits in both equals and hashCode:
long comparisonFloatValue = Math.round(floatValue* (Math.pow(10, RELEVANT_DIGITS)));
Then if I replace all uses of floatValue with comparisonFloatValue in equals and hashCode, I should get a "tolerant" comparison, which is consistent with hashCode.
Will this work?
Do you see any problems with this approach?
Is there a better way to do this? It seems rather complicated.

The big problem with it is that two float values could still be very close together but still compare unequal. Basically you're dividing the range of floating point values into buckets - and two values could be very close together without being in the same bucket. Imagine you were using two significant digits, applying truncation to obtain the bucket, for example... then 11.999999 and 12.000001 would be unequal, but 12.000001 and 12.9999999 would be equal despite being much further apart from each other.
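To make the bucket boundary effect concrete, here is a minimal sketch of the rounding idea from the question. The class name BucketedFields and the helper bucket() are illustrative, and it rounds to two decimal places (as the question's formula does) rather than truncating; the same effect appears either way.

```java
// Hypothetical variant of the question's class; names are illustrative only.
final class BucketedFields {
    static final int RELEVANT_DIGITS = 2; // decimal places kept for comparison

    final int count;
    final float floatValue;

    BucketedFields(int count, float floatValue) {
        this.count = count;
        this.floatValue = floatValue;
    }

    // Rounds the value to RELEVANT_DIGITS decimal places and returns the bucket index.
    long bucket() {
        return Math.round(floatValue * Math.pow(10, RELEVANT_DIGITS));
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof BucketedFields)) return false;
        BucketedFields other = (BucketedFields) obj;
        return count == other.count && bucket() == other.bucket();
    }

    @Override
    public int hashCode() {
        // Uses exactly the same bucket as equals, so the two stay consistent.
        return 31 * count + Long.hashCode(bucket());
    }

    public static void main(String[] args) {
        BucketedFields a = new BucketedFields(1, 1.004f); // bucket 100
        BucketedFields b = new BucketedFields(1, 1.006f); // bucket 101
        BucketedFields c = new BucketedFields(1, 1.014f); // bucket 101
        System.out.println(a.equals(b)); // false, although the values differ by only 0.002
        System.out.println(b.equals(c)); // true, although the values differ by 0.008
    }
}
```

The contract between equals and hashCode holds here; what is lost is the intuition that "close" values always compare equal.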
Unfortunately, if you don't bucket values like this, you can't implement equals appropriately because of transitivity: x and y may be close together, y and z may be close together, but that doesn't mean that x and z are close together.
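The transitivity problem can be shown with a few lines. This is only a sketch: the tolerance EPSILON and the method name nearlyEqual are arbitrary choices for illustration.

```java
final class EpsilonTransitivity {
    static final double EPSILON = 0.01; // arbitrary tolerance chosen for illustration

    // Plain absolute-difference comparison, the usual "epsilon" approach.
    static boolean nearlyEqual(double x, double y) {
        return Math.abs(x - y) <= EPSILON;
    }

    public static void main(String[] args) {
        double x = 1.000, y = 1.008, z = 1.016;
        System.out.println(nearlyEqual(x, y)); // true
        System.out.println(nearlyEqual(y, z)); // true
        System.out.println(nearlyEqual(x, z)); // false: the relation is not transitive
    }
}
```

Because Object.equals is required to be transitive, a comparison like this can live in a separate helper method but cannot serve as equals.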

Related

Recursion sum with generics [duplicate]

I have two Numbers. Eg:
Number a = 2;
Number b = 3;
//Following is an error:
Number c = a + b;
Why are arithmetic operations not supported on Numbers? And how would I add these two numbers in Java? (Of course I'm getting them from somewhere and I don't know whether they are Integer, Float, etc.)
You say you don't know if your numbers are integer or float... when you use the Number class, the compiler also doesn't know if your numbers are integers, floats or some other thing. As a result, the basic math operators like + and - don't work; the computer wouldn't know how to handle the values.
START EDIT
Based on the discussion, I thought an example might help. Computers store floating point numbers as two parts, a coefficient and an exponent. So, in a theoretical system, 001110 might be broken up as 0011 10, or 3^2 = 9. But positive integers store numbers as binary, so 001110 could also mean 2 + 4 + 8 = 14. When you use the class Number, you're telling the computer you don't know if the number is a float or an int or what, so it knows it has 001110 but it doesn't know if that means 9 or 14 or some other value.
END EDIT
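As a rough illustration of the point that one bit pattern can mean different values, the sketch below reads the same 32 bits first as an int and then as an IEEE 754 float. The class name BitPatternDemo is made up for this example. Note that in Java a Number object does know its concrete class at run time; what is missing is a static type the compiler could use to pick an operator.

```java
final class BitPatternDemo {
    // Reinterprets a 32-bit pattern as a float; thin wrapper around the JDK call.
    static float asFloat(int bits) {
        return Float.intBitsToFloat(bits);
    }

    public static void main(String[] args) {
        int bits = 0x3F800000;              // one fixed 32-bit pattern
        System.out.println(bits);           // read as an int: 1065353216
        System.out.println(asFloat(bits));  // read as a float: 1.0
    }
}
```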
What you can do is make a little assumption and convert to one of the types to do the math. So you could have
Number c = a.intValue() + b.intValue();
which you might as well turn into
Integer c = a.intValue() + b.intValue();
if you're willing to suffer some rounding error, or
Float c = a.floatValue() + b.floatValue();
if you suspect that you're not dealing with integers and are okay with possible minor precision issues. Or, if you'd rather take a small performance blow instead of that error,
BigDecimal c = new BigDecimal(a.floatValue()).add(new BigDecimal(b.floatValue()));
It would also work to make a method to handle the adding for you. Now I do not know the performance impact this will cause but I assume it will be less than using BigDecimal.
public static Number addNumbers(Number a, Number b) {
if(a instanceof Double || b instanceof Double) {
return a.doubleValue() + b.doubleValue();
} else if(a instanceof Float || b instanceof Float) {
return a.floatValue() + b.floatValue();
} else if(a instanceof Long || b instanceof Long) {
return a.longValue() + b.longValue();
} else {
return a.intValue() + b.intValue();
}
}
The only way to correctly add any two types of java.lang.Number is:
Number a = 2f; // Float
Number b = 3d; // Double
Number c = new BigDecimal( a.toString() ).add( new BigDecimal( b.toString() ) );
This works even for two arguments of different number types. It will (should?) not produce any side effects like overflows or loss of precision, as long as the toString() of the number type does not reduce precision.
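A small sketch of that approach wrapped in a helper (the names NumberSums and add are hypothetical). It assumes both arguments have a toString() that BigDecimal can parse; NaN and infinite values do not, and would throw NumberFormatException.

```java
import java.math.BigDecimal;

final class NumberSums {
    // Hypothetical helper: adds two Numbers through their decimal string form.
    static BigDecimal add(Number a, Number b) {
        return new BigDecimal(a.toString()).add(new BigDecimal(b.toString()));
    }

    public static void main(String[] args) {
        System.out.println(add(2f, 3d));   // 5.0
        System.out.println(add(0.1, 0.2)); // 0.3
        System.out.println(0.1 + 0.2);     // 0.30000000000000004 with plain double arithmetic
    }
}
```

Going through toString() uses the short decimal form of each value, whereas new BigDecimal(double) would carry over the full binary expansion of the floating point number.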
java.lang.Number is just the superclass of all wrapper classes of primitive types (see java doc). Use the appropriate primitive type (double, int, etc.) for your purpose, or the respective wrapper class (Double, Integer, etc.).
Consider this:
Number a = 1.5; // Actually Java creates a double and boxes it into a Double object
Number b = 1; // Same here for int -> Integer boxed
// What should the result be? If Number would do implicit casts,
// it would behave different from what Java usually does.
Number c = a + b;
// Now that works, and you know at first glance what that code does.
// Nice explicit casts like you usually use in Java.
// The result is of course again a double that is boxed into a Double object
Number d = a.doubleValue() + (double)b.intValue();
Use the following:
Number c = a.intValue() + b.intValue(); // Number is an object and not a primitive data type.
Or:
int a = 2;
int b = 3;
int c = a + b;
I think there are 2 sides to your question.
Why is operator+ not supported on Number?
Because the Java language spec. does not specify this, and there is no operator overloading. There is also not a compile-time natural way to cast the Number to some fundamental type, and there is no natural add to define for some type of operations.
Why are basic arithmetic operations not supported on Number?
(Copied from my comment:)
Not all subclasses can implement this in a way you would expect. Especially with the Atomic types it's hard to define a useful contract for e.g. add.
Also, a method add would be trouble if you try to add a Long to a Short.
If you know the Type of one number but not the other it is possible to do something like
public Double add(Double value, Number increment) {
return value + Double.parseDouble(increment.toString());
}
But it can be messy, so be aware of potential loss of accuracy and NumberFormatExceptions
Number is an abstract class which you cannot make an instance of. Provided you have a correct instance of it, you can get number.longValue() or number.intValue() and add them.
First of all, you should be aware that Number is an abstract class. What happens here is that when you create your 2 and 3, they are interpreted as primitives and a subtype is created (I think an Integer) in that case. Because an Integer is a subtype of Number, you can assign the newly created Integer into a Number reference.
However, a number is just an abstraction. It could be integer, it could be floating point, etc., so the semantics of math operations would be ambiguous.
Number does not provide the classic math operations for two reasons:
First, member methods in Java cannot be operators. It's not C++. At best, they could provide an add().
Second, figuring out what type of operation to do when you have two inputs (e.g., a division of a float by an int) is quite tricky.
So instead, it is your responsibility to convert back to the specific primitive type you are interested in and apply the mathematical operators.
The best answer would be to write a utility with double dispatch that drills down to the most common known types (take a look at the Smalltalk addition implementation).

Comparing doubles for equality is not safe. So what to do if I want to compare one to 0?

I know that because of binary double representation, comparison for equality of two doubles is not quite safe. But I need to perform some computation like this:
double a;
//initializing
if(a != 0){ // <----------- HERE
double b = 2 / a;
//do other computation
}
throw new RuntimeException();
So, comparison of doubles is not safe, but I definitely do not want to divide by 0. What to do in this case?
I'd use BigDecimal but its performance is not quite acceptable.
Well, if your issue is dividing by zero, the good news is that you can do what you have since if the value isn't actually 0, you can divide by it, even if it's really, really small.
You can use a range comparison, comparing it to the lowest value you want to allow, e.g. a >= 0.0000000000001 or similar (if it's always going to be positive).
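A sketch of such a range check follows. The threshold MIN_MAGNITUDE and the method name twoOver are assumptions for illustration; the right threshold depends on the application. It uses Math.abs so that negative divisors are handled as well.

```java
final class SafeDivision {
    static final double MIN_MAGNITUDE = 1e-12; // assumed, application-specific threshold

    // Returns 2 / a, rejecting NaN and divisors whose magnitude is below the threshold.
    static double twoOver(double a) {
        if (Double.isNaN(a) || Math.abs(a) < MIN_MAGNITUDE) {
            throw new ArithmeticException("divisor unusable: " + a);
        }
        return 2 / a;
    }

    public static void main(String[] args) {
        // A nonzero divisor never divides "by zero", but the quotient can still overflow.
        System.out.println(2 / Double.MIN_VALUE); // Infinity
        System.out.println(twoOver(0.5));         // 4.0
    }
}
```

The plain a != 0 test is enough to avoid an actual division by zero; the range check is only needed if very large or infinite results are also unacceptable.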
What about using the static compare method of the Double wrapper class?
double a = 0.0;
// initializing
if (Double.compare(a, 0.0) != 0) {
}
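One caveat with this suggestion, sketched below with made-up names: Double.compare orders -0.0 before 0.0, so the test above treats negative zero as nonzero, while the primitive != operator does not.

```java
final class ZeroCompareDemo {
    // Mirrors the check suggested above.
    static boolean nonZeroByCompare(double a) {
        return Double.compare(a, 0.0) != 0;
    }

    public static void main(String[] args) {
        double negZero = -0.0;
        System.out.println(negZero != 0.0);            // false: -0.0 and 0.0 are numerically equal
        System.out.println(nonZeroByCompare(negZero)); // true: compare distinguishes the two zeros
        System.out.println(2 / negZero);               // -Infinity
    }
}
```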

Generating hashCode() for a custom class

I have a class named Dish and I handle it inside ArrayLists
So I had to override default hashCode() method.
@Override
public int hashCode() {
int hash =7;
hash = 3*hash + this.getId();
hash = 3*hash + this.getQuantity();
return hash;
}
When I get two dishes with id=4,quan=3 and id=5,quan=0, hashCode() for both is same;
((7*3)+4)*3+3 = 78
((7*3)+5)*3+0 = 78
What am I doing wrong? Or are the magic numbers 7 and 3 I have chosen wrong?
How do I properly override hashCode() so that it generates unique hashes?
PS: From what I found on Google and SO, people use different numbers but the same method. If the problem is with the numbers, how do I wisely choose numbers that don't increase the cost of multiplication and at the same time work well even for a larger number of attributes?
Say I had 7 int attributes and my second magic no. is 31, the final hash will be first magic no. * 27512614111 even if all my attributes are 0. So how do I do it without having my hashed value in billions so as keep my processor burden-free?
You can use something like this
public int hashCode() {
int result = 17;
result = 31 * result + getId();
result = 31 * result + getQuantity();
return result;
}
One more thing: if your id is unique for each object, then there is no need to use quantity when calculating the hash code.
Here is an extract from Effective Java by Joshua Bloch explaining how to implement the hashCode method:
1. Store some constant nonzero value, say, 17, in an int variable called result.
2. For each significant field f in your object (each field taken into account by the equals method, that is), do the following:
a. Compute an int hash code c for the field:
i. If the field is a boolean, compute (f ? 1 : 0).
ii. If the field is a byte, char, short, or int, compute (int) f.
iii. If the field is a long, compute (int) (f ^ (f >>> 32)).
iv. If the field is a float, compute Float.floatToIntBits(f).
v. If the field is a double, compute Double.doubleToLongBits(f), and then hash the resulting long as in step 2.a.iii.
vi. If the field is an object reference and this class’s equals method compares the field by recursively invoking equals, recursively invoke hashCode on the field. If a more complex comparison is required, compute a “canonical representation” for this field and invoke hashCode on the canonical representation. If the value of the field is null, return 0 (or some other constant, but 0 is traditional).
vii. If the field is an array, treat it as if each element were a separate field. That is, compute a hash code for each significant element by applying these rules recursively, and combine these values per step 2.b. If every element in an array field is significant, you can use one of the Arrays.hashCode methods added in release 1.5.
b. Combine the hash code c computed in step 2.a into result as follows:
result = 31 * result + c;
3. Return result.
When you are finished writing the hashCode method, ask yourself whether equal instances have equal hash codes. Write unit tests to verify your intuition! If equal instances have unequal hash codes, figure out why and fix the problem.
Source: Effective Java by Joshua Bloch
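The recipe above can be sketched for a class with one field of each kind. RecipeSample and its field names are invented for illustration; the equals method is written so that it compares exactly what hashCode hashes.

```java
import java.util.Arrays;
import java.util.Objects;

// Hypothetical class with one field per rule of the recipe.
final class RecipeSample {
    private final boolean flag;
    private final int id;
    private final long stamp;
    private final float ratio;
    private final double weight;
    private final String name;   // may be null
    private final int[] scores;

    RecipeSample(boolean flag, int id, long stamp, float ratio,
                 double weight, String name, int[] scores) {
        this.flag = flag;
        this.id = id;
        this.stamp = stamp;
        this.ratio = ratio;
        this.weight = weight;
        this.name = name;
        this.scores = scores;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof RecipeSample)) return false;
        RecipeSample o = (RecipeSample) obj;
        return flag == o.flag && id == o.id && stamp == o.stamp
                && Float.floatToIntBits(ratio) == Float.floatToIntBits(o.ratio)
                && Double.doubleToLongBits(weight) == Double.doubleToLongBits(o.weight)
                && Objects.equals(name, o.name)
                && Arrays.equals(scores, o.scores);
    }

    @Override
    public int hashCode() {
        int result = 17;                                        // step 1
        result = 31 * result + (flag ? 1 : 0);                  // boolean
        result = 31 * result + id;                              // int
        result = 31 * result + (int) (stamp ^ (stamp >>> 32));  // long
        result = 31 * result + Float.floatToIntBits(ratio);     // float
        long w = Double.doubleToLongBits(weight);               // double, then as long
        result = 31 * result + (int) (w ^ (w >>> 32));
        result = 31 * result + (name == null ? 0 : name.hashCode()); // reference
        result = 31 * result + Arrays.hashCode(scores);         // array
        return result;                                          // step 3
    }
}
```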
This is perfectly OK. The hashing function is not supposed to be universally unique - it just gives a quick hint about which elements might be equal and should be checked in more depth by a call to equals().
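To see that a collision is harmless, here is a sketch that keeps the question's 7 and 3 multipliers and adds an equals method (the question does not show one, so the one below is an assumption). Both dishes hash to 78, yet a HashSet still keeps them apart because it falls back to equals.

```java
import java.util.HashSet;
import java.util.Set;

final class Dish {
    private final int id;
    private final int quantity;

    Dish(int id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Dish)) return false;
        Dish other = (Dish) obj;
        return id == other.id && quantity == other.quantity;
    }

    @Override
    public int hashCode() {
        int hash = 7;
        hash = 3 * hash + id;
        hash = 3 * hash + quantity;
        return hash;
    }

    public static void main(String[] args) {
        Dish a = new Dish(4, 3);
        Dish b = new Dish(5, 0);
        System.out.println(a.hashCode() + " " + b.hashCode()); // 78 78
        Set<Dish> dishes = new HashSet<>();
        dishes.add(a);
        dishes.add(b);
        System.out.println(dishes.size()); // 2: the collision only costs an extra equals call
    }
}
```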
Judging by the class name, quantity is probably the number of dishes, so there is a good chance that it will often be zero. In that case I would use an extra variable, say x, in the hash function, like this:
@Override
public int hashCode() {
int hash = 7;
int x;
if (getQuantity() == 0) {
// quantity is zero: mix the id in instead
x = getQuantity() + getId();
} else {
x = getQuantity();
}
hash = 3 * hash + this.getId();
hash = 3 * hash + x;
return hash;
}
I believe this should reduce hash collisions: since the getId() you have is a unique number, it makes x a unique number too.

Math.pow and Math.sqrt work differently for large values?

I'm using Heron's formula to find the area of a triangle. Given sides a, b, and c, A = √(s(s-a)(s-b)(s-c)) where s is the semiperimeter (a+b+c)/2. This formula should work perfectly, but I noticed that Math.pow() and Math.sqrt() give different results. Why does this happen and how can I fix it?
I wrote two methods that find the area and determine if it is an integer.
In this first method, I take the square roots and then multiply them:
public static boolean isAreaIntegral(long a, long b, long c)
{
double s = (a+b+c)/2.0;
double area = Math.sqrt(s)*Math.sqrt(s-a)*Math.sqrt(s-b)*Math.sqrt(s-c);
return area%1.0==0.0 && area > 0.0;
}
In this second method, I find the product and then take the square root:
public static boolean isAreaIntegral(long a, long b, long c)
{
double s = (a+b+c)/2.0;
double area = Math.pow(s*(s-a)*(s-b)*(s-c),0.5);
return area%1.0==0.0 && area > 0.0;
}
Can anyone explain why these two methods that are mathematically equivalent give different values? I'm working on Project Euler Problem 94. My answer comes out to 999990060 the first way and 996784416 the second way. (I know that both answers are very far off the actual one.)
I would certainly vote for "rounding issues", as you multiply the results of multiple method calls in the first method (where every method result gets rounded) compared to the single method call in the second method, where you round only once.
The difference between the answers is larger than I'd expect. Or maybe it isn't. It's late and my mathematical mind crashed a while ago.
I think your issue is with rounding. When you multiply a load of roots together, your answer falls further from the true value.
The second method will be more accurate.
Though, not necessarily as accurate as Euler is asking for.
A calculator is a good bet.
Both methods are problematic. You should in general be very careful when comparing floating point values (that is, also double precision floating point values). Particularly, comparing the result of a computation with == or != is nearly always questionable (and quite often it is just wrong). Comparing two floating point values for "equality" should be done with a method like
private static boolean isEqual(double x, double y)
{
double epsilon = 1e-8;
return Math.abs(x - y) <= epsilon * Math.abs(x);
// see Knuth section 4.2.2 pages 217-218
}
In this case, the floating-point remainder operator will also not have the desired result. Consider the following, classic example
public class PrecisionAgain
{
public static void main(String[] args)
{
double d = 0;
for (int i=0; i<20; i++)
{
d += 0.1;
}
System.out.println(d);
double r = d%1.0;
System.out.println(r);
}
}
Output:
2.0000000000000004
4.440892098500626E-16
In your case, in order to rule out these rounding errors, the return statement could probably (!) be something simple like
return Math.abs(area - Math.round(area)) < 1e-8;
But in other situations, you should definitely read more about floating point operations. (The site http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html is often recommended, but might be a tough one to start with...)
This still does not really answer your actual question: WHY are the results different? In doubt, the answer is this simple: Because they make different errors (but they both make errors - that's in fact more important here!)
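Since the sides are integers, one way to sidestep the rounding question entirely is exact integer arithmetic. This is a different technique from the ones discussed above, offered only as a sketch: it relies on the identity 16 * area^2 = (a+b+c)(-a+b+c)(a-b+c)(a+b-c), assumes positive side lengths, and uses BigInteger.sqrt(), which requires Java 9 or later. The class name HeronExact is made up.

```java
import java.math.BigInteger;

final class HeronExact {
    // True if the triangle with positive integer sides a, b, c has a positive integer area.
    static boolean isAreaIntegral(long a, long b, long c) {
        BigInteger x = BigInteger.valueOf(a);
        BigInteger y = BigInteger.valueOf(b);
        BigInteger z = BigInteger.valueOf(c);

        // p = 16 * area^2, computed without any rounding.
        BigInteger p = x.add(y).add(z)
                .multiply(y.add(z).subtract(x))
                .multiply(x.add(z).subtract(y))
                .multiply(x.add(y).subtract(z));
        if (p.signum() <= 0) {
            return false; // degenerate or impossible triangle
        }

        BigInteger r = p.sqrt(); // floor of the square root, i.e. 4 * area if p is a perfect square
        if (!r.multiply(r).equals(p)) {
            return false; // area is irrational
        }
        return r.mod(BigInteger.valueOf(4)).signum() == 0; // area = r / 4 must be whole
    }

    public static void main(String[] args) {
        System.out.println(isAreaIntegral(3, 4, 5)); // true, area 6
        System.out.println(isAreaIntegral(5, 5, 6)); // true, area 12
        System.out.println(isAreaIntegral(5, 5, 4)); // false
    }
}
```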

Complex number equals method

I'm making a complex number class in Java like this:
public class Complex {
public final double real, imag;
public Complex(double real, double imag) {
this.real = real;
this.imag = imag;
}
... methods for arithmetic follow ...
}
I implemented the equals method like this:
@Override
public boolean equals(Object obj) {
if (obj instanceof Complex) {
Complex other = (Complex)obj;
return (
this.real == other.real &&
this.imag == other.imag
);
}
return false;
}
But if you override equals, you're supposed to override hashCode too. One of the rules is:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
Comparing floats and doubles with == does a numeric comparison, so +0.0 == -0.0 and NaN values are unequal to everything including themselves. So I tried implementing the hashCode method to match the equals method like this:
@Override
public int hashCode() {
long real = Double.doubleToLongBits(this.real); // harmonize NaN bit patterns
long imag = Double.doubleToLongBits(this.imag);
if (real == 1L << 63) real = 0; // convert -0.0 to +0.0
if (imag == 1L << 63) imag = 0;
long h = real ^ imag;
return (int)h ^ (int)(h >>> 32);
}
But then I realized that this would work strangely in a hash map if either field is NaN, because this.equals(this) will always be false, but maybe that's not incorrect. On the other hand, I could do what Double and Float do, where the equals methods compare +0.0 != -0.0, but still harmonize the different NaN bit patterns, and let NaN == NaN, so then I get:
@Override
public boolean equals(Object obj) {
if (obj instanceof Complex) {
Complex other = (Complex)obj;
return (
Double.doubleToLongBits(this.real) ==
Double.doubleToLongBits(other.real) &&
Double.doubleToLongBits(this.imag) ==
Double.doubleToLongBits(other.imag)
);
}
return false;
}
@Override
public int hashCode() {
long h = (
Double.doubleToLongBits(real) +
Double.doubleToLongBits(imag)
);
return (int)h ^ (int)(h >>> 32);
}
But if I do that then my complex numbers don't behave like real numbers, where +0.0 == -0.0. But I don't really need to put my Complex numbers in hash maps anyway -- I just want to do the right thing, follow best practices, etc. And now I'm just confused. Can anyone advise me on the best way to proceed?
I've thought about this some more. The problem stems from trying to balance two uses of equals: IEEE 754 arithmetic comparison and Object/hashtable comparison. For floating-point types, the two needs can never be satisfied at once due to NaN. The arithmetic comparison wants NaN != NaN, but the Object/hashtable comparison (equals method) requires this.equals(this).
Double implements the methods correctly according to the contract of Object, so NaN == NaN. It also does +0.0 != -0.0. Both behaviors are the opposite from comparisons on primitive float/double types.
java.util.Arrays.equals(double[], double[]) compares elements the same way as Double (NaN == NaN, +0.0 != -0.0).
java.awt.geom.Point2D does it technically wrong. Its equals method compares the coordinates with just ==, so this.equals(this) can be false. Meanwhile, its hashCode method uses doubleToLongBits, so its hashCode can be different for two objects even when equals returns true. The doc makes no mention of the subtleties, which implies the issue is not important: people don't put these types of tuples in hash tables! (And it wouldn't be very effective if they did, because you'd have to get exactly the same numbers to get an equal key.)
In a tuple of floating-points like a complex number class, the simplest correct implementation of equals and hashCode is to not override them at all.
If you want the methods to take the value into account, then the correct thing to do is what Double does: use doubleToLongBits (or floatToIntBits for floats) in both methods. If that's not suitable for arithmetic, a separate method is needed; perhaps equals(Complex other, double epsilon) to compare numbers for equality within a tolerance.
Note that you can override equals(Complex other) without interfering with equals(Object other), but that seems too confusing.
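A minimal sketch of that recommendation: equals and hashCode follow Double's bit-based semantics, and a separate tolerance-based overload covers arithmetic use. The overload name equals(Complex, double) comes from the suggestion above; its absolute-difference test and the use of Double.hashCode are my own choices, not the only possibility.

```java
final class Complex {
    final double real, imag;

    Complex(double real, double imag) {
        this.real = real;
        this.imag = imag;
    }

    // Bit-based comparison, as Double.equals does: NaN equals NaN, +0.0 differs from -0.0.
    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof Complex)) return false;
        Complex other = (Complex) obj;
        return Double.doubleToLongBits(real) == Double.doubleToLongBits(other.real)
                && Double.doubleToLongBits(imag) == Double.doubleToLongBits(other.imag);
    }

    // Double.hashCode is also based on doubleToLongBits, so it agrees with equals.
    @Override
    public int hashCode() {
        return 31 * Double.hashCode(real) + Double.hashCode(imag);
    }

    // Arithmetic comparison within an absolute tolerance; not transitive, so kept out of equals.
    public boolean equals(Complex other, double epsilon) {
        return Math.abs(real - other.real) <= epsilon
                && Math.abs(imag - other.imag) <= epsilon;
    }
}
```

With this split, new Complex(0.0, 1).equals(new Complex(-0.0, 1)) is false, while the same pair compared with a tolerance of, say, 1e-9 is true.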
The pathological case seems to be 0.0 != -0.0, so I'd make sure that never happens and do the rest of it exactly the way Joshua Bloch tells you in "Effective Java".
Alternatively, the hashCode contract guarantees that hash codes are equal if the objects are equal, but not that hash codes are different if the objects are different. So you could just derive the hash code from this.real alone, and accept the collisions. Unless there is a priori knowledge about the distribution of the numbers your library will actually encounter, it may not be possible to do better: you have 128 bits of values, and 32 bits of hash, so collisions are inevitable (and harmless, unless you can show that they pessimize your lookups for expected data sets).
