Java - Best way to compare two Sets by hashes (via ==) - java

I have two Sets of Strings. Strings are the same (== returns true). I believe that comparing via == should be faster than comparing via equals() or hashCode(). So how to make sure they are the same not using equals()? Thanks

Since the two Sets are not the same instance, you can't use == to compare the Sets.
Now, if you use Set's equals, it (and by "it" I mean the default implementation of AbstractSet) will validate the two Sets have the same size and then iterate over the elements of one Set and check if the other Set contains each of them.
If, for example, you are using HashSets, in order to find if a String is contained in some HashSet, hashCode() will have to be used to find the bucket that may contain the searched String, but later String's equals() will be called to validate the presence of a String equal to the searched String. String's equals() actually begins with
if (this == anObject) {
return true;
}
so the fact that your two Sets may contain references to the same String instance will be helpful in improving the runtime of the Sets comparison.

Related

Why does Java need equals() if there is hashCode()?

If two objects return same hashCode, doesn't it mean that they are equal? Or we need equals to prevent collisions?
And can I implement equals by comparing hashCodes?
If two objects have the same hashCode then they are NOT necessarily equal. Otherwise you will have discovered the perfect hash function. But the opposite is true - if the objects are equal, then they must have the same hashCode.
hashCode and Equals are different information about objects
Consider the analogy to Persons where hashcode is the Birthday,
in that escenario, you and many other people have the same b-day (same hashcode), all you are not the same person however..
Why does Java need equals() if there is hashCode()?
Java needs equals() because it is the method through which object equality is tested by examining classes, fields, and other conditions the designer considers to be part of an equality test.
The purpose of hashCode() is to provide a hash value primarily for use by hash tables; though it can also be used for other purposes. The value returned is based on an object's fields and hash codes of its composite and/or aggregate objects. The method does not take into account the class or type of object.
The relationship between equals() and hashCode() is an implication.
Two objects that are equal implies that the have the same hash code.
Two objects having the same hash code does not imply that they are equal.
The latter does not hold for several reasons:
There is a chance that two distinct objects may return the same hash code. Keep in mind that a hash value folds information from a large amount of data into a smaller number.
Two objects from different classes with similar fields will most likely use the same type of hash function, and return equal hash values; yet, they are not the same.
hashCode() can be implementation-specific returning different values on different JVMs or JVM target installations.
Within the same JVM, hashCode() can be used as a cheap precursor for equality by testing for a known hash code first and only if the same testing actual equality; provided that the equality test is significantly more expensive than generating a hash code.
And can I implement equals by comparing hashCodes?
No. As mentioned, equal hash codes does not imply equal objects.
The hashCode method as stated in the Oracle Docs is a numeric representation of an object in Java. This hash code has limited possible values (represented by the values which can be stored in an int).
For a more complex class, there is a high possibility that you will find two different objects which have the same hash code value. Also, no one stops you from doing this inside any class.
class Test {
#Override
public int hashCode() {
return 0;
}
}
So, it is not recommended to implement the equals method by comparing hash codes. You should use them for comparison only if you can guarantee that each object has an unique hash code. In most cases, your only certainty is that if two objects are equal using o1.equals(o2) then o1.hashCode() == o2.hashCode().
In the equals method you can define a more complex logic for comparing two objects of the same class.
If two objects return same hashCode, doesn't it mean that they are equal?
No it doesn't mean that.
The javadocs for Object state this:
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently
return the same integer, provided no information used in equals
comparisons on the object is modified. ...
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must
produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCodemethod on
each of the two objects must produce distinct integer results. ...
Note the highlighted statement. It plainly says "No" to your question.
There is another way to look at this.
The hashCode returns an int.
There are only 232 distinct values that an int can take.
If a.hashCode() == b.hashCode() implies a.equals(b), then there can be only 232 distinct (i.e. mutually unequal) objects at any given time in a running Java application.
That last point is plainly not true. Indeed, it is demonstrably not true if you have a large enough heap to hold 232 instances of java.lang.Object ... in a 64-bit JVM.
And a third way is to some well-known examples where two different two character strings have the same hashcode.
Given that your assumption is incorrect, the reasoning that follows from it is also incorrect.
Java does need an equals method.
You generally cannot implement equals using just hashCode.
You may be able to use hashCode to implement a faster equals method, but only if calling hashCode twice is faster than comparing two objects. It generally isn't.
hashCodes are equal -> Objects might be equal -> further comparision is required
hashCodes are different -> Object are not equal (if hashCode is implemented right)
That's how equals method are implemented. At first you check if hashCodes are equal. If yes, you need to check class fields to see if it represents the exact same object. If hashCodes are different, you can be sure that objects are not equal.
Sometimes (very often?) you don't!
These answers are not untrue. But they don't tell the whole story.
One example would be where you are creating a load of objects of class SomeClass, and each instance that is created is given a unique ID by incrementing a static variable, nInstanceCount, or some such, in the constructor:
iD = nInstanceCount++;
Your hash function could then be
int hashCode(){
return iD;
}
and your equals could then be
boolean equals( Object obj ){
if( ! ( obj instanceof SomeClass )){
return false;
}
return hashCode() == obj.hashCode();
}
... under such circumstances your idea that "equals is superfluous" is effectively true: if all classes behaved like this, Java 10 (or Java 23) might say, ah, let's just get rid of silly old equals, what's the point? (NB backwards compatibility would then go out the window).
There are two essential points:
you couldn't then create more than MAXINT instances of SomeClass. Or... you could ... if you set up a system for reassigning the IDs of previously destroyed instances. IDs are typically long rather than int ... but this wouldn't work because hashCode() returns int.
none of these objects could then be "equal" to another one, since equality = identity for this particular class, as you have defined it. Often this is desirable. Often it shuts off whole avenues of possibilities...
The necessary implication of your question is, perhaps, what's the use of these two methods which, in a rather annoying way, have to "cooperate"? Frelling, in his/her answer, alluded to the crucial point: hash codes are needed for sorting into "buckets" with classes like HashMap. It's well worth reading up on this: the amount of advanced maths that has gone into designing efficient "bucket" mechanisms for classes like HashMap is quite frightening. After reading up on it you may come to have (like me) a bit of understanding and reverence about how and why you should bother implementing hashCode() with a bit of thought!

Can I implement hashCode and equals this way for (Hash)Sets?

I know that if a.equals(b), we must have a.hashCode() == b.hashCode() else we get strange results, but I'm wondering if the converse is also required.
More specifically, I have a hashCode() function that uses the field id to calculate the hashCode. However, my equals() function only uses the simple comparison "==" to check for equality. This may seem strange but unless more details are required, it's simply how I've implemented it.
Now the question is will this mess anything up? Specifically for HashSets, but more generally, for any (common) implementations of Set.
The way I understand it, a Set will first check the hashCode then the equals operator to see if a duplicate object exists. In this case, it should work right? If two objects are the same instance, they will produce the same hashCode as well as return true for equals() and thus the Set will only allow the instance to be be added once.
For two separate instances with the same id, the hashCode will be identical but then the equals() operator will return false and thus allow both objects to enter the Set, which is what I hope to accomplish.
Is this a beginner's mistake? Am I missing something? Will this have an unexpected results for any collection types other than Set?
edit:
I guess I should explain myself. I have a Hibernate object FooReference which implements both a hashCode and equals method using the id. This object is guaranteed to always have a unique id. However, before this object is persisted, I use a Foo object which has a default id of -1. So when putting it in a Set (to be saved) I know each Foo is unique (thus the basic == operator). So this Foo which extends FooReference overrides the equals method with a basic ==. For my purposes this should work... hopefully.
Objects are allowed to have the same hashcode without being equal to each other. In fact, it's perfectly valid (though inefficient and a bad idea) to implement hashCode as simply return 0, giving every instance the same hashcode.
All that's required is that if two objects are equal (as determined by the equals method), they have the same hashcode.
However, if your equals method just compares the two objects using == internally, no two (distinct) instances will ever be equal to each other, so there's no point defining your own hashCode and equals methods at all. The default implementations will produce the same behavior.

General contract for object comparision : equals() and hashCode()

There is a point in general contract of equals method, which says if You has defined equals() method then You should also define hashCode() method. And if o1.equals(o2) then this is must o1.hashCode() == o2.hashCode().
So my question is what if I break this contract? Where can bring fails the situation when o1.equals(o2) but o1.hashCode != o2.hashCode() ?
It will lead to unexpected behavior in hash based data structure for example: HashMap, Read how HashTable works
HashMap/HashTable/HashSet/etc will put your object into one of several buckets based on its hashCode, and then check to see if any other objects already in that bucket are equal.
Because these classes assume the equals/hashCode contract, they won't check for equality against objects in other buckets. After all, any object in another bucket must have a different hashCode, and thus (by the contract) cannot be equal to the object in quesiton. If two objects are equal but have different hash codes, they could end up in different buckets, in which case the HashMap/Table/Set/etc won't have a chance to compare them.
So, you could end up with a Set that contains two objects which are equal -- which it's not supposed to do; or a Map that contains two values for the same one key (since the buckets are by key); or a Map where you can't look up a key (since the lookup checks both the hash code and equality); or any number of similar bugs.
If you break the contract, your objects won't work with hash-based containers (and anything else that uses hashCode() and relies on its contract).
The basic intuition is as follows: to see whether two objects are the same, the container could call hashCode() on both, and compare the results. If the hash codes are different, the container is allowed to short-circuit by assuming that the two objects are not equal.
To give a specific example, if o1.equals(o2) but o1.hashCode() != o2.hashCode(), you'll likely be able to insert both objects into a HashMap (which is meant to store unique objects).

java: need identity and equality at the same time

I have a java project where I need both identity comparison (are the 2 references the same) and equality comparison (do both object contain the same data).
My solution is to not override equals/hashcode, and to add an isEqual method to my objects.
Is there a better pattern to deal with this situation?
EDIT:
Here is more information about this particular need.
By default we have:
equals performing an identity check (==)
contains being implemented in terms of equals, therefore using ==
But for my usage I would want:
equals to perform an equality check (objects contain the same data)
contains to remain implemented with ==
I can't have both at the same time, so one solution is to implement my
own equality check and have:
contains staying the same, using ==
implementing isEqual and use it instead of equals
another solution is to implement my own contains that uses ==:
implementing customContains to use == and use it instead of contains
Overriding equals to check whether objects contain the same data
Which is best? Is there another better way to do this?
Override the equals() method to determine if the objects contain the same data.
Use == to determine if they are the same object ie the same reference.
The best pattern to do this is to follow the specification of the language. Override equals and hashcode, and do not roll your own equality UNLESS and this is a big unless, no one outside of you will ever use this code AND it will never ever ever change.
If you want to wrap them in a function called isEqual, the hashcode and equals methods that is, this is another approach, but it still means you are overriding equals and hashcode, which you claim you do not want to do.
Essentially what you are doing is creating a very rigid and/or broken API, that will not see much usage as it will be difficult to utilize as your isEqual function could be quite non-deterministic.

Contains and equals

I'm a little befuddled by some code:
for (AbstractItem item : mSetOfItems) {
if (item.equals(pPrimaryItem))
{
System.out.println("Contains? " + mSetOfItems.contains(pPrimaryItem));
}
}
How could it be possible that item.equals(pPrimaryItem) resolves as true, and mSetOfItems.contains(pPrimaryItem) resolves as false? Because that's what I'm seeing in my code.
In other words, if I iterate through my set, I can find an element equal to my test element. But if I use contains, my test elements is not reported being in the set. I'm baffled because I thought contains used equals. What could I be overlooking?
You didn't give the type of mSetOfItems, but I'm guessing that AbstractItem overrides .equals() but not .hashcode(). This is bad.
If mSetOfItems uses hashcode for lookup, which it could based on its type, you'll get the behavior you described.
Your assumption is that .contains() is implemented with iteration and .equals(). There's no list interface which guarantees that.
What is the implementation of mSetOfItems?
If it's a tree, it could be that your comparison function returns inconsistent values.
If it's a hash, it could be that your equals() returns true for objects with different hash codes, or that the object's hashCode() has changed since it was inserted into the set.
If your set is a TreeSet or some other set where you're using a custom comparator, then you could see this if the comparator was broken, either by not returning a valid sorted order or by having objects that are actually equal compare unequal. When the set internally looks up an element and uses the comparator, it would make a wrong choice and not see the element.
If your set is a HashSet, your hash function could be broken and cause two objects that are equal to have different hash code. Internally as the HashSet uses the object's hash code to figure out where to look, it might end up looking in the wrong bucket.
Alternatively, if you store objects in a Set of any sort and then modify them, you might end up breaking some internal invariant of the Set. For instance, if you store something in a HashSet and then change its value, it will be in the wrong bucket, and if you have a TreeSet and change the value it may appear in the wrong spot in sorted order.
If you are concurrently modifying the set, it's possible that you might have added the element in another thread but not had any guarantees that the operation that made that change be visible in another thread. The second thread would then not see the element even if it were added.
Check the hashcode() method of your class
If mSetOfItems is a java.util.HashTable (or similar 'Hash' Collection, Set, etc) then you must implement hashCode() as well. boolean contains(Object elem) will first try to find the passed object by calculating its hash and retrieving it in the Collection. Once contains finds something, it will then use the equals method to verify that the two objects are the same objects according to your implementation.
If not properly overridden, hashCode() will return an unpredictable int that is usually the integer representation of the internal address of the object itself. This will always be different for two distinct objects no matter the the values of their instance variables. If not overridden, contains won't be able to find it any objects...
When implementing hashCode() remind that:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Also, make sure that you properly overridden the equals function by respecting its signature:
public boolean equals(Object obj);

Categories

Resources