How do sets differentiate between objects? - java

How does a set differentiate between objects in both Java and C++? Or do sets not differentiate them at all?
Take these for example:
C++
std::set<A> aset;
A a(1, 2); // Assume A has only two elements, and this constructor sets them both
aset.insert(a);
A a2(1, 2); // This would initialise a `A' object to the same values as `a', but a different object
aset.count(a2); // Would this return 1 or 0?
Java
set<A> aset;
A a = new A(1, 2); // Assume A has only two elements, and this constructor sets them both
aset.add(a);
A a2 = new A(1, 2); // This would initialise a `A' object to the same values as `a', but a different object
aset.contains(a2); // Would this return true or false?

In C++ the set depends on operator<() being defined for the class A, or that you supply a comparison object providing strict weak ordering to the set.

For Java it depends on the equals, hashcode contract.

For the Java part,
The method in charge of determining whether two objects are equal is:
public boolean equals(Object other)
Not to be confused with
public int hashCode()
Whose contract states that two equals objects must return the same number, but two objects that returned the same number may be, but are not necessarily, equal.
The default implementation for the equals method is equality by memory address, therefor if class A did not override the equals method the contains method will return false.
To have the set.contains(a2) method return true you must override the equals and hashCode method to comply as so:
public boolean equals(Object other) {
return other instanceof A && ((A) other).elem1 = this.elem1 && ((A) other).elem2 = this.elem2;
}
public int hashCode() {
return elem1 * 31 + elem2;
}
The hashCode is required (assuming you're using a HashSet) for the set to identify where in the internal representation of the set the object may be (i.e. where to look for it).
Search for HashSet\HashMap to understand the internal representation if you're interested.
As for the C++ part, If I remember correctly it depends on the correct operator overloading, but my C++ is rusty at best.
EDIT: I noticed you specifically asked about sets so I'll elaborate a bit more on how that:
While the equals method is what determines equality between two objects, some preliminary steps in the set implementation used (e.g. HashSet or TreeSet) might relay on something extra:
For example, the HashSet uses the hashCode() function to find the internal location the item might be in, so if A did not override/correctly implement the hashCode() function, the set.contains(a2) may return true or false (for default implementation it's non deterministic - depends on memory location and the current capacity of the set).
For a TreeSet internal implementation to correctly find items within it either the items contained must be implement the Comparable interface properly or the TreeSet itself must be supplied with a Comparator instance implemented properly.

for C++, according to set::insert in C++ Reference
Because set containers do not allow for duplicate values, the
insertion operation checks for each
element inserted whether another
element exists already in the
container with the same value, if so,
the element is not inserted and -if
the function returns a value- an
iterator to it is returned
.
They check for the values, unlike Java, which only checks for the address instead.

In Java at least, comparison is done on a hash code, which by default is created from the location of the object in memory. So in the Java part of the question, aset.contains(a2); would return false, as a2 points to a different part of the memory to a.
I'm afraid I can't comment on how C++ works!

Java calls the object's equals method, which, if you haven't overridden it, is the same as calling Object.hashCode().

Related

Compare two objects and add them in a Set [duplicate]

This question already has answers here:
Compare two objects with .equals() and == operator
(16 answers)
Closed 9 years ago.
I have two java objects that are instantiated from the same class.
MyClass myClass1 = new MyClass();
MyClass myClass2 = new MyClass();
If I set both of their properties to the exact same values and then verify that they are the same
if(myClass1 == myClass2){
// objects match
...
}
if(myClass1.equals(myClass2)){
// objects match
...
}
However, neither of these approaches return a true value. I have checked the properties of each and they match.
How do I compare these two objects to verify that they are identical?
You need to provide your own implementation of equals() in MyClass.
#Override
public boolean equals(Object other) {
if (!(other instanceof MyClass)) {
return false;
}
MyClass that = (MyClass) other;
// Custom equality check here.
return this.field1.equals(that.field1)
&& this.field2.equals(that.field2);
}
You should also override hashCode() if there's any chance of your objects being used in a hash table. A reasonable implementation would be to combine the hash codes of the object's fields with something like:
#Override
public int hashCode() {
int hashCode = 1;
hashCode = hashCode * 37 + this.field1.hashCode();
hashCode = hashCode * 37 + this.field2.hashCode();
return hashCode;
}
See this question for more details on implementing a hash function.
You need to Override equals and hashCode.
equals will compare the objects for equality according to the properties you need and hashCode is mandatory in order for your objects to be used correctly in Collections and Maps
You need to implement the equals() method in your MyClass.
The reason that == didn't work is this is checking that they refer to the same instance. Since you did new for each, each one is a different instance.
The reason that equals() didn't work is because you didn't implement it yourself yet. I believe it's default behavior is the same thing as ==.
Note that you should also implement hashcode() if you're going to implement equals() because a lot of java.util Collections expect that.
You have to correctly override method equals() from class Object
Edit: I think that my first response was misunderstood probably because I was not too precise. So I decided to to add more explanations.
Why do you have to override equals()? Well, because this is in the domain of a developer to decide what does it mean for two objects to be equal. Reference equality is not enough for most of the cases.
For example, imagine that you have a HashMap whose keys are of type Person. Each person has name and address. Now, you want to find detailed bean using the key. The problem is, that you usually are not able to create an instance with the same reference as the one in the map. What you do is to create another instance of class Person. Clearly, operator == will not work here and you have to use equals().
But now, we come to another problem. Let's imagine that your collection is very large and you want to execute a search. The naive implementation would compare your key object with every instance in a map using equals(). That, however, would be very expansive. And here comes the hashCode(). As others pointed out, hashcode is a single number that does not have to be unique. The important requirement is that whenever equals() gives true for two objects, hashCode() must return the same value for both of them. The inverse implication does not hold, which is a good thing, because hashcode separates our keys into kind of buckets. We have a small number of instances of class Person in a single bucket. When we execute a search, the algorithm can jump right away to a correct bucket and only now execute equals for each instance. The implementation for hashCode() therefore must distribute objects as evenly as possible across buckets.
There is one more point. Some collections require a proper implementation of a hashCode() method in classes that are used as keys not only for performance reasons. The examples are: HashSet and LinkedHashSet. If they don’t override hashCode(), the default Object
hashCode() method will allow multiple objects that you might consider "meaningfully
equal" to be added to your "no duplicates allowed" set.
Some of the collections that use hashCode()
HashSet
LinkedHashSet
HashMap
Have a look at those two classes from apache commons that will allow you to implement equals() and hashCode() easily
EqualsBuilder
HashCodeBuilder
1) == evaluates reference equality in this case
2) im not too sure about the equals, but why not simply overriding the compare method and plant it inside MyClass?

How to compare two java objects [duplicate]

This question already has answers here:
Compare two objects with .equals() and == operator
(16 answers)
Closed 9 years ago.
I have two java objects that are instantiated from the same class.
MyClass myClass1 = new MyClass();
MyClass myClass2 = new MyClass();
If I set both of their properties to the exact same values and then verify that they are the same
if(myClass1 == myClass2){
// objects match
...
}
if(myClass1.equals(myClass2)){
// objects match
...
}
However, neither of these approaches return a true value. I have checked the properties of each and they match.
How do I compare these two objects to verify that they are identical?
You need to provide your own implementation of equals() in MyClass.
#Override
public boolean equals(Object other) {
if (!(other instanceof MyClass)) {
return false;
}
MyClass that = (MyClass) other;
// Custom equality check here.
return this.field1.equals(that.field1)
&& this.field2.equals(that.field2);
}
You should also override hashCode() if there's any chance of your objects being used in a hash table. A reasonable implementation would be to combine the hash codes of the object's fields with something like:
#Override
public int hashCode() {
int hashCode = 1;
hashCode = hashCode * 37 + this.field1.hashCode();
hashCode = hashCode * 37 + this.field2.hashCode();
return hashCode;
}
See this question for more details on implementing a hash function.
You need to Override equals and hashCode.
equals will compare the objects for equality according to the properties you need and hashCode is mandatory in order for your objects to be used correctly in Collections and Maps
You need to implement the equals() method in your MyClass.
The reason that == didn't work is this is checking that they refer to the same instance. Since you did new for each, each one is a different instance.
The reason that equals() didn't work is because you didn't implement it yourself yet. I believe it's default behavior is the same thing as ==.
Note that you should also implement hashcode() if you're going to implement equals() because a lot of java.util Collections expect that.
You have to correctly override method equals() from class Object
Edit: I think that my first response was misunderstood probably because I was not too precise. So I decided to to add more explanations.
Why do you have to override equals()? Well, because this is in the domain of a developer to decide what does it mean for two objects to be equal. Reference equality is not enough for most of the cases.
For example, imagine that you have a HashMap whose keys are of type Person. Each person has name and address. Now, you want to find detailed bean using the key. The problem is, that you usually are not able to create an instance with the same reference as the one in the map. What you do is to create another instance of class Person. Clearly, operator == will not work here and you have to use equals().
But now, we come to another problem. Let's imagine that your collection is very large and you want to execute a search. The naive implementation would compare your key object with every instance in a map using equals(). That, however, would be very expansive. And here comes the hashCode(). As others pointed out, hashcode is a single number that does not have to be unique. The important requirement is that whenever equals() gives true for two objects, hashCode() must return the same value for both of them. The inverse implication does not hold, which is a good thing, because hashcode separates our keys into kind of buckets. We have a small number of instances of class Person in a single bucket. When we execute a search, the algorithm can jump right away to a correct bucket and only now execute equals for each instance. The implementation for hashCode() therefore must distribute objects as evenly as possible across buckets.
There is one more point. Some collections require a proper implementation of a hashCode() method in classes that are used as keys not only for performance reasons. The examples are: HashSet and LinkedHashSet. If they don’t override hashCode(), the default Object
hashCode() method will allow multiple objects that you might consider "meaningfully
equal" to be added to your "no duplicates allowed" set.
Some of the collections that use hashCode()
HashSet
LinkedHashSet
HashMap
Have a look at those two classes from apache commons that will allow you to implement equals() and hashCode() easily
EqualsBuilder
HashCodeBuilder
1) == evaluates reference equality in this case
2) im not too sure about the equals, but why not simply overriding the compare method and plant it inside MyClass?

Correct implementation of hashcode/equals

I have a class
class Pair<T>{
private T data;
private T alternative;
}
Two pair objects would be equal if
this.data.equals(that.data) && this.alternative.equals(that.alternative) ||
this.data.equals(that.alternative) && this.alternative.equals(that.data)
I'm having difficulty correctly implementing the hashCode() part though. Any suggestions would be appreciated
You should use the hashCode from data and alternative like this :
return this.data.hashCode() + this.alterative.hashCode();
Although it is not the best approach, as if you change the data or alternative, then their hashcode will also change. Think a little bit and see if you really need to use this class as a key in a map and if not a Long or String would be a better candidate.
This should do the trick:
#Override
public int hashCode() {
return data.hashCode() * alternative.hashCode();
}
Since you want to include both fields into the equals, you need to include both fields into the hashCode method. It is correct if unequal objects end up having the same hash code, but equal objects according to your scheme will always end up having the same hash code with this method.
Refer to the java doc, the general contract of hashCode is(copied from java doc):
- Whenever it is invoked on the same object more than once during an
execution of a Java application, the hashCode method must
consistently return the same integer, provided no information used in
equals comparisons on the object is modified. This integer need not
remain consistent from one execution of an application to another
execution of the same application.
- If two objects are equal according to the equals(Object) method, then
calling the hashCode method on each of the two objects must produce
the same integer result.
- It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hashtables.
So from your implementation of equals, data and alternative are switchable. So you need make sure in your hashCode implementation returns the same value if you switch the position of data.hashCode() and alternative.hashCode(). If you are not sure, just return a const value such as 1 (But it may cause performance issue when you try to put the object into a Map).

Contains and equals

I'm a little befuddled by some code:
for (AbstractItem item : mSetOfItems) {
if (item.equals(pPrimaryItem))
{
System.out.println("Contains? " + mSetOfItems.contains(pPrimaryItem));
}
}
How could it be possible that item.equals(pPrimaryItem) resolves as true, and mSetOfItems.contains(pPrimaryItem) resolves as false? Because that's what I'm seeing in my code.
In other words, if I iterate through my set, I can find an element equal to my test element. But if I use contains, my test elements is not reported being in the set. I'm baffled because I thought contains used equals. What could I be overlooking?
You didn't give the type of mSetOfItems, but I'm guessing that AbstractItem overrides .equals() but not .hashcode(). This is bad.
If mSetOfItems uses hashcode for lookup, which it could based on its type, you'll get the behavior you described.
Your assumption is that .contains() is implemented with iteration and .equals(). There's no list interface which guarantees that.
What is the implementation of mSetOfItems?
If it's a tree, it could be that your comparison function returns inconsistent values.
If it's a hash, it could be that your equals() returns true for objects with different hash codes, or that the object's hashCode() has changed since it was inserted into the set.
If your set is a TreeSet or some other set where you're using a custom comparator, then you could see this if the comparator was broken, either by not returning a valid sorted order or by having objects that are actually equal compare unequal. When the set internally looks up an element and uses the comparator, it would make a wrong choice and not see the element.
If your set is a HashSet, your hash function could be broken and cause two objects that are equal to have different hash code. Internally as the HashSet uses the object's hash code to figure out where to look, it might end up looking in the wrong bucket.
Alternatively, if you store objects in a Set of any sort and then modify them, you might end up breaking some internal invariant of the Set. For instance, if you store something in a HashSet and then change its value, it will be in the wrong bucket, and if you have a TreeSet and change the value it may appear in the wrong spot in sorted order.
If you are concurrently modifying the set, it's possible that you might have added the element in another thread but not had any guarantees that the operation that made that change be visible in another thread. The second thread would then not see the element even if it were added.
Check the hashcode() method of your class
If mSetOfItems is a java.util.HashTable (or similar 'Hash' Collection, Set, etc) then you must implement hashCode() as well. boolean contains(Object elem) will first try to find the passed object by calculating its hash and retrieving it in the Collection. Once contains finds something, it will then use the equals method to verify that the two objects are the same objects according to your implementation.
If not properly overridden, hashCode() will return an unpredictable int that is usually the integer representation of the internal address of the object itself. This will always be different for two distinct objects no matter the the values of their instance variables. If not overridden, contains won't be able to find it any objects...
When implementing hashCode() remind that:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Also, make sure that you properly overridden the equals function by respecting its signature:
public boolean equals(Object obj);

How to compute the hashCode() from the object's address?

In Java, I have a subclass Vertex of the Java3D class Point3f. Now Point3f computes equals() based on the values of its coordinates, but for my Vertex class I want to be stricter: two vertices are only equal if they are the same object. So far, so good:
class Vertex extends Point3f {
// ...
public boolean equals(Object other) {
return this == other;
}
}
I know this violates the contract of equals(), but since I'll only compare vertices to other vertices this is not a problem.
Now, to be able to put vertices into a HashMap, the hashCode() method must return results consistent with equals(). It currently does that, but probably bases its return value on the fields of the Point3f, and therefore will give hash collisions for different Vertex objects with the same coordinates.
Therefore I would like to base the hashCode() on the object's address, instead of computing it from the Vertex's fields. I know that the Object class does this, but I cannot call its hashCode() method because Point3f overrides it.
So, actually my question is twofold:
Should I even want such a shallow equals()?
If yes, then, how do I get the object's address to compute the hash code from?
Edit: I just thought of something... I could generate a random int value on object creation, and use that for the hash code. Is that a good idea? Why (not)?
Either use System.identityHashCode() or use an IdentityHashMap.
System.identityHashCode() returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
You use a delegate even though this answer is probably better.
class Vertex extends Point3f{
private final Object equalsDelegate = new Object();
public boolean equals(Object vertex){
if(vertex instanceof Vertex){
return this.equalsDelegate.equals(((Vertex)vertex).equalsDelegate);
}
else{
return super.equals(vertex);
}
}
public int hashCode(){
return this.equalsDelegate.hashCode();
}
}
Just FYI, your equals method does NOT violate the equals contract (for the base Object's contract that is)... that is basically the equals method for the base Object method, so if you want identity equals instead of the Vertex equals, that is fine.
As for the hash code, you really don't need to change it, though the accepted answer is a good option and will be a lot more efficient if your hash table contains a lot of vertex keys that have the same values.
The reason you don't need to change it is because it is completely fine that the hash code will return the same value for objects that equals returns false... it is even a valid hash code to just return 0 all the time for EVERY instance. Whether this is efficient for hash tables is completely different issue... you will get a lot more collisions if a lot of your objects have the same hash code (which may be the case if you left hash code alone and had a lot of vertices with the same values).
Please don't accept this as the answer though of course (what you chose is much more practical), I just wanted to give you a little more background info about hash codes and equals ;-)
Why do you want to override hashCode() in the first place? You'd want to do it if you want to work with some other definition of equality. For example
public class A {
int id;
public boolean equals(A other) { return other.id==id}
public int hashCode() {return id;}
}
where you want to be clear that if the id's are the same then the objects are the same, and you override hashcode so that you can't do this:
HashSet hash= new HashSet();
hash.add(new A(1));
hash.add(new A(1));
and get 2 identical(from the point of view of your definition of equality) A's.
The correct behavior would then be that you'd only have 1 object in the hash, the second write would overwrite.
Since you are not using equals as a logical comparison, but a physical one (i.e. it is the same object), the only way you will guarantee that the hashcode will return a unique value, is to implement a variation of your own suggestion. Instead of generating a random number, use UUID to generate an actual unique value for each object.
The System.identityHashCode() will work, most of the time, but is not guaranteed as the Object.hashCode() method is not guaranteed to return a unique value for every object. I have seen the marginal case happen, and it will probably be dependent on the VM implementation, which is not something you will want your code be dependent on.
Excerpt from the javadocs for Object.hashCode():
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
The problem this addresses, is the case of having two separate point objects from overwriting each other when inserted into the hashmap because they both have the same hash. Since there is no logical equals, with the accompanying override of hashCode(), the identityHashCode method can actually cause this scenario to occur. Where the logical case would only replace hash entries for the same logical point, using the system based hash can cause it to occur with any two objects, equality (and even class) is no longer a factor.
The function hashCode() is inherited from Object and works exactly as you intend (on object level, not coordinate-level). There should be no need to change it.
As for your equals-method, there is no reason to even use it, since you can just do obj1 == obj2 in your code instead of using equals, since it's meant for sorting and similar, where comparing coordinates makes a lot more sense.

Categories

Resources