Java: overriding equals() and hashCode() for two interchangeable integers

I'm overriding the equals and hashCode methods for a simple container object for two ints. Each int reflects the index of another object (it doesn't matter what that object is). The point of the class is to represent a connection between the two objects.
The direction of the connection doesn't matter, so the equals method should return true regardless of the order in which the two ints are stored in the object. E.g.
connectionA = new Connection(1,2);
connectionB = new Connection(1,3);
connectionC = new Connection(2,1);
connectionA.equals(connectionB); // returns false
connectionA.equals(connectionC); // returns true
Here is what I have (modified from the source code for Integer):
public class Connection {
// Simple container for two numbers which are connected.
// Two Connection objects are equal regardless of the order of from and to.
int from;
int to;
public Connection(int from, int to) {
this.from = from;
this.to = to;
}
// Modified from Integer source code
@Override
public boolean equals(Object obj) {
if (obj instanceof Connection) {
Connection connectionObj = (Connection) obj;
return ((from == connectionObj.from && to == connectionObj.to) || (from == connectionObj.to && to == connectionObj.from));
}
return false;
}
@Override
public int hashCode() {
return from*to;
}
}
This does work; however, my question is: is there a better way to achieve this?
My main worry is that the hashCode() method will return the same hash code for any two pairs of integers whose products are equal. E.g.
3*4 = 12
2*6 = 12 // same!
The documentation, http://docs.oracle.com/javase/1.5.0/docs/api/java/lang/Object.html#hashCode(), states that
It is not required that if two objects are unequal according to the
equals(java.lang.Object) method, then calling the hashCode method on
each of the two objects must produce distinct integer results.
However, the programmer should be aware that producing distinct
integer results for unequal objects may improve the performance of
hashtables.
If anyone can see a simple way of reducing the number of matching hashcodes then I would be appreciative of an answer.
Thanks!
Tim
PS I'm aware that there is a java.sql.Connection which could cause some import annoyances. The object actually has a more specific name in my application but for brevity I shortened it to Connection here.

Three solutions that would "work" have been proposed. (By "work", I mean that they satisfy the basic requirement of a hashcode ... that equal objects give equal outputs ... which here includes the OP's additional "symmetry" requirement.)
These are:
# 1
return from ^ to;
# 2
return to*to+from*from;
# 3
int res = 17;
res = res * 31 + Math.min(from, to);
res = res * 31 + Math.max(from, to);
return res;
The first one has the problem that the range of the output is bounded by the range of the actual input values. So for instance, if we assume that the inputs are both non-negative numbers less than 2^i and 2^j respectively, then the output will be less than 2^max(i,j). That is likely to give you poor "dispersion" (see footnote 1) in your hash table ... and a higher rate of collisions. (There is also a problem when from == to: every such pair hashes to 0!)
The second and third ones are better than the first, but you are still liable to get more collisions than is desirable if from and to are small.
I would suggest a 4th alternative if it is critical that you minimize collisions for small values of from and to.
#4
int res = Math.max(from, to);
res = (res << 16) | (res >>> 16); // exchange top and bottom 16 bits.
res = res ^ Math.min(from, to);
return res;
This has the advantage that if from and to are both in the range 0..2^16-1, you get a unique hashcode for each distinct (unordered) pair.
1 - I don't know if this is the correct technical term for this ...
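To make the comparison between the four candidates concrete, here is a rough sketch (Java 8+) that counts how many distinct hash codes each one produces over all unordered pairs of small indices. The class and method names are made up for this illustration, and the range of 0..99 is an arbitrary assumption about what "small" means.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.IntBinaryOperator;

// Hypothetical helper, not part of the original answer.
class SymmetricHashDemo {
    static int xorHash(int from, int to) { return from ^ to; }

    static int squaresHash(int from, int to) { return to * to + from * from; }

    static int minMaxHash(int from, int to) {
        int res = 17;
        res = res * 31 + Math.min(from, to);
        res = res * 31 + Math.max(from, to);
        return res;
    }

    static int swapHalvesHash(int from, int to) {
        int res = Math.max(from, to);
        res = (res << 16) | (res >>> 16); // exchange top and bottom 16 bits
        return res ^ Math.min(from, to);
    }

    // Counts distinct hash codes over all unordered pairs 0 <= a <= b < n.
    static int distinctHashes(IntBinaryOperator hash, int n) {
        Set<Integer> seen = new HashSet<>();
        for (int a = 0; a < n; a++) {
            for (int b = a; b < n; b++) {
                seen.add(hash.applyAsInt(a, b));
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int n = 100; // assumed range of indices for this experiment
        System.out.println("unordered pairs: " + n * (n + 1) / 2);
        System.out.println("#1 xor:          " + distinctHashes(SymmetricHashDemo::xorHash, n));
        System.out.println("#2 squares:      " + distinctHashes(SymmetricHashDemo::squaresHash, n));
        System.out.println("#3 min/max * 31: " + distinctHashes(SymmetricHashDemo::minMaxHash, n));
        System.out.println("#4 swap halves:  " + distinctHashes(SymmetricHashDemo::swapHalvesHash, n));
    }
}
```

The closer a count is to the number of unordered pairs, the fewer collisions that candidate has on this input range. Real-world behaviour also depends on how HashMap folds the hash into a bucket index, so treat the numbers as indicative only.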

This is a widely accepted approach:
@Override
public int hashCode() {
int res = 17;
res = res * 31 + Math.min(from, to);
res = res * 31 + Math.max(from, to);
return res;
}

I think something like
@Override
public int hashCode() {
return to*to+from*from;
}
is good enough

Typically I use XOR for the hashCode method.
@Override
public int hashCode() {
return from ^ to;
}

I wonder why nobody offered the usually best solution: Normalize your data:
Connection(int from, int to) {
this.from = Math.min(from, to);
this.to = Math.max(from, to);
}
If it's impossible, then I'd suggest something like
27644437 * (from+to) + Math.min(from, to)
By using a multiplier different from 31, you avoid collisions like in this question.
By using a big multiplier you spread the numbers better.
By using an odd multiplier you ensure that the multiplication is bijective (i.e., no information gets lost).
By using a prime you gain nothing at all, but everyone does it and it has no disadvantage.
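As a minimal sketch of the normalization idea: once the constructor stores the smaller index first, equals and hashCode no longer need any special handling for the swapped case. The class name NormalizedConnection is a placeholder chosen for this example, and the fields are made final on the assumption that a connection never changes after construction.

```java
// Sketch only: NormalizedConnection is a placeholder name, not from the original post.
final class NormalizedConnection {
    private final int from; // always the smaller of the two indices
    private final int to;   // always the larger of the two indices

    NormalizedConnection(int from, int to) {
        this.from = Math.min(from, to);
        this.to = Math.max(from, to);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof NormalizedConnection)) return false;
        NormalizedConnection other = (NormalizedConnection) obj;
        // Plain field-by-field comparison is enough after normalization.
        return from == other.from && to == other.to;
    }

    @Override
    public int hashCode() {
        // An ordinary order-sensitive hash is fine because the order is canonical.
        return 31 * from + to;
    }
}
```

With this in place, new NormalizedConnection(1, 2) and new NormalizedConnection(2, 1) are equal and share a hash code, while (1, 3) differs from both.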

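Java 7+ has Objects.hash. Note that it is order-sensitive, so pass the two values in a normalized order to keep the hash symmetric:
@Override
public int hashCode() {
return Objects.hash(Math.min(from, to), Math.max(from, to));
}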

Related

Comparing two large lists in java

I have two ArrayLists with 1000 objects in each of them. I need to remove all elements from list 1 that are also in list 2. Currently I am running two nested loops, which results in 1000 x 1000 operations in the worst case.
List<DataClass> dbRows = object1.get("dbData");
List<DataClass> modifiedData = object1.get("dbData");
List<DataClass> dbRowsForLog = object2.get("dbData");
for (DataClass newDbRows : dbRows) {
boolean found=false;
for (DataClass oldDbRows : dbRowsForLog) {
if (newDbRows.equals(oldDbRows)) {
found=true;
modifiedData.remove(oldDbRows);
break;
}
}
}
public class DataClass{
private int categoryPosition;
private int subCategoryPosition;
private Timestamp lastUpdateTime;
private String lastModifiedUser;
// + so many other variables
public boolean equals(Object o) {
if (this == o) {
return true;
}
if (o == null || getClass() != o.getClass()) {
return false;
}
DataClass dataClassRow = (DataClass) o;
return categoryPosition == dataClassRow.categoryPosition
&& subCategoryPosition == dataClassRow.subCategoryPosition && (lastUpdateTime.compareTo(dataClassRow.lastUpdateTime)==0?true:false)
&& stringComparator(lastModifiedUser,dataClassRow.lastModifiedUser);
}
public String toString(){
return "DataClass[categoryPosition="+categoryPosition+",subCategoryPosition="+subCategoryPosition
+",lastUpdateTime="+lastUpdateTime+",lastModifiedUser="+lastModifiedUser+"]";
}
public static boolean stringComparator(String str1, String str2){
return (str1 == null ? str2 == null : str1.equals(str2));
}
public int hashCode() {
int hash = 7;
hash = 31 * hash + (int) categoryPosition;
hash = 31 * hash + (int) subCategoryPosition;
hash = 31 * hash + (lastModifiedUser == null ? 0 : lastModifiedUser.hashCode());
return hash;
}
}
The best workaround I could think of is to create two sets of strings by calling the toString() method of DataClass and compare the strings. That results in 1000 (for making set 1) + 1000 (for making set 2) + 1000 (searching in the set) = 3000 operations. I am stuck on Java 7. Is there any better way to do this? Thanks.
Let Java's built-in collection classes handle most of the optimization for you by taking advantage of a HashSet. Its contains method runs in expected O(1) time, provided hashCode spreads the elements reasonably well. I would highly recommend looking up how it achieves this because it's very interesting.
List<DataClass> a = object1.get("dbData");
HashSet<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
return a;
And it's all done for you.
EDIT: caveat
In order for this to work, DataClass needs to implement Object::hashCode. Otherwise, you can't use any of the hash-based collection algorithms.
EDIT 2: implementing hashCode
An object's hash code does not need to change every time an instance variable changes. The hash code only needs to reflect the instance variables that determine equality.
For example, imagine each object had a unique field private final UUID id. In this case, you could determine if two objects were the same by simply testing the id value. Fields like lastUpdateTime and lastModifiedUser would provide information about the object, but two instances with the same id would refer to the same object, even if the lastUpdateTime and lastModifiedUser of each were different.
The point is that if you really want to optimize this, include as few fields as possible in the hash computation. From your example, it seems like categoryPosition and subCategoryPosition might be enough.
Whatever fields you choose to include, the simplest way to compute a hash code from them is to use Objects::hash rather than running the numbers yourself.
This is a set difference operation, A - B: retain only the elements of set A that are not in set B.
If using a Set is fine, we can do it as shown below. An ArrayList could be used in place of the Set as well, but then every remove/retain check has to scan the entire other list.
Set<DataClass> a = new HashSet<>(object1.get("dbData"));
Set<DataClass> b = new HashSet<>(object2.get("dbData"));
a.removeAll(b);
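If ordering is needed, use a TreeSet (sorted order; DataClass must then be Comparable, or a Comparator must be supplied) or a LinkedHashSet (insertion order).
Try to return a Set from object1.get("dbData") and object2.get("dbData"); that avoids creating one more intermediate collection.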
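The set-difference approach from the answers above can be sketched as a small generic helper that works on Java 7. The class and method names are invented for this example, and strings stand in for DataClass rows; any element type works as long as its equals() and hashCode() are consistent with each other.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration.
class ListDifference {
    // Returns the elements of 'a' that are not in 'b', keeping a's order.
    // Total work is roughly a.size() + b.size() hash operations.
    static <T> List<T> difference(List<T> a, Collection<T> b) {
        Set<T> lookup = new HashSet<>(b);        // one pass over b
        List<T> result = new ArrayList<>(a.size());
        for (T item : a) {                       // one pass over a
            if (!lookup.contains(item)) {        // expected O(1) per lookup
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("r1", "r2", "r3", "r4");
        List<String> b = Arrays.asList("r2", "r4", "r9");
        System.out.println(difference(a, b)); // prints [r1, r3]
    }
}
```

Building a new list instead of calling removeAll on the original keeps the input untouched, which may or may not be what the surrounding code expects.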

HashSet<POJO>.contains misbehaves

As part of a Hadoop Mapper, I have a HashSet<MySimpleObject> that contains instances of a very simple class with only two integer attributes. As one should, I customised hashCode() and equals():
public class MySimpleObject {
private int i1, i2;
public void set(int i1, int i2) {
this.i1 = i1;
this.i2 = i2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result + i1;
result = prime * result + i2;
return result;
}
@Override
public boolean equals(Object obj) {
if (obj == null) return false;
if (this == obj) return true;
if ( obj.getClass() != MySimpleObject.class ) return false;
MySimpleObject other = (MySimpleObject)obj;
return (this.i1 == other.i1) && (this.i2 == other.i2);
}
}
Somehow, sometimes, calls to mySet.contains(aSimpleObj) return true though the set actually doesn't contain this value.
I understand how hashCode() is first used to split instances into buckets and equals() only called to compare instances within a given bucket.
I tried to change the prime value in hashCode() to spread instances differently into the buckets, and saw that contains() still sometimes returned a wrong result, but not for the same previously failing value.
It also seems that this value was then correctly identified as being outwith the set; I therefore suspect something is wrong with the equality check rather than the hashing, but I may be wrong...
I'm at a total loss here, and out of ideas. Can anyone shed light on this at all?
----- edit -----
some clarifications:
i1 & i2 are never updated after construction for the instances that were added to the set (though they are sometimes updated, elsewhere in the code, for other instances of that same class);
the set is potentially quite large (i.e. can reach nearly 15K entries) and I wonder if the issue could be linked to this (bucket overflow, e.g.?).
I bet you have trouble coming up with a concise reproduction of this bug.
The code you have shown looks right. I think the objects in your collection are being mutated and this fact is hidden from you by other code.
You could debug this by temporarily making the following changes:
Add boolean hashCodeCalled=false to your class
When hashCode() is called, set hashCodeCalled=true
When a setter is called, and that boolean is true, then throw an exception or log the current stack trace
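The debugging steps above can be sketched roughly as follows. TrackedPair is a hypothetical stand-in for MySimpleObject, and throwing IllegalStateException is just one way to surface the problem; logging a stack trace would work equally well.

```java
// Debugging sketch only: TrackedPair is a hypothetical stand-in for MySimpleObject.
class TrackedPair {
    private int i1, i2;
    private boolean hashCodeCalled = false; // becomes true once the object has been hashed

    void set(int i1, int i2) {
        if (hashCodeCalled) {
            // The object may already sit in a HashSet bucket chosen from its old
            // values; changing it now would make later lookups unreliable.
            throw new IllegalStateException("mutated after hashCode() was called");
        }
        this.i1 = i1;
        this.i2 = i2;
    }

    @Override
    public int hashCode() {
        hashCodeCalled = true;
        return 31 * (31 + i1) + i2; // same formula as the question's hashCode()
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof TrackedPair)) return false;
        TrackedPair other = (TrackedPair) obj;
        return i1 == other.i1 && i2 == other.i2;
    }
}
```

Adding an instance to a HashSet calls hashCode(), so any later call to set() on that instance fails loudly and the stack trace points at the offending code.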
Alternatively, you could refactor your code such that these instances are immutable and I bet the problem disappears.

Unique list of objects using HashSet

Can anyone tell me what the issue with my code is here? I converted the code from this post to use a String array instead of two ints, where I want a unique list based on the 0th index of the String array. The problem is that the overridden equals function is never getting called therefore I have repeated entries.
public static void main(String[] args)
{
class bin
{
String[] data;
bin (String[] data)
{
this.data=data;
}
@Override
public boolean equals(Object me)
{
bin binMe = (bin)me;
if(this.data[0].equals(binMe.data[0])) { return true; }
else { return false; }
}
@Override
public int hashCode()
{
final int prime = 31;
int result = 1;
result = prime * result + Arrays.hashCode(data);
return result;
}
@Override
public String toString()
{
return data[0] + " " + data[1];
}
}
Set<bin> q= new HashSet<bin>();
q.add(new bin(new String[]{"100", "200"}));
q.add(new bin(new String[]{"101", "201"}));
q.add(new bin(new String[]{"101", "202"}));
q.add(new bin(new String[]{"103", "203"}));
System.out.println(q);
}
Gives an output of: [101 202, 100 200, 101 201, 103 203]
If you want the comparison based on the first element, don't take the hash code of the full array
Arrays.hashCode(data);
Use
data[0].hashCode();
The problem is that the overridden equals function is never getting
called therefore I have repeated entries.
This is a symptom rather than the cause: a HashSet only calls equals on elements whose hash codes match. Two set elements are considered equal only if equals returns true for them and hashCode returns the same int for both. In your case, you have overridden the equals method to do the logical comparison based on the first element of the string array. However, you also need to make sure that hashCode returns the same int for two elements that you consider logically equal.
So update the following statement in your hashCode implementation
from
result = prime * result + Arrays.hashCode(data);
to
result = prime * result + data[0].hashCode();
Hash[Set/Map]s work by using the hashCode to group items into lists (buckets) and then searching these lists. This means that if all the hashCodes are unique, each list holds only 1 item, which speeds up lookups.
If your hashCode points to the wrong list, there are no items to check for equality, so the equals method is never called, and every item gets added, not just the unique ones.
Instead of computing the hashCode over the whole array of Strings, use data[0].hashCode(), as per @cricket_007's answer.
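Putting the answers together, a corrected version of the class might look like the sketch below. The class is renamed to Bin to follow Java naming conventions, and the instanceof check is an addition that the original code did not have; both are choices made for this example.

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the fix: equals() and hashCode() both depend only on data[0].
class Bin {
    final String[] data;

    Bin(String[] data) {
        this.data = data;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof Bin)) return false; // also rejects null
        return data[0].equals(((Bin) obj).data[0]);
    }

    @Override
    public int hashCode() {
        return data[0].hashCode(); // same field as equals()
    }

    @Override
    public String toString() {
        return data[0] + " " + data[1];
    }

    public static void main(String[] args) {
        Set<Bin> q = new HashSet<>();
        q.add(new Bin(new String[]{"100", "200"}));
        q.add(new Bin(new String[]{"101", "201"}));
        q.add(new Bin(new String[]{"101", "202"})); // rejected: same data[0] as the previous entry
        q.add(new Bin(new String[]{"103", "203"}));
        System.out.println(q.size()); // prints 3
    }
}
```

HashSet.add keeps the element that is already present, so "101 201" stays in the set and "101 202" is dropped.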

java implement compare on two elements

I am trying to implement Comparable's compareTo as follows:
public int compareTo(Object other) {
if(other.getlength() > this.getlength()){
return 1;
} else if (other.getlength() < this.getlength()){
return -1;
} else {
if (other.getVal() > this.getVal()){
return 1;
} else {
return -1;
}
}
}
What I want is for the list to be sorted by length first; then, where the lengths are equal, I want those items to be sorted (in place) by their values. But my implementation is not working correctly. Can anyone see what I am doing wrong?
My results are:
a b = 3
a b c = 1
a b c = 1
a b = 2
a b = 1
The results I want are:
a b c = 1
a b c = 1
a b = 3
a b = 2
a b = 1
Avoid logic where possible. Seriously - where feasible, use arithmetic to avoid if/else's. It tends to be more reliable. In this case:
public int compareTo(Object other) {
int ret = other.getlength() - this.getlength();
if ( ret == 0 ) {
ret = other.getVal() - this.getVal();
}
return ret;
}
It is not clear from your remarks whether the list is already sorted. You can handle that by sorting the list after comparing their lengths. But one thing you are obviously doing wrong is object.getValue(): this doesn't make sense; you have to iterate through both lists and compare values to conclude whether they are equal.
It wasn't obvious without the example; sorry for the comments above. It is not possible to get this result with your comparator, and your logic looks correct to me. But it would be a good idea to incorporate w00t's comments as well; otherwise you will have a < b as well as a > b, which could cause a runtime error. Please check that the comparator is applied properly in your sorting function (objects).
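For reference, here is one way the two-level ordering could be written so that it compiles and returns 0 for equal items. The question does not show the element class, so the name Item and its int fields length and val are assumptions. Integer.compare (Java 7+) is used in place of subtraction because subtraction can overflow for values of large magnitude.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical element type: the name Item and its fields are assumptions.
class Item implements Comparable<Item> {
    final int length;
    final int val;

    Item(int length, int val) {
        this.length = length;
        this.val = val;
    }

    @Override
    public int compareTo(Item other) {
        // Longer items first.
        int byLength = Integer.compare(other.length, this.length);
        if (byLength != 0) {
            return byLength;
        }
        // Same length: larger value first; returns 0 when both fields match.
        return Integer.compare(other.val, this.val);
    }

    @Override
    public String toString() {
        return length + "=" + val;
    }

    public static void main(String[] args) {
        List<Item> list = new ArrayList<>(Arrays.asList(
                new Item(2, 3), new Item(3, 1), new Item(3, 1), new Item(2, 2), new Item(2, 1)));
        Collections.sort(list);
        System.out.println(list); // prints [3=1, 3=1, 2=3, 2=2, 2=1]
    }
}
```

The input mirrors the question's data (lengths 2 and 3 with the values shown there), and the sorted order matches the desired result.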

what would be a good hash function for an integer tuple?

I have this class...
public class StartStopTouple {
public int iStart;
public int iStop;
public int iHashCode;
public StartStopTouple(String start, String stop) {
this.iStart = Integer.parseInt(start);
this.iStop = Integer.parseInt(stop);
}
@Override
public boolean equals(Object theObject) {
// check if 'theObject' is null
if (theObject == null) {
return false;
}
// check if 'theObject' is a reference to 'this' StartStopTouple... essentially they are the same Object
if (this == theObject) {
return true;
}
// check if 'theObject' is of the correct type as 'this' StartStopTouple
if (!(theObject instanceof StartStopTouple)) {
return false;
}
// cast 'theObject' to the correct type: StartStopTouple
StartStopTouple theSST = (StartStopTouple) theObject;
// check if the (start,stop) pairs match, then the 'theObject' is equal to 'this' Object
if (this.iStart == theSST.iStart && this.iStop == theSST.iStop) {
return true;
} else {
return false;
}
} // equals() end
@Override
public int hashCode() {
return iHashCode;
}
}
... and I define equality between such Objects only if iStart and iStop in one Object are equal to iStart and iStop in the other Object.
So since I've overridden equals(), I need to override hashCode() but I'm not sure how to define a good hash function for this class. What would be a good way to create a hash code for this class using iStart and iStop?
I'd be tempted to use this, particularly since you're going to memoize it:
Long.valueOf((((long) iStart) << 32) | iStop).hashCode();
From Bloch's "Effective Java":
int iHashCode = 17;
iHashCode = 31 * iHashCode + iStart;
iHashCode = 31 * iHashCode + iStop;
Note: 31 is chosen because the multiplication by 31 can be optimized by the VM as bit operations. (But performance does not matter in your case since, as mentioned by @Ted Hopp, you are only computing the value once.)
Note: it does not matter if iHashCode rolls over past the largest int.
The simplest might be best:
iHashCode = iStart^iStop;
i.e. the XOR of the two values.
Note that this will give equal hash codes when start and stop are swapped.
As another possibility, you can do
iHashCode = ((iStart<<16)|(iStart>>>16))^iStop;
This first barrel-shifts (rotates) start by 16 bits and then XORs stop with it, so the least significant bits of the two values are kept apart in the XOR. (If start is never larger than 65k, or more accurately always stays below 2^16, you can omit the (iStart>>>16) part.)
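As a rough sketch of how the memoized Effective Java style hash could be wired into such a class: the hash is computed once in the constructor and returned from hashCode(). StartStopPair is a placeholder name, and making the fields final is an assumption added here so that the cached value can never go stale.

```java
// Sketch only: StartStopPair is a placeholder name, not the asker's class.
final class StartStopPair {
    private final int start;
    private final int stop;
    private final int hash; // computed once, since the fields never change

    StartStopPair(int start, int stop) {
        this.start = start;
        this.stop = stop;
        int h = 17;
        h = 31 * h + start;
        h = 31 * h + stop; // int overflow simply wraps, which is fine for a hash
        this.hash = h;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof StartStopPair)) return false;
        StartStopPair other = (StartStopPair) obj;
        return start == other.start && stop == other.stop;
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

Unlike the plain XOR variant, this hash is order-sensitive: (1, 2) hashes to 16370 and (2, 1) to 16400, which suits a tuple where start and stop are not interchangeable.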
