Comparing Java maps via hashing

I want to compare two Java Maps by a simple hash.
Each object is on a different computer, so sending a hash over the network will be cheaper than sending the whole object to compare.
For example, I have two HashMaps of an ExampleClass:
Map<String,ExampleClass> One=new ...;
Map<String,ExampleClass> Other=new ...;
I don't need to be sure that all elements are equal,
it's enough for me to trust in a hash.
I was about to iterate on each side and create a "homemade hash", then send it over the network and finally compare, for example, an int or something.
It would be great if this "hash" were recalculated every time an object is added to or deleted from the collection, saving me from iterating over the whole object. That would require me to encapsulate every add/delete of the Map. Is there a Java library that does this?

If all your classes implement hashCode() (that is, they do not rely on the "default" identity-based hash code, which is typically derived from the memory address), you can use the map's hashCode().
The caveat here is that if your ExampleClass does not implement hashCode(), then equal items might have different hashes on the two different machines, which will result in different hashes for the maps.
To clarify:
Map implements a hashCode() that is defined as the sum of its Map.Entry hashCode()s.
Map.Entry's hashCode() is defined to be the xor of the key's hashCode() and the value's hashCode().
Your keys are Strings -- they have a well defined hashCode() (two equal strings always have the same hashCode()).
Your values are ExampleClass instances -- they also need a well-defined hashCode().
In summary, a map that contains { s1 -> ec1, s2 -> ec2 } will have a hashcode equal to:
(s1.hashCode() ^ ec1.hashCode()) + (s2.hashCode() ^ ec2.hashCode())
meaning that it depends on ExampleClass's hashCode().
If ExampleClass did implement hashCode() in a way that equal ExampleClasses give equal hashCode()s, everything will work well.
If ExampleClass did not implement hashCode(), it will use Object's hashCode(), which will almost always give you different hash codes for instances that you consider equal.
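As a minimal sketch of the formula above, the following recomputes a map's hash code by hand and compares it with Map.hashCode(). The Item class (standing in for ExampleClass) and the manualMapHash helper are made up for illustration and are not part of the question.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical value class standing in for ExampleClass.
final class Item {
    private final String name;
    private final int size;

    Item(String name, int size) {
        this.name = name;
        this.size = size;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Item)) return false;
        Item other = (Item) o;
        return size == other.size && name.equals(other.name);
    }

    // Derived only from field values, so equal items produce equal hash codes.
    @Override
    public int hashCode() {
        return 31 * name.hashCode() + size;
    }
}

final class MapHashDemo {
    // Sum over all entries of (key hash XOR value hash), as the Map contract specifies.
    static int manualMapHash(Map<String, Item> map) {
        int sum = 0;
        for (Map.Entry<String, Item> e : map.entrySet()) {
            sum += e.getKey().hashCode() ^ e.getValue().hashCode();
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Item> one = new HashMap<>();
        one.put("a", new Item("apple", 1));
        one.put("b", new Item("banana", 2));

        Map<String, Item> other = new HashMap<>();
        other.put("b", new Item("banana", 2)); // inserted in a different order
        other.put("a", new Item("apple", 1));

        System.out.println(one.hashCode() == other.hashCode());
        System.out.println(one.hashCode() == manualMapHash(one));
    }
}
```

Because addition is commutative, the insertion order does not affect the result, so both comparisons should print true.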

A simple solution is just to xor the hash of every object in the map, or some simple derivation thereof. Because a ^ a = 0 and a ^ b ^ a = b for all a and b, (xor is commutative, associative, and its own inverse), and since xor is cheap, your add and remove can just xor the (possibly derived) hash code of the added or deleted item.
You may want to use a derived hash value to avoid cases where your map has all the same keys and values, but some of the mappings between them are transposed. A simple derived hash might be key.hashCode() - value.hashCode(), which would avoid most of these cases.
So, your code might look like:
import java.util.HashMap;

public class MyMap<K, V> extends HashMap<K, V> {
    private int hash = 0;

    @Override
    public int hashCode() { return hash; }

    @Override
    public V put(K key, V value) {
        V old = super.put(key, value);
        // Cancel out the previous mapping's contribution, if there was one.
        if (old != null) this.hash ^= key.hashCode() - old.hashCode();
        this.hash ^= key.hashCode() - value.hashCode();
        return old;
    }

    @Override
    public V remove(Object key) {
        V ret = super.remove(key);
        // XOR is its own inverse, so this removes the mapping's contribution.
        if (ret != null) this.hash ^= key.hashCode() - ret.hashCode();
        return ret;
    }
}
Note that some of the more advanced methods (eg. adding multiple items from a collection) may or may not be safe depending on implementation.
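One way to sidestep the question of which inherited methods are safe is to wrap a map instead of extending HashMap, so that every mutation has to go through code you control. The following is only a sketch under that assumption: the class name FingerprintMap and the mix helper are hypothetical, and the per-entry hash is the same key-minus-value derivation suggested above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical wrapper: keeps an XOR fingerprint up to date on every change.
final class FingerprintMap<K, V> {
    private final Map<K, V> delegate = new HashMap<>();
    private int fingerprint = 0;

    // Derived per-entry hash, following the key-minus-value idea above.
    private static int mix(Object key, Object value) {
        return Objects.hashCode(key) - Objects.hashCode(value);
    }

    public V put(K key, V value) {
        boolean existed = delegate.containsKey(key);
        V old = delegate.put(key, value);
        if (existed) fingerprint ^= mix(key, old); // cancel the old mapping
        fingerprint ^= mix(key, value);            // account for the new one
        return old;
    }

    public V remove(K key) {
        if (!delegate.containsKey(key)) return null;
        V old = delegate.remove(key);
        fingerprint ^= mix(key, old);              // XOR is its own inverse
        return old;
    }

    public void putAll(Map<? extends K, ? extends V> source) {
        // Route every entry through put() so the fingerprint stays in sync.
        for (Map.Entry<? extends K, ? extends V> e : source.entrySet()) {
            put(e.getKey(), e.getValue());
        }
    }

    public V get(K key) { return delegate.get(key); }

    public int size() { return delegate.size(); }

    // Cheap value to send over the network instead of the whole map.
    public int fingerprint() { return fingerprint; }
}
```

Two maps that end up with the same mappings report the same fingerprint regardless of the order of operations. The converse does not hold: different maps can collide, which the question explicitly accepts.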

Related

How to implement own hashing function for strings?

So this is the default algorithm that generates the hashcode for Strings:
s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
However, I wanna use something different and much more simple like adding the ASCII values of each character and then adding them all up.
How do I make it so that it uses the algorithm I created, instead of using the default one when I use the put() method for hashtables?
As of now I don't know what to do other than implementing a hash table from scratch.
Create a new class that wraps a String field. For example:
import java.util.Objects;

public class MyString {
    private final String value;

    public MyString(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MyString myString = (MyString) o;
        return Objects.equals(value, myString.value);
    }

    @Override
    public int hashCode() {
        // use your own implementation; this one sums the characters' code points
        return value.codePoints().sum();
    }
}
Add equals() and hashCode() methods with the @Override annotation.
Note: here hashCode() sums the Unicode code points of the characters, which are the same as the ASCII values only for ASCII-only strings.
After that, you will be able to use new class objects in the desired data structure. Here you can find a detailed explanation of these methods and a contract between equals() and hashCode().
However, I wanna use something different and much more simple like adding the ASCII values of each character and then adding them all up.
This is an extremely bad idea if you care at all about hash table efficiency. What you're thinking of as an overly-complicated hashing function is actually designed to give a uniform distribution of hash values throughout the entire 32-bit (or whatever) range. That gives the best possibility of uniformly distributing the hash keys (after you mod by the hash table size) in your buckets.
Your simple method of adding up the ASCII values of the individual characters has multiple flaws. First, you're limited in the range of values you can reasonably expect to generate. The highest value you can create is 255*n, where n is the length of the key. If your key is 10 characters in length, then you can't possibly generate more than 2,550 unique hash values. But there are 255^10 possible 10-character strings. Your collision rate will be very high.
The second problem is that anagrams generate the same hash value. "stop," "spot," and "tops" all generate the same hash value and will hash to the same bucket. Again, this will greatly affect your collision rate.
It's unclear to me why you want to replace the hashing function. If you're thinking it will result in better performance, you should think again. Sure, it will make generating the hash value faster, but it will result in very skewed key distribution, and correspondingly terrible hash table performance.
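To make the anagram problem concrete, here is a small sketch comparing the additive scheme with String.hashCode(). The sumHash helper is a hypothetical implementation of the idea from the question, not an existing library method.

```java
final class HashComparison {
    // The additive scheme from the question: sum of all character values.
    static int sumHash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        for (String word : new String[] {"stop", "spot", "tops"}) {
            System.out.println(word + ": sum=" + sumHash(word)
                    + ", String.hashCode()=" + word.hashCode());
        }
    }
}
```

All three words get the same additive hash (454), while the position-weighted String.hashCode() gives three different values.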

Although hash values are different, still why are my objects stored in the same location?

I have a Movie class and I override only hashCode() method. Please find below the java class
public class Movie {
private String actor;
private String name;
private String releaseYr;
public String getActor() {
return actor;
}
public void setActor(String actor) {
this.actor = actor;
}
public String getName() {
return name;
}
public void setName(String name) {
this.name = name;
}
public String getReleaseYr() {
return releaseYr;
}
public void setReleaseYr(String releaseYr) {
this.releaseYr = releaseYr;
}
@Override
public int hashCode() {
return actor.hashCode() + name.hashCode() + releaseYr.hashCode();
}
}
I have created two Movie objects whose property values are all the same, and put them in a HashMap. Below is the code:
import java.util.HashMap;
public class Test {
public static void main(String[] args) {
Movie m1 = new Movie();
m1.setActor("Akshay");
m1.setName("Taskvir");
m1.setReleaseYr("2010");
Movie m2 = new Movie();
m2.setActor("Akshay");
m2.setName("Taskvir");
m2.setReleaseYr("2010");
HashMap<Movie, String> map = new HashMap<Movie, String>();
map.put(m1, "Value of m1");
map.put(m2, "Value of m2");
}
}
I am getting the expected result. As I override only the hashCode() method and both objects' hash values are the same, they are stored in the same index location of the HashMap table array. Below is the expected result in debugging mode.
But if I do not override the hashCode() method and override only the equals() method, they are still stored in the same index location of the HashMap table array, although I can see that the hash values are different. Below is my equals method:
@Override
public boolean equals(Object obj) {
Movie m1 = (Movie) obj;
boolean result = false;
if (m1.getActor().equals(this.actor) && m1.getName().equals(this.name)
&& m1.getReleaseYr().equals(this.releaseYr)) {
result = true;
}
return result;
}
The output in debugging mode
If I override neither equals nor hashCode, I also get the same unexpected result.
As per my understanding, if I do not override the equals and hashCode methods, or only override the equals method, then the m1 and m2 objects should be stored in different locations, as the hash values are different for m1 and m2. But in this case that is not happening.
Can someone please explain why, with different hash values, my objects are stored in the same location?
I have used Java 8.
Hash codes have a huge range, from Integer.MIN_VALUE to Integer.MAX_VALUE, while a HashMap usually has far fewer buckets (by default, 16 for a newly instantiated HashMap, at least with OpenJDK 11). Thus, it's entirely possible, even expected, that different hash codes will land in the same bucket. However, note that if you're not overriding hashCode(), this behavior is completely incidental and can't be relied upon.
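No matter how the hash code is computed, by your method or by the default from the Object class, different objects can get mapped to the same hash map bucket (array index). Conceptually, the hash code is divided by the array size, and the remainder gives the bucket number (OpenJDK's HashMap first mixes the upper bits of the hash code into the lower ones and then masks with the array length minus one, which works because the length is always a power of two).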
Both of your hash codes produced by Object.hashCode() (31622540 and 27844196) happen to produce the identical remainder 4 when divided by 16 (the initial HashMap array size).
So, with 4 billion different hash codes available, some of them must end up in the same bucket, as it would be a waste of memory to allocate a 4-billion-elements array for every hash map.
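The reduction from a 32-bit hash code to a bucket index can be sketched as follows. The spread step mirrors what OpenJDK 8 and later do inside HashMap, which is an implementation detail rather than part of the documented API. The class and method names here are made up for illustration.

```java
final class BucketIndexDemo {
    // Mixes the upper 16 bits into the lower 16, as OpenJDK 8+ HashMap does internally.
    static int spread(int hashCode) {
        return hashCode ^ (hashCode >>> 16);
    }

    // Table lengths are powers of two, so masking plays the role of the remainder.
    static int bucketIndex(int hashCode, int tableLength) {
        return spread(hashCode) & (tableLength - 1);
    }

    public static void main(String[] args) {
        int[] hashCodes = {4, 20, 36, 5};
        for (int h : hashCodes) {
            System.out.println(h + " -> bucket " + bucketIndex(h, 16));
        }
    }
}
```

With a table of 16 buckets, the hash codes 4, 20 and 36 all map to bucket 4, while 5 maps to bucket 5: different hash codes, same bucket.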
To make the hash map work as expected, it's important that objects that are equal give the same hash code.
If you override only the equals() method, the Object.hashCode() doesn't fulfill that requirement, and you have to override hashCode() as well - otherwise the get() method won't find the objects you stored in the map.
If you want two movies to be equals() if their fields are equal, you should supply an appropriate hashCode() method as well the way you did.
Let's have a look at the possible overriding combinations.
Override nothing
Both movies are different, end up as different hash map entries, maybe in the same, maybe in different buckets.
Only override hashCode()
Both movies are different, end up as different hash map entries in the same bucket. It's nonsense to invent your own hashCode() implementation if you still use the Object definition of equality.
Override both hashCode() and equals()
Both movies are equal, end up as only one hash map entry, with the later-stored value winning. This happens because the second put() finds an entry with an equal key under the hash code's bucket, and simply replaces its value part.
Only override equals()
BIG MISTAKE! Both movies are equal, but this isn't reflected by the hashCode() computation, so it's just a matter of good luck whether the search for an existing value looks into the correct bucket.
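The three deterministic combinations above can be checked with a small sketch. The key classes below are hypothetical stand-ins for Movie, reduced to a single field. The "only override equals()" case is left out because its outcome depends on the identity hash codes the JVM happens to hand out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical key classes, one per overriding combination.
class PlainKey {                      // overrides nothing
    final String name;
    PlainKey(String name) { this.name = name; }
}

class HashOnlyKey extends PlainKey {  // overrides only hashCode()
    HashOnlyKey(String name) { super(name); }
    @Override public int hashCode() { return name.hashCode(); }
}

class FullKey extends HashOnlyKey {   // overrides hashCode() and equals()
    FullKey(String name) { super(name); }
    @Override public boolean equals(Object o) {
        return o instanceof FullKey && ((FullKey) o).name.equals(name);
    }
}

final class OverrideCombinations {
    // Stores two values under the given keys and reports how many entries result.
    static int entriesAfterTwoPuts(Object first, Object second) {
        Map<Object, String> map = new HashMap<>();
        map.put(first, "first");
        map.put(second, "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println("nothing:       "
                + entriesAfterTwoPuts(new PlainKey("m"), new PlainKey("m")));
        System.out.println("hashCode only: "
                + entriesAfterTwoPuts(new HashOnlyKey("m"), new HashOnlyKey("m")));
        System.out.println("both:          "
                + entriesAfterTwoPuts(new FullKey("m"), new FullKey("m")));
    }
}
```

Only the class that overrides both methods collapses the two puts into a single entry. The other two keep two entries, because equality is still object identity.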

Is there any chance for the hash codes of two different objects of being same? [duplicate]

In Java, obj.hashCode() returns some value. What is the use of this hash code in programming?
hashCode() is used for bucketing in hash-based implementations like HashMap, Hashtable, HashSet, etc.
The value received from hashCode() is used to derive the bucket number for storing elements of the set/map. This bucket number acts as the address of the element inside the set/map.
When you do contains() it will take the hash code of the element, then look for the bucket where hash code points to. If more than 1 element is found in the same bucket (multiple objects can have the same hash code), then it uses the equals() method to evaluate if the objects are equal, and then decide if contains() is true or false, or decide if element could be added in the set or not.
From the Javadoc:
Returns a hash code value for the object. This method is supported for the benefit of hashtables such as those provided by java.util.Hashtable.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java programming language.)
hashCode() is a function that takes an object and outputs a numeric value. The hashcode for an object is always the same if the object doesn't change.
Classes like HashMap, Hashtable, HashSet, etc. that need to store objects will use a hashCode modulo the size of their internal array to choose in what "memory position" (i.e. array position) to store the object.
There are some cases where collisions may occur (two objects end up with the same hashcode), and that, of course, needs to be solved carefully.
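A well-known concrete collision is the pair of strings "Aa" and "BB", which have the same hash code even though they are not equal. The sketch below (the class name CollisionDemo is made up) shows that a HashSet still keeps both, because equals() has the final say.

```java
import java.util.HashSet;
import java.util.Set;

final class CollisionDemo {
    static Set<String> build() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        set.add("BB"); // same hash code as "Aa", but not equal to it
        return set;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode());
        Set<String> set = build();
        System.out.println(set.size() + " " + set.contains("Aa") + " " + set.contains("BB"));
    }
}
```

Both strings hash to 2112 (65 * 31 + 97 and 66 * 31 + 66), yet the set ends up with two elements and finds each of them.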
The value returned by hashCode() is the object's hash code, which is the object's memory address in hexadecimal.
By definition, if two objects are equal, their hash code must also be equal. If you override the equals() method, you change the way two objects are equated and Object's implementation of hashCode() is no longer valid. Therefore, if you override the equals() method, you must also override the hashCode() method as well.
This answer is from the java SE 8 official tutorial documentation
A hashcode is a number generated from any object.
This is what allows objects to be stored/retrieved quickly in a Hashtable.
Imagine the following simple example:
On the table in front of you, you have nine boxes, each marked with a number 1 to 9. You also have a pile of wildly different objects to store in these boxes, but once they are in there you need to be able to find them as quickly as possible.
What you need is a way of instantly deciding which box you have put each object in. It works like an index. You decide to find the cabbage, so you look up which box the cabbage is in, then go straight to that box to get it.
Now imagine that you don't want to bother with the index, you want to be able to find out immediately from the object which box it lives in.
In the example, let's use a really simple way of doing this - the number of letters in the name of the object. So the cabbage goes in box 7, the pea goes in box 3, the rocket in box 6, the banjo in box 5 and so on.
What about the rhinoceros, though? It has 10 characters, so we'll change our algorithm a little and "wrap around" so that 10-letter objects go in box 1, 11 letters in box 2 and so on. That should cover any object.
Sometimes a box will have more than one object in it, but if you are looking for a rocket, it's still much quicker to compare a peanut and a rocket, than to check a whole pile of cabbages, peas, banjos, and rhinoceroses.
That's a hash code. A way of getting a number from an object so it can be stored in a Hashtable. In Java, a hash code can be any integer, and each object type is responsible for generating its own. Lookup the "hashCode" method of Object.
Source - here
Although the hash code has nothing to do with your business logic, we have to take care of it in most cases, because when your object is put into a hash-based container (HashSet, HashMap, ...), the container uses the element's hash code to store and retrieve it.
hashCode() returns an integer code that the JVM generates for each object by default; it is usually, but not necessarily, distinct for distinct objects.
We use hashCode() in hashing-based data structures such as Hashtable, HashMap, etc.
The advantage of hashCode() is that it makes searching easy: the hash code narrows down where to look for an object.
But we can't say hashCode() is the address of an object. It is a code generated by the JVM for every object.
That is why hashing is nowadays among the most popular search techniques.
One of the uses of hashCode() is building a caching mechanism.
Look at this example:
class Point
{
public int x, y;
public Point(int x, int y)
{
this.x = x;
this.y = y;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Point point = (Point) o;
if (x != point.x) return false;
return y == point.y;
}
@Override
public int hashCode()
{
int result = x;
result = 31 * result + y;
return result;
}
}
class Line
{
public Point start, end;
public Line(Point start, Point end)
{
this.start = start;
this.end = end;
}
@Override
public boolean equals(Object o)
{
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
Line line = (Line) o;
if (!start.equals(line.start)) return false;
return end.equals(line.end);
}
@Override
public int hashCode()
{
int result = start.hashCode();
result = 31 * result + end.hashCode();
return result;
}
}
class LineToPointAdapter implements Iterable<Point>
{
private static int count = 0;
private static Map<Integer, List<Point>> cache = new HashMap<>();
private int hash;
public LineToPointAdapter(Line line)
{
hash = line.hashCode();
if (cache.get(hash) != null) return; // we already have it
System.out.println(
String.format("%d: Generating points for line [%d,%d]-[%d,%d] (no caching)",
++count, line.start.x, line.start.y, line.end.x, line.end.y));
}

Why does HashSet allow equal items if hashcodes are different?

The HashSet class has an add(Object o) method, which is not inherited from another class. The Javadoc for that method says the following:
Adds the specified element to this set if it is not already present. More formally, adds the specified element e to this set if this set contains no element e2 such that (e==null ? e2==null : e.equals(e2)). If this set already contains the element, the call leaves the set unchanged and returns false.
In other words, if two objects are equal, then the second object will not be added and the HashSet will remain the same. However, I've discovered that this is not true if objects e and e2 have different hashcodes, despite the fact that e.equals(e2). Here is a simple example:
import java.util.HashSet;
import java.util.Iterator;
import java.util.Random;
public class BadHashCodeClass {
/**
* A hashcode that will randomly return an integer, so it is unlikely to be the same
*/
@Override
public int hashCode(){
return new Random().nextInt();
}
/**
* An equal method that will always return true
*/
@Override
public boolean equals(Object o){
return true;
}
public static void main(String... args){
HashSet<BadHashCodeClass> hashSet = new HashSet<>();
BadHashCodeClass instance = new BadHashCodeClass();
System.out.println("Instance was added: " + hashSet.add(instance));
System.out.println("Instance was added: " + hashSet.add(instance));
System.out.println("Elements in hashSet: " + hashSet.size());
Iterator<BadHashCodeClass> iterator = hashSet.iterator();
BadHashCodeClass e = iterator.next();
BadHashCodeClass e2 = iterator.next();
System.out.println("Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): " + (e==null ? e2==null : e.equals(e2)));
}
}
The results from the main method are:
Instance was added: true
Instance was added: true
Elements in hashSet: 2
Element contains e and e2 such that (e==null ? e2==null : e.equals(e2)): true
As the example above clearly shows, HashSet was able to add two elements where e.equals(e2).
I'm going to assume that this is not a bug in Java and that there is in fact some perfectly rational explanation for why this is. But I can't figure out what exactly. What am I missing?
I think what you're really trying to ask is:
"Why does a HashSet add objects with unequal hash codes even if they claim to be equal?"
The distinction between my question and the question you posted is that you're assuming this behavior is a bug, and therefore you're getting grief for coming at it from that perspective. I think the other posters have done a thoroughly sufficient job of explaining why this is not a bug, however they have not addressed the underlying question.
I will try to do so here; I would suggest rephrasing your question to remove the accusations of poor documentation / bugs in Java so you can more directly explore why you're running into the behavior you're seeing.
The equals() documentation states (emphasis added):
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The contract between equals() and hashCode() isn't just an annoying quirk in the Java specification. It provides some very valuable benefits in terms of algorithm optimization. By being able to assume that a.equals(b) implies a.hashCode() == b.hashCode() we can do some basic equivalence tests without needing to call equals() directly. In particular, the invariant above can be turned around - a.hashCode() != b.hashCode() implies a.equals(b) will be false.
If you look at the code for HashMap (which HashSet uses internally), you'll notice an inner static class Entry, defined like so:
static class Entry<K,V> implements Map.Entry<K,V> {
final K key;
V value;
Entry<K,V> next;
int hash;
...
}
HashMap stores the key's hash code along with the key and value. Because a hash code is expected to not change over the time a key is stored in the map (see Map's documentation, "The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.") it is safe for HashMap to cache this value. By doing so, it only needs to call hashCode() once for each key in the map, as opposed to every time the key is inspected.
Now let's look at the implementation of put(), where we see these cached hashes being taken advantage of, along with the invariant above:
public V put(K key, V value) {
...
int hash = hash(key);
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
// Replace existing element and return
}
}
// Insert new element
}
In particular, notice that the conditional only ever calls key.equals(k) if the hash codes are equal and the key isn't the exact same object, due to short-circuit evaluation. By the contract of these methods, it should be safe for HashMap to skip this call. If your objects are incorrectly implemented, these assumptions being made by HashMap are no longer true, and you will get back unusable results, including "duplicates" in your set.
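The short-circuit behavior can be observed directly with a key that counts how often equals() is invoked. This is only an illustrative sketch: CountingKey and ShortCircuitDemo are hypothetical names, and the exact call counts reflect how OpenJDK's HashMap behaves rather than anything the Set contract promises.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical key: equals() always says "equal" and counts how often it is asked.
final class CountingKey {
    static int equalsCalls = 0;
    private final int hash;

    CountingKey(int hash) { this.hash = hash; }

    @Override public int hashCode() { return hash; }

    @Override public boolean equals(Object o) {
        equalsCalls++;
        return true;
    }
}

final class ShortCircuitDemo {
    // Adds two distinct keys with the given hash codes; returns the resulting set size.
    static int sizeAfterAdding(int firstHash, int secondHash) {
        CountingKey.equalsCalls = 0;
        Set<CountingKey> set = new HashSet<>();
        set.add(new CountingKey(firstHash));
        set.add(new CountingKey(secondHash));
        return set.size();
    }

    public static void main(String[] args) {
        // 1 and 17 differ as hash codes even though both fall into bucket 1 of a 16-slot table.
        System.out.println("different hashes: size=" + sizeAfterAdding(1, 17)
                + ", equals calls=" + CountingKey.equalsCalls);
        System.out.println("same hash:        size=" + sizeAfterAdding(7, 7)
                + ", equals calls=" + CountingKey.equalsCalls);
    }
}
```

With different hash codes the set keeps both "equal" keys and equals() is never consulted. With the same hash code, equals() is called and the second key is rejected as a duplicate.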
Note that your claim "HashSet ... has an add(Object o) method, which is not inherited from another class" is not quite correct. While its parent class, AbstractSet, does not implement this method, the parent interface, Set, does specify the method's contract. The Set interface is not concerned with hashes, only equality, therefore it specifies the behavior of this method in terms of equality with (e==null ? e2==null : e.equals(e2)). As long as you follow the contracts, HashSet works as documented, but avoids actually doing wasteful work whenever possible. As soon as you break the rules however, HashSet cannot be expected to behave in any useful way.
Consider also that if you attempted to store objects in a TreeSet with an incorrectly implemented Comparator, you would similarly see nonsensical results. I documented some examples of how a TreeSet behaves when using an untrustworthy Comparator in another question: how to implement a comparator for StringBuffer class in Java for use in TreeSet?
You've violated the contract of equals/hashCode basically:
From the hashCode() docs:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
and from equals:
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
HashSet relies on equals and hashCode being implemented consistently - the Hash part of the name HashSet basically implies "This class uses hashCode for efficiency purposes." If the two methods are not implemented consistently, all bets are off.
This shouldn't happen in real code, because you shouldn't be violating the contract in real code...
@Override
public int hashCode(){
return new Random().nextInt();
}
You are returning a different hash code for the same object every time it is evaluated. Obviously you will get wrong results.
add() function is as follows
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
and put() is
public V put(K key, V value) {
if (key == null)
return putForNullKey(value);
int hash = hash(key.hashCode());
int i = indexFor(hash, table.length);
for (Entry<K,V> e = table[i]; e != null; e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
V oldValue = e.value;
e.value = value;
e.recordAccess(this);
return oldValue;
}
}
modCount++;
addEntry(hash, key, value, i);
return null;
}
Notice that the hash is calculated first; in your case it is different on every call, which is why the object is added again. equals() comes into the picture only if the hashes of the objects are the same, i.e. a collision has occurred. Since the hashes are different here, equals() is never executed:
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
Read more on what short-circuiting is: since e.hash == hash is false, nothing else is evaluated.
I hope this helps.
Because hashCode() is implemented very badly: with a random hash code, each add() looks in an effectively random bucket and compares only against whatever happens to be there.
If you returned a constant value from hashCode(), the set would not let you add any duplicates.
It is not required that hash codes be different for all elements! It is only required that two elements are not equal.
The hash code is used first to find the hash bucket the object should occupy. If the hash codes are different, the objects are assumed to be not equal. If the hash codes are equal, then the equals() method is used to determine equality. The use of hashCode() is an efficiency mechanism.
And...
Your hash code implementation violates the contract that it should not change unless the object's identifying fields change.
