I have the funny situation that I store a Coordinate as a key in a HashMap<Coordinate, GUIGameField>.
Now, the strange thing about it is that I have a fragment of code which should guard against any coordinate being used twice. But when I debug this code:
if (mapForLevel.containsKey(coord)) {
throw new IllegalStateException("This coordinate is already used!");
} else {
...do stuff...
}
... containsKey() always returns false, although I stored a coordinate with a hash code of 9731 into the map and the current coord also has the hash code 9731.
After that, the mapForLevel.entrySet() looks like:
(java.util.HashMap$EntrySet) [(270,90)=gui.GUIGameField#29e357, (270,90)=gui.GUIGameField#ca470]
What could I have possibly done wrong? I ran out of ideas. Thanks for any help!
public class Coordinate {
int xCoord;
int yCoord;
public Coordinate(int x, int y) {
...store params in attributes...
}
...getters & setters...
@Override
public int hashCode() {
int hash = 1;
hash = hash * 41 + this.xCoord;
hash = hash * 31 + this.yCoord;
return hash;
}
}
You should override equals in addition to hashCode for it to work correctly.
EDIT : I have wrongly stated that you should use hashCode in your equals - this was not correct. While hashCode must return the same result for two equal objects, it still may return the same result for different objects.
It seems that you forgot to implement the equals() method for your Coordinate class. This is required by the contract. HashMap compares two entries with the same hash code using equals(). In your case Object.equals() is called, which always returns false for two different objects because it is based on the reference to the object in memory.
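For reference, a minimal sketch of what the missing equals() might look like, assuming equality should be based on xCoord and yCoord:
@Override
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (!(obj instanceof Coordinate)) return false;
    Coordinate other = (Coordinate) obj;
    // two coordinates are equal when both components match
    return this.xCoord == other.xCoord && this.yCoord == other.yCoord;
}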
The reason why you need to implement equals alongside with hashCode is because of the way hash tables work.
Hash tables associate an integer value (the hash of the Key) with the Value.
Think of it as an array of Value objects. When you insert in this table, value is stored at position key.hashCode().
This allows you to find any object in the table "straight away". You just have to compute the hashCode for that object and you'll know where it is in the table. (Think of it as opposed to a tree, in which you would need to navigate the tree to find the object.)
However, there is a problem with this approach: there might be more than one object with the same hash code.
This would cause you to mistakenly associate two (or more) keys with the same value. This is called a collision.
There's a simple way to solve this problem: instead of mapping each hashcode to a single Value, you map it to a list of Key-Value pairs.
Now every time you're looking for an object in the hash map, after computing the hash you need to go through that list (the list of 'values' that are related to that hashcode) and find the correct one.
This is why you always need to implement equals on the key of the hash map.
Note: Hash tables are actually a bit more complex than this, but the idea is the same. You can read more about collision resolution here.
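To make that concrete, here is a conceptual sketch of a bucket lookup. This is not the actual java.util.HashMap source; the Entry type and field names are made up for illustration:
class Entry<K, V> {            // made-up node type for this illustration
    K key;
    V value;
    Entry<K, V> next;          // next entry in the same bucket's chain
}

static <K, V> V lookup(K key, Entry<K, V>[] table) {
    int bucket = (key.hashCode() & 0x7fffffff) % table.length;  // which chain to search
    for (Entry<K, V> e = table[bucket]; e != null; e = e.next) {
        if (e.key.equals(key)) {     // this is why the key type must override equals()
            return e.value;
        }
    }
    return null;                     // no equal key in this bucket's chain
}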
Define an equals method in your Coordinate class to go along with your hashCode. Make sure hashCode returns the same code for equal objects (it does not need to be unique for distinct objects).
Related
I'm currently working on a TD game with a map editor. Now obviously, you can save and load these maps (or should be able to, at least).
Problem is: at some point I'm calling .get() on a HashMap. Unfortunately, the keys that should be the same (logic-wise) are not the same object (in terms of reference), and, according to my previous google research, overriding their .equals method isn't sufficient, since they still return different hashes with .hashCode() (I verified that, they DO return different hashes, while .equals does return true).
(On a side note, that's rather confusing, since the javadoc of HashMap.get(key) only states that they have to be equal)
More specifically, the HashMap contains instances of a class Path of mine as keys, and should return the corresponding list of enemies (= value).
short version of Path (without getters etc.):
public class Path
{
private List<Tile> tiles = new ArrayList<>();
@Override
public boolean equals(Object obj) {
//code comparing the two paths
}
@Override
public int hashCode() {
//what I still need to implement. ATM, it returns super.hashCode()
}
}
public class Tile
{
private int x;
private int y;
//constructor
//overrides equals
//getters & some convenience methods
}
Now if two Paths are equal, I'd like them to return the same hash code, so that the HashMap returns the correct list of enemies. (I'll make sure no two identical paths can be added.)
Now my question:
Do you suggest
using some external library to generate a hash
that I write my own implementation of calculating a hash, or
something else
?
Note that I'd prefer to avoid changing the HashMap to some other type of map, if that would even help solve the problem.
You definitely do need to implement your hashCode consistent with equals. IDEs often do a decent job of generating hashCode and equals. Also consider Objects.equals(...) and Objects.hash(...).
One warning about using Path as a key in the HashMap: you will have to make the class immutable for it to work reliably, or at least make sure that the hashCode of the key does not change. Otherwise you may not be able to get your data back even with the same or an equal key.
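A quick sketch of that failure mode (Enemy and addTile are placeholders here, assuming Path's hashCode ends up based on its tiles list):
Map<Path, List<Enemy>> enemiesOnPath = new HashMap<>();

Path path = new Path();
enemiesOnPath.put(path, new ArrayList<>());  // stored under the hash of the empty tile list

path.addTile(new Tile(3, 4));                // mutating the key changes its hashCode...
enemiesOnPath.get(path);                     // ...so this lookup searches a different bucket and returns null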
List has a useful hashCode() method of its own, which computes a hash from all the elements inside the list. So you also have to implement hashCode for Tile, which probably consists of some primitive fields or such.
e.g.
@Override
public int hashCode() {
return tiles != null ? tiles.hashCode() : 0;
}
See the docs here
int hashCode()
Returns the hash code value for this list. The hash code of a list is defined to be the result of the following calculation:
int hashCode = 1;
for (E e : list)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode().
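A possible hashCode for Tile to go with that (assuming Tile equality is based on x and y):
@Override
public int hashCode() {
    return 31 * x + y;   // combines both coordinates, same scheme the list hash uses per element
}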
Let's say I have an object.
It contains width and height, x and y coordinates, and x and y representing velocity.
I have overridden equals method and I compare by comparing width, height, x and y and velocity.
What should I do with hash code?
The reason why I am confused is that it is a moving object and I am not sure what I should be using to calculate the hash code. The values are going to be constantly changing, and the only thing that will remain static is the size, really.
According to Object.hashCode() there is a clause that can help with your decision:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Since your equals() compares width, height, x, y and velocity, your hashCode() will return a different hash whenever any of these values changes.
Sample hashCode() for you:
@Override
public int hashCode() {
int hash = 1;
hash = hash * 17 + width;
hash = hash * 31 + height;
hash = hash * 13 + x;
hash = hash * 13 + y;
hash = hash * 13 + velocity.hashCode();
return hash;
}
You can go further by storing the hash code in a private variable, and using a dirty flag to know when to recalculate the hash code if any of the parameters change. Since you are not doing anything expensive within the hashCode() method, I don't think you would need to do that.
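For completeness, the caching idea could look roughly like this (a sketch; the setter is made up and the fields mirror the ones from the question):
private int cachedHash;
private boolean dirty = true;    // set whenever a field used in equals changes

public void setX(int x) {
    this.x = x;
    dirty = true;                // position changed, so the hash must be recomputed
}

@Override
public int hashCode() {
    if (dirty) {
        int hash = 1;
        hash = hash * 17 + width;
        hash = hash * 31 + height;
        hash = hash * 13 + x;
        hash = hash * 13 + y;
        hash = hash * 13 + velocity.hashCode();
        cachedHash = hash;
        dirty = false;           // cache is valid until the next mutation
    }
    return cachedHash;
}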
If you want your hash code to reflect equality, then using the changing values is just fine, because the hash is calculated on the fly when you need it (or should be). So if you have all these "moving" things and you want to know whether they are equal (they have the same location or velocity or whatever), it will be accurate.
If you want to use it in a hash table, then don't override it and just use the default (based on the address in memory). Or, if you want more control, set a static counter and use it to create IDs for these objects.
Just make sure equals() and hashCode() are consistent.
That means, whatever fields are used to determine equality should be used to compute the hash code so that equal objects (with a certain state) will have the same hash code (for that same state).
But you may want to consider whether your equals() implementation is correct in the first place, or whether you even need to override the default implementation.
To be honest, the only reason why I wanted to implement hashCode is because the Java rules state that if equals is overridden then hashCode should be too. However, none of the objects will be used in a map.
In that case, you could implement it as follows:
public int hashCode() {
throw new UnsupportedOperationException("hashCode is not implemented");
}
This has the advantage that if you accidentally use the object as a hash key you will get an immediate exception rather than unexpected behavior if the object is mutated at the wrong time.
Typically, I would assign an ID to the object as a member variable and return the ID as the hashcode value.
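That could look like this (a sketch; the class name is made up, assuming you never need value-based equality):
import java.util.concurrent.atomic.AtomicInteger;

public class MovingObject {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    private final int id = NEXT_ID.incrementAndGet();  // stable for the object's lifetime

    @Override
    public int hashCode() {
        return id;   // never changes, even though position and velocity do
    }
}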
Suppose I have the below class.
class S{
String txt = null;
S(String i){
txt=i;
}
public static void main(String args []){
S s1 = new S("a");
S s2 = new S("b");
S s3 = new S("a");
Map<S, String> m = new HashMap<>();
m.put(s1, "v11");
m.put(s2, "v22");
m.put(s3, "v33");
System.out.println(m.size());
}
//just a plain implementation
public boolean equals(Object o)
{
S cc = (S) o;
if (this.txt.equals(cc.txt))
{
return true;
}
else
{
return false;
}
}
public int hashCode()
{
return 222;
}
}
Running the above will print the size as 2. That's totally fine. If we comment out the hashCode() it returns 3, which is also correct. But if we comment out the equals and keep the hashCode, it should return 2, right? Instead it returns 3. When putting objects into a HashMap, the map will check the hash code of an object, and if it is the same it will replace the previous value with the new one, right?
Thank You.
But if we comment the equals and keep the hashCode it should return 2
right? instead it returns 3.
3 items is the correct behaviour. 3 objects will be hashed to the same bucket, but because all 3 are different this bucket will contain a chain of values (linked list for HashMap in Java) with the same hash code but not equal to each other.
When putting objects to hashmap map will check the hash code of an object and if its same
it will replace the previous value of the map to the new one right?
If they are hashed to the same bucket, it doesn't mean that one value will replace another. The keys are then compared for equality. If they are equal, the old value is replaced; if they are not, the new entry is added to the tail of the linked list (for this bucket).
The hash code is simply used to determine the bucket in which to place the object. Each bucket can contain more than one object. So hashCode must be implemented to ensure that equal objects go in the same bucket. In other words, equal objects must have the same hash code, but objects with the same hash code aren't necessarily equal.
When you override only hashcode nothing really changes. You are just putting every object in the same bucket with return 222. So the HashMap is more inefficient, but its contract doesn't change.
The hash code is the first, quick check for whether two objects could be equal or not. It is used by hash containers to decide in which "slot" the object may go, and to retrieve it without checking all of the objects in all of the slots.
If your hash code is always the same, then all the objects will be directed to the same slot. This is called a collision. Insertions will be slower, because after the collision the container will have to check whether the objects already in that slot match the new one (equals). Also, retrieval will be slower because it will have to check all of them sequentially until it finds the right one (equals again). Finally, there will probably be a lot of memory wasted on slots that will never be used.
In essence, by not implementing a sensible hash code you are turning the hash containers into lists (and inefficient ones).
If we comment the hashCode() it return 3 which is also correct.
This is not correct! There are only 2 distinct objects according to equals: those holding "a" and "b". The equals method says what is equal and what is not. The expected size is 2. But, because the equals-hashCode contract is broken, the returned size is 3.
When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)
How should a hashCode() method be overriden so that it returns a unique integer for each object based only on that object's properities?
public class People{
public String name;
public int age;
public int hashCode(){
// How to get a unique integer based on name and age?
}
}
/*******************************/
public class App{
public static void main( String args[] ){
People mike = new People();
People melissa = new People();
mike.name = "mike";
mike.age = 23;
melissa.name = "melissa";
melissa.age = 24;
System.out.println( mike.hashCode() );    // output?
System.out.println( melissa.hashCode() ); // output?
}
}
It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}
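The matching equals that Idea generates would look something like this, so that the two stay consistent:
@Override
public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    People people = (People) o;
    if (age != people.age) return false;
    return name != null ? name.equals(people.name) : people.name == null;
}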
I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like
public boolean equals(Object obj) {
if (this==obj) return true;
if (obj==null) return false;
if (!getClass().equals(obj.getClass())) return false;
People other = (People) obj;
return (name==null ? other.name==null : name.equals(other.name)) &&
age==other.age;
}
Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. Most of the time you must add or exclusive-or the hash code of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
return age;
}
In this case Jane, age 30, is not equal to Bob, age 30, yet both their hash codes are 30. While valid, this is undesirable for performance in hash-based collections.
Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
Edit based on comments:
1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
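For what it's worth, the two indexing schemes mentioned above look roughly like this (just an illustration):
int tableLength = 16;                           // must be a power of two for the AND form
int hash = "someKey".hashCode();

// prime-sized table: index via modulo (a division under the hood)
int modIndex = (hash & 0x7fffffff) % tableLength;

// power-of-two-sized table: a bitwise AND gives the same index without the division,
// because tableLength - 1 is then a mask of all low-order ones
int andIndex = hash & (tableLength - 1);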
In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.
I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward attributes, some IDEs have hashcode function builders.
If you don't use an IDE, consider using Apache Commons and the class HashCodeBuilder.
The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check whether hashCode is consistent via a unit test. I have written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test also does a very crude job of testing whether the hashCode is efficient.
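The core of such a check might look like this (a JUnit-style sketch, assuming some Coordinate class whose equals and hashCode both use the same fields):
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class HashCodeConsistencyTest {
    @Test
    public void equalObjectsHaveEqualHashCodes() {
        Coordinate a = new Coordinate(270, 90);
        Coordinate b = new Coordinate(270, 90);

        assertEquals(a, b);                        // precondition: the two are equal
        assertEquals(a.hashCode(), b.hashCode());  // the contract being verified
        assertEquals(a.hashCode(), a.hashCode());  // consistent across repeated calls
    }
}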
This is what the javadoc tells us about the hashCode method:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
There is a notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, which is represented by one or more class fields. The equals() and hashCode() methods should both be implemented using the fields which make up the business key. This ensures that the two methods are consistent with each other.
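For example (a sketch with a made-up Vehicle class, where the VIN is the business key):
public class Vehicle {
    private final String vin;   // business key: uniquely identifies the vehicle in the domain

    public Vehicle(String vin) {
        this.vin = vin;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Vehicle)) return false;
        return vin.equals(((Vehicle) o).vin);    // equality on the business key only
    }

    @Override
    public int hashCode() {
        return vin.hashCode();                   // same field as equals, so they stay consistent
    }
}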
In Java, I have a subclass Vertex of the Java3D class Point3f. Now Point3f computes equals() based on the values of its coordinates, but for my Vertex class I want to be stricter: two vertices are only equal if they are the same object. So far, so good:
class Vertex extends Point3f {
// ...
public boolean equals(Object other) {
return this == other;
}
}
I know this violates the contract of equals(), but since I'll only compare vertices to other vertices this is not a problem.
Now, to be able to put vertices into a HashMap, the hashCode() method must return results consistent with equals(). It currently does that, but probably bases its return value on the fields of the Point3f, and therefore will give hash collisions for different Vertex objects with the same coordinates.
Therefore I would like to base the hashCode() on the object's address, instead of computing it from the Vertex's fields. I know that the Object class does this, but I cannot call its hashCode() method because Point3f overrides it.
So, actually my question is twofold:
Should I even want such a shallow equals()?
If yes, then, how do I get the object's address to compute the hash code from?
Edit: I just thought of something... I could generate a random int value on object creation, and use that for the hash code. Is that a good idea? Why (not)?
Either use System.identityHashCode() or use an IdentityHashMap.
System.identityHashCode() returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode().
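In the Vertex class from the question, that could be (sketch):
@Override
public int hashCode() {
    // identity-based hash, consistent with the reference-based equals above
    return System.identityHashCode(this);
}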
You could use a delegate, even though the accepted answer is probably better.
class Vertex extends Point3f{
private final Object equalsDelegate = new Object();
public boolean equals(Object vertex){
if(vertex instanceof Vertex){
return this.equalsDelegate.equals(((Vertex)vertex).equalsDelegate);
}
else{
return super.equals(vertex);
}
}
public int hashCode(){
return this.equalsDelegate.hashCode();
}
}
Just FYI, your equals method does NOT violate the equals contract (for the base Object's contract, that is)... it is basically the equals method of the base Object class, so if you want identity equals for Vertex instead of the inherited value-based one, that is fine.
As for the hash code, you really don't need to change it, though the accepted answer is a good option and will be a lot more efficient if your hash table contains a lot of vertex keys that have the same values.
The reason you don't need to change it is because it is completely fine for the hash code to return the same value for objects for which equals returns false... it is even valid to just return 0 all the time for EVERY instance. Whether this is efficient for hash tables is a completely different issue... you will get a lot more collisions if a lot of your objects have the same hash code (which may be the case if you left the hash code alone and had a lot of vertices with the same values).
Please don't accept this as the answer though of course (what you chose is much more practical), I just wanted to give you a little more background info about hash codes and equals ;-)
Why do you want to override hashCode() in the first place? You'd want to do it if you want to work with some other definition of equality. For example
public class A {
    int id;
    public A(int id) { this.id = id; }
    @Override
    public boolean equals(Object other) { return other instanceof A && ((A) other).id == id; }
    @Override
    public int hashCode() { return id; }
}
where you want to be clear that if the id's are the same then the objects are the same, and you override hashcode so that you can't do this:
HashSet<A> hash = new HashSet<>();
hash.add(new A(1));
hash.add(new A(1));
and get 2 identical (from the point of view of your definition of equality) A's.
The correct behavior would then be that you'd only have 1 object in the hash set; the second add would have no effect.
Since you are not using equals as a logical comparison, but a physical one (i.e. it is the same object), the only way you will guarantee that the hashCode returns a unique value is to implement a variation of your own suggestion. Instead of generating a random number, use a UUID to generate an actual unique value for each object.
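A sketch of that suggestion (a field and method to add inside the Vertex class from the question):
// uses java.util.UUID
private final UUID id = UUID.randomUUID();   // unique per instance, fixed for its lifetime

@Override
public int hashCode() {
    return id.hashCode();                    // consistent with the identity-based equals
}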
System.identityHashCode() will work most of the time, but it is not guaranteed, as the Object.hashCode() method is not guaranteed to return a unique value for every object. I have seen the marginal case happen, and it will probably be dependent on the VM implementation, which is not something you will want your code to be dependent on.
Excerpt from the javadocs for Object.hashCode():
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
The problem this addresses is the case of two separate point objects overwriting each other when inserted into the HashMap because they both have the same hash. Since there is no logical equals with an accompanying override of hashCode(), the identityHashCode method can actually cause this scenario to occur. Where the logical case would only replace hash entries for the same logical point, using the system-based hash can cause it to occur with any two objects; equality (and even class) is no longer a factor.
The function hashCode() is inherited from Object and works exactly as you intend (on object level, not coordinate-level). There should be no need to change it.
As for your equals method, there is no reason to even use it, since you can just do obj1 == obj2 in your code instead of calling equals; equals is meant for sorting and similar, where comparing coordinates makes a lot more sense.