Java Hashset Storing items based on value and not hashcode [duplicate] - java

This question already has answers here:
Hashcode of an int
(4 answers)
Closed 4 years ago.
HashSet hs = new HashSet();
hs.add(1000);
hs.add(new Integer(1000));
System.out.println(hs);
The above code prints [1000], but I used the new operator, which should create a new object in memory, so the hash code must be different. Shouldn't the HashSet then contain two values?

I have used the new operator which shall create a new object in memory and hence the hash code must be different
That assumption is not correct. The default hashCode implementation returns a different hash for different instances, but that is not a requirement. In many cases you actually want different instances to return the same hashCode (calculated from instance members) to be able to compare instances for equality.
From the documentation of Integer hashCode:
Returns: a hash code value for this object, equal to the primitive int value represented by this Integer object.
If you actually want a map that doesn't use equals/hashCode, take a look at the IdentityHashMap class.
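As a rough illustration of the difference, the sketch below builds an identity-based set with Collections.newSetFromMap over an IdentityHashMap and compares it with a plain HashSet. The class and method names are made up for this example, and it uses String instead of Integer only to avoid the deprecated Integer constructor.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.IdentityHashMap;
import java.util.Set;

class IdentitySetDemo {
    // Adds two distinct but equal String instances and returns the resulting size.
    static int sizeAfterAddingEqualInstances(Set<String> set) {
        set.add(new String("1000"));
        set.add(new String("1000")); // equal by value, but a different object
        return set.size();
    }

    public static void main(String[] args) {
        // HashSet compares with equals/hashCode, so the two instances collapse into one.
        Set<String> valueSet = new HashSet<>();
        System.out.println(sizeAfterAddingEqualInstances(valueSet)); // 1

        // An identity-based set compares references (==), so both instances are kept.
        Set<String> identitySet = Collections.newSetFromMap(new IdentityHashMap<String, Boolean>());
        System.out.println(sizeAfterAddingEqualInstances(identitySet)); // 2
    }
}
```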

To understand it better, let's run a small test and look at the hashCode in each of these cases:
int i1 = 1000;
Integer i2 = 1000;
Integer i3 = new Integer(1000);
System.out.println(Integer.valueOf(i1).hashCode());
System.out.println(i2.hashCode());
System.out.println(i3.hashCode());
All three cases return the same hashCode.
Output:
1000
1000
1000
Because the two values are also equal according to equals(), you get one value in the Set and not two as you expected.

Integer's hashCode() method returns a hash code equal to the internally stored primitive int value, which in your case is 1000.

Related

Comparing string in java [duplicate]

We know that hashCode() method of an object gives a hash-code based on the memory address of the instance of the object. So when we have two objects of a same class with same data it will still give different hash-code as they are stored in different memory location.
Now, when we create two string objects using new String("Some_Name") we will have two objects that are stored in different address. When we see the hashcode for these two objects we should get different hashcodes as they are stored in different memory location. But we end up getting same hashcode as result.
Employee empObject = new Employee("Some_Name");
Employee empObject1 =new Employee("Some_Name");
String stringObject= new String("Some_Name");
String stringObject1=new String("Some_Name");
//Output
System.out.println(empObject.hashCode()); //1252169911
System.out.println(empObject1.hashCode()); //2101973421
System.out.println(stringObject.hashCode()); //1418906358
System.out.println(stringObject1.hashCode()); //1418906358
Does this mean that the String class has overridden the hashCode() method from Object? If so, does the overridden method have to search the heap for other String objects with the same data and assign a constant hashCode to all of them? Please correct me if my basic understanding is wrong.
Note: This is not about String literals but about String objects. Literals are stored in the String constant pool, whereas a String object is created on the heap, outside the pool.
This is what I found with a simple Google search:
An object’s hash code allows algorithms and data structures to put objects into compartments, just like letter types in a printer’s type case. The printer puts all A types into the compartment for A, and he looks for an A only in this one compartment. This simple system lets him find types much faster than searching in an unsorted drawer. That’s also the idea of hash-based collections, such as HashMap and HashSet.
The contract is explained in the hashCode method’s JavaDoc. It can be roughly summarized with this statement:
Objects that are equal must have the same hash code within a running process
Unequal objects must have different hash codes – WRONG!
Objects with the same hash code must be equal – WRONG!
The contract allows for unequal objects to share the same hash code, such as the A and µ objects in the sketch above. In math terms, the mapping from objects to hash codes doesn’t have to be injective or even bijective. This is obvious because the number of possible distinct objects is usually bigger than the number of possible hash codes 2^32.
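A small, concrete case of two unequal objects sharing a hash code: the strings "Aa" and "BB" are a well-known collision under String's hash function (65*31 + 97 and 66*31 + 66 are both 2112). The class name below is only for illustration.

```java
class HashCollisionDemo {
    // True when both strings report the same hash code.
    static boolean sameHash(String x, String y) {
        return x.hashCode() == y.hashCode();
    }

    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        System.out.println(a.equals(b));    // false: the contents differ
        System.out.println(sameHash(a, b)); // true: both hash to 2112
    }
}
```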
Hope it helps
Explanation
Your first paragraph describes the default behavior of hashCode. But usually classes override it and create a content-based solution (same as for equals). This especially applies to the String class.
Default hashCode
The default implementation is not written in Java but implemented directly in the JVM; the method is declared native. You can always get the original hashCode through System#identityHashCode; see its documentation:
Returns the same hash code for the given object as would be returned by the default method hashCode(), whether or not the given object's class overrides hashCode(). The hash code for the null reference is zero.
Note that the default implementation of hashCode is not necessarily based on the memory location. It often is related, but you can by no means rely on that (see How is hashCode() calculated in Java). Here is the documentation of Object#hashCode:
Returns a hash code value for the object. This method is supported for the benefit of hash tables such as those provided by HashMap.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
The relevant parts are the second and third requirements: hashCode must be consistent with equals, and hash collisions are allowed (but not optimal).
And Object#equals is typically used to create custom content-based comparisons (see documentation).
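To see the two kinds of hash side by side, here is a minimal sketch (class name chosen for this example) that prints the overridden, content-based hash and the identity hash for two equal String instances. The identity hashes usually differ, but nothing guarantees that, so no particular values are shown.

```java
class IdentityHashDemo {
    public static void main(String[] args) {
        String s1 = new String("hello");
        String s2 = new String("hello");

        // Overridden, content-based hash: identical for equal strings.
        System.out.println(s1.hashCode() == s2.hashCode()); // true

        // Default, identity-based hash: one value per object, stable for that object.
        // The two values usually differ, but that is not guaranteed.
        System.out.println(System.identityHashCode(s1));
        System.out.println(System.identityHashCode(s2));

        // Documented special case: the null reference hashes to zero.
        System.out.println(System.identityHashCode(null)); // 0
    }
}
```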
String hashCode
Now let us take a look at the implementation of String#hashCode. As said, the class overrides the method and implements a content-based solution. So the hash for "hello" will always be the same as for "hello". Even if you force new instances using the constructor:
// Will have the same hash
new String("hello").hashCode()
new String("hello").hashCode()
It works exactly as equals, which would output true here as well:
new String("hello").equals(new String("hello")) // true
as required by the contract of the hashCode method (see documentation).
Here is the implementation of the method (JDK 10):
/**
 * Returns a hash code for this string. The hash code for a
 * {@code String} object is computed as
 * <blockquote><pre>
 * s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
 * </pre></blockquote>
 * using {@code int} arithmetic, where {@code s[i]} is the
 * <i>i</i>th character of the string, {@code n} is the length of
 * the string, and {@code ^} indicates exponentiation.
 * (The hash value of the empty string is zero.)
 *
 * @return a hash code value for this object.
 */
public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        hash = h = isLatin1() ? StringLatin1.hashCode(value)
                              : StringUTF16.hashCode(value);
    }
    return h;
}
Which just forwards to either StringLatin1 or StringUTF16, let us see what they have:
// StringLatin1
public static int hashCode(byte[] value) {
    int h = 0;
    for (byte v : value) {
        h = 31 * h + (v & 0xff);
    }
    return h;
}

// StringUTF16
public static int hashCode(byte[] value) {
    int h = 0;
    int length = value.length >> 1;
    for (int i = 0; i < length; i++) {
        h = 31 * h + getChar(value, i);
    }
    return h;
}
As you see, both of them just do some simple math based on the individual characters in the string. So it is completely content-based and will thus obviously result in the same result for the same characters always.
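The same formula can be reproduced outside the JDK. The following sketch, with a hypothetical helper named polynomialHash, evaluates the documented polynomial with int arithmetic and compares the result with String#hashCode.

```java
class StringHashByHand {
    // Evaluates s[0]*31^(n-1) + ... + s[n-1] with int arithmetic (Horner's scheme).
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(polynomialHash("hello"));                       // 99162322
        System.out.println(polynomialHash("hello") == "hello".hashCode()); // true
        System.out.println(polynomialHash(""));                            // 0
    }
}
```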

hashCode() method [duplicate]

This question already has answers here:
What is the use of hashCode in Java?
(8 answers)
Closed 2 years ago.
I need some help in better understanding the hashCode() method in a theoretical way. I've read (emphasis mine):
When hashCode() is called on two separate objects (which are equal according to the equals() method) it returns the same hash code value. However, if it is called on two unequal objects, it will not necessarily return different integer values.
Where can said exceptions occur?
Suppose you have a class with two String fields, and that its hashcode is calculated by summing the hashcodes of those two fields. Further suppose you have an equals that simply checks whether the class fields values are equal.
class Test {
    String a;
    String b;

    public Test(String a, String b) {
        this.a = a;
        this.b = b;
    }

    @Override
    public int hashCode() {
        return a.hashCode() + b.hashCode();
    }

    @Override
    public boolean equals(Object o) { // simplified
        Test other = (Test) o;
        return a.equals(other.a) && b.equals(other.b);
    }
}
Let's see if non-equal instances can have the same hashcode
Test t1 = new Test("hello", "world");
Test t2 = new Test("world", "hello");
System.out.println(t1.equals(t2)); // false
System.out.println(t1.hashCode() == t2.hashCode()); // true
Are we still respecting hashCode's contract?
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
Well, yes, since it only depends on a and b and we're using their hashCode method which we can assume respects the contract.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It does
Test t1 = new Test("hello", "world");
Test t2 = new Test("hello", "world");
System.out.println(t1.equals(t2)); // true
System.out.println(t1.hashCode() == t2.hashCode()); // true
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hash tables.
That's what we were trying to demonstrate in the first place. It's not a requirement.
public int hashCode() {
return 27;
}
Believe it or not, this is a valid implementation of hashCode, although not an efficient one, since it respects the contract with the equals method. This implementation causes exactly what you describe.
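A short sketch of that claim, using a hypothetical Key class whose hashCode always returns 27: a HashSet still behaves correctly, because equals makes the final decision once the bucket has been found.

```java
import java.util.HashSet;
import java.util.Set;

class ConstantHashDemo {
    // Hypothetical key class: every instance reports the same hash code.
    static final class Key {
        private final String name;

        Key(String name) {
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Key && name.equals(((Key) o).name);
        }

        @Override
        public int hashCode() {
            return 27; // valid but slow: every key lands in the same bucket
        }
    }

    static Set<Key> sample() {
        Set<Key> set = new HashSet<>();
        set.add(new Key("a"));
        set.add(new Key("b"));
        set.add(new Key("a")); // equal to the first key, so it is not added again
        return set;
    }

    public static void main(String[] args) {
        Set<Key> set = sample();
        System.out.println(set.size());                 // 2
        System.out.println(set.contains(new Key("b"))); // true
        System.out.println(set.contains(new Key("c"))); // false
    }
}
```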
The hashCode is used to limit the number of cases to compare to.
For instance, suppose you are looking for a student in a high school and you know only the student's name, gender and age. Are you going to look through all the students, or only the ones with that age and gender?
The hashCode does the same, for certain data structures. When looking for an item, it will first make a sub-list/collection of the items with an identical hashCode, and then, it searches for the exact item within that sub-list/collection.
The more specific your hash code, the more efficient the search.

Hashcode returning same values for different references [duplicate]

This question already has answers here:
Why can hashCode() return the same value for different objects in Java?
(6 answers)
Closed 7 years ago.
I was thinking hashcodes are only implemented in HashMap, Hashtable. In my understanding hashCode value will be same for both on object level also.
Hence
String str="Niks";
String str1=new String("Niks");
System.out.println(str.hashCode());
System.out.println(str1.hashCode());
The same hash codes are returned because at the object level the hashCode will be implemented as below. Correct me if I am wrong.
result = prime * result + ((str == null) ? 0 : str.hashCode());
result2 = prime * result + ((str1 == null) ? 0 : str1.hashCode());
Output:
75268767
75268767
Generally if I'm using strings as keys in a hashmap, I don't want to have to worry about whether the string I'm using as a key for a lookup is exactly the same reference as what I may have inserted previously, that would complicate things immensely. I want to be able to create a string as a key and know that, if the map has a string with the same value as what I'm using, that the map will find it. So comparing references is not what I want, I want to compare by value.
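A minimal sketch of that lookup-by-value behaviour follows; the class name and the stored number are placeholders for this example. The key used for get is a different instance from the one used for put, yet the map finds the entry.

```java
import java.util.HashMap;
import java.util.Map;

class LookupByValueDemo {
    // Stores a value under one String instance and reads it back with another.
    static Integer lookup() {
        Map<String, Integer> map = new HashMap<>();
        map.put(new String("Niks"), 42);    // 42 is an arbitrary placeholder value
        return map.get(new String("Niks")); // different reference, same contents
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // 42
    }
}
```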
For objects like Strings or numbers (java.lang.Integer, java.math.BigInteger, java.math.BigDecimal), typically comparing references is not useful; all people are interested in is the value. These are called value objects, and equality and hashCode are based strictly on the object's value, not on comparing references.
From a definition posted by Martin Fowler:
In P of EAA I described Value Object as a small object such as a Money or date range object. Their key property is that they follow value semantics rather than reference semantics.
You can usually tell them because their notion of equality isn't based on identity, instead two value objects are equal if all their fields are equal. Although all fields are equal, you don't need to compare all fields if a subset is unique - for example currency codes for currency objects are enough to test equality.
A general heuristic is that value objects should be entirely immutable. If you want to change a value object you should replace the object with a new one and not be allowed to update the values of the value object itself - updatable value objects lead to aliasing problems.
The hashCode() method of objects is used when you insert them into a Hashtable, HashMap or HashSet. Just as important is the equals() method.
If two objects are equal, as you have demonstrated in your example (two strings with the value 'Niks') (see this answer for more information), then they will have the same hash. An important point to note is that the reverse does not hold: two objects with the same hash are not necessarily equal!

Setting key type in HashMap, how?

hi
I want to create a HashMap (java) that stores Expression, a little object i've created.
How do I choose what type of key to use? What's the difference for me between integer and String? I guess i just don't fully understand the idea behind HashMap so i'm not sure what keys to use.
Thanks!
Java HashMap relies on two things:
the hashCode() method, which returns an integer that is generated from the key and used inside the map
the equals(..) method, which must be consistent with the calculated hash: two keys that are equal must have the same hashcode, and it is desirable (but not required) that two keys with the same hashcode are the same element.
The specific requirements, taken from Java API doc are the following:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
If you don't provide any kind of specific implementation, then an identity-based hashcode (typically, though not necessarily, derived from the memory reference of the object) is used. This is usually good in most situations but if you have for example:
Expression e1 = new Expression(2,4,PLUS);
Expression e2 = new Expression(2,4,PLUS);
(I don't actually know what you need to place inside your hashmap so I'm just guessing)
Then, since they are two different objects, although with the same parameters, they will have different hashcodes. This may or may not be a problem for your specific situation.
If it isn't, just use the HashMap without caring about these details; if it is, you will need to provide a better way to compute the hashcode and equality of your Expression class.
You could do it in a recursive way (by computing the hashcode as a result of the hashcodes of children) or in a naive way (maybe computing the hashcode over a toString() representation).
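One possible shape for such an implementation is sketched below. The fields (left, right, op) and the Operator enum are assumptions, since the original Expression class is not shown; the sketch uses java.util.Objects.hash to combine the field hashes, which is only one of several reasonable choices.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class ExpressionKeyDemo {
    enum Operator { PLUS, MINUS } // hypothetical operator set

    // Hypothetical stand-in for the Expression class from the question.
    static final class Expression {
        private final int left;
        private final int right;
        private final Operator op;

        Expression(int left, int right, Operator op) {
            this.left = left;
            this.right = right;
            this.op = op;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Expression)) return false;
            Expression other = (Expression) o;
            return left == other.left && right == other.right && op == other.op;
        }

        @Override
        public int hashCode() {
            // Uses exactly the fields that equals compares.
            return Objects.hash(left, right, op);
        }
    }

    static String lookup() {
        Map<Expression, String> labels = new HashMap<>();
        labels.put(new Expression(2, 4, Operator.PLUS), "2 + 4");
        // A second, equal instance finds the entry stored under the first one.
        return labels.get(new Expression(2, 4, Operator.PLUS));
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // 2 + 4
    }
}
```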
Finally, if you are planning to use just simple types as keys (like you said, integers or strings), just don't worry, there's no difference. In both cases two different instances with the same value will have the same hashcode. Some examples:
assert(new String("hello").hashCode() == new String("hello").hashCode());
int x = 123;
assert(new Integer(x).hashCode() == new Integer(123).hashCode());
Mind that this does not hold for arbitrary classes, as explained above; it works here only because the hashCode methods of String and Integer compute the value from the content of the object itself.
The key is what you use to identify objects. You might have a situation where you want to identify numbers by their name.
Map<String,Integer> numbersByName = new HashMap<String,Integer>();
numbersByName.put("one",Integer.valueOf(1));
numbersByName.put("two",Integer.valueOf(2));
numbersByName.put("three",Integer.valueOf(3));
... etc
Then later you can get them out by doing
Integer three = numbersByName.get("three");
Or you might have a need to go the other way. If you know you're going to have integer values, and want the names, you can map integers to strings
Map<Integer,String> numbersByValue = new HashMap<Integer,String>();
numbersByValue.put(Integer.valueOf(1),"one");
numbersByValue.put(Integer.valueOf(2),"two");
numbersByValue.put(Integer.valueOf(3),"three");
... etc
And get it out
String three = numbersByValue.get(Integer.valueOf(3));
Keys and their associated values are both objects. Without generics, when you get something from a HashMap, you have to cast it to the actual type of object it represents (we can do this because all objects in Java inherit from the Object class). So, if your keys are Strings and your values are Integers, you would do something like:
Integer myValue = (Integer)myMap.get("myKey");
However, you can use Java generics to tell the compiler that you're only going to be using Strings and Integers:
HashMap<String,Integer> myMap = new HashMap<String,Integer>();
See http://download.oracle.com/javase/1.4.2/docs/api/java/util/HashMap.html for more details on HashMap.
If you do not want to look up the expressions, why do you want to store them in a map?
But if you do, then the key is the item you use for the lookup.

How to ensure hashCode() is consistent with equals()?

When overriding the equals() function of java.lang.Object, the javadocs suggest that,
it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
The hashCode() method must return a unique integer for each object (this is easy to do when comparing objects based on memory location, simply return the unique integer address of the object)
How should a hashCode() method be overridden so that it returns a unique integer for each object based only on that object's properties?
public class People{
public String name;
public int age;
public int hashCode(){
// How to get a unique integer based on name and age?
}
}
/*******************************/
public class App{
public static void main( String args[] ){
People mike = new People();
People melissa = new People();
mike.name = "mike";
mike.age = 23;
melissa.name = "melissa";
melissa.age = 24;
System.out.println( mike.hashCode() ); // output?
System.out.println( melissa.hashCode() ); // output?
}
}
It doesn't say the hashcode for an object has to be completely unique, only that the hashcode for two equal objects returns the same hashcode. It's entirely legal to have two non-equal objects return the same hashcode. However, the more unique a hashcode distribution is over a set of objects, the better performance you'll get out of HashMaps and other operations that use the hashCode.
IDEs such as IntelliJ Idea have built-in generators for equals and hashCode that generally do a pretty good job at coming up with "good enough" code for most objects (and probably better than some hand-crafted overly-clever hash functions).
For example, here's a hashCode function that Idea generates for your People class:
public int hashCode() {
int result = name != null ? name.hashCode() : 0;
result = 31 * result + age;
return result;
}
I won't go in to the details of hashCode uniqueness as Marc has already addressed it. For your People class, you first need to decide what equality of a person means. Maybe equality is based solely on their name, maybe it's based on name and age. It will be domain specific. Let's say equality is based on name and age. Your overridden equals would look like
public boolean equals(Object obj) {
    if (this == obj) return true;
    if (obj == null) return false;
    if (!getClass().equals(obj.getClass())) return false;
    People other = (People) obj;
    return (name == null ? other.name == null : name.equals(other.name)) &&
           age == other.age;
}
Any time you override equals you must override hashCode. Furthermore, hashCode can't use any more fields in its computation than equals did. A common approach is to add or exclusive-or the hash codes of the various fields (hashCode should be fast to compute). So a valid hashCode method might look like:
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age;
}
Note that the following is not valid as it uses a field that equals didn't (height). In this case two "equals" objects could have a different hash code.
public int hashCode() {
return (name==null ? 17 : name.hashCode()) ^ age ^ height;
}
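To make the consequence concrete, here is a sketch with a hypothetical Person class that repeats this mistake: equals compares name and age, while hashCode also mixes in height. The lookup result shown is what java.util.HashSet in current OpenJDK releases produces, because it compares stored hash values before calling equals; treat it as an illustration, not a specified behaviour.

```java
import java.util.HashSet;
import java.util.Set;

class BrokenContractDemo {
    // Hypothetical class that violates the equals/hashCode contract on purpose.
    static final class Person {
        final String name;
        final int age;
        final int height;

        Person(String name, int age, int height) {
            this.name = name;
            this.age = age;
            this.height = height;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return name.equals(other.name) && age == other.age; // height is ignored
        }

        @Override
        public int hashCode() {
            return name.hashCode() ^ age ^ height; // BUG: uses a field equals ignores
        }
    }

    static boolean foundAfterInsert() {
        Set<Person> people = new HashSet<>();
        people.add(new Person("Jane", 30, 170));
        // Equal according to equals, but the hash code differs because of height.
        return people.contains(new Person("Jane", 30, 180));
    }

    public static void main(String[] args) {
        Person a = new Person("Jane", 30, 170);
        Person b = new Person("Jane", 30, 180);
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // false
        System.out.println(foundAfterInsert());           // false: the equal object is not found
    }
}
```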
Also, it's perfectly valid for two non-equals objects to have the same hash code:
public int hashCode() {
return age;
}
In this case Jane age 30 is not equal to Bob age 30, yet both their hash codes are 30. While valid this is undesirable for performance in hash-based collections.
Another question asks if there are some basic low-level things that all programmers should know, and I think hash lookups are one of those. So here goes.
A hash table (note that I'm not using an actual classname) is basically an array of linked lists. To find something in the table, you first compute the hashcode of that something, then mod it by the size of the table. This is an index into the array, and you get a linked list at that index. You then traverse the list until you find your object.
Since array retrieval is O(1), and linked list traversal is O(n), you want a hash function that creates as random a distribution as possible, so that objects will be hashed to different lists. Every object could return the value 0 as its hashcode, and a hash table would still work, but it would essentially be a long linked-list at element 0 of the array.
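The description above can be sketched as a toy set backed by an array of linked lists. This is a teaching sketch with made-up names (TinyHashSet, indexFor), with no resizing, and it is not how java.util.HashSet is actually written.

```java
import java.util.LinkedList;
import java.util.List;

class TinyHashSet {
    private final List<Object>[] buckets;

    @SuppressWarnings("unchecked")
    TinyHashSet(int capacity) {
        buckets = new List[capacity];
        for (int i = 0; i < capacity; i++) {
            buckets[i] = new LinkedList<>();
        }
    }

    // Maps a hash code to an array index; floorMod keeps the index non-negative.
    int indexFor(Object o) {
        return Math.floorMod(o.hashCode(), buckets.length);
    }

    boolean add(Object o) {
        List<Object> bucket = buckets[indexFor(o)];
        if (bucket.contains(o)) {
            return false; // already present (linear scan of the bucket using equals)
        }
        bucket.add(o);
        return true;
    }

    boolean contains(Object o) {
        return buckets[indexFor(o)].contains(o);
    }

    public static void main(String[] args) {
        TinyHashSet set = new TinyHashSet(16);
        set.add("Aa");
        set.add("BB"); // same hash code as "Aa", so it shares the bucket
        System.out.println(set.indexFor("Aa") == set.indexFor("BB")); // true
        System.out.println(set.contains("BB"));                       // true
        System.out.println(set.contains("Cc"));                       // false
    }
}
```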
You also generally want the array to be large, which increases the chances that the object will be in a list of length 1. The Java HashMap, for example, increases the size of the array when the number of entries in the map is > 75% of the size of the array. There's a tradeoff here: you can have a huge array with very few entries and waste memory, or a smaller array where each element in the array is a list with > 1 entries, and waste time traversing. A perfect hash would assign each object to a unique location in the array, with no wasted space.
The term "perfect hash" is a real term, and in some cases you can create a hash function that provides a unique number for each object. This is only possible when you know the set of all possible values. In the general case, you can't achieve this, and there will be some values that return the same hashcode. This is simple mathematics: if you have a string that's more than 4 bytes long, you can't create a unique 4-byte hashcode.
One interesting tidbit: hash arrays are generally sized based on prime numbers, to give the best chance for random allocation when you mod the results, regardless of how random the hashcodes really are.
Edit based on comments:
1) A linked list is not the only way to represent the objects that have the same hashcode, although that is the method used by the JDK 1.5 HashMap. Although less memory-efficient than a simple array, it does arguably create less churn when rehashing (because the entries can be unlinked from one bucket and relinked to another).
2) As of JDK 1.4, the HashMap class uses an array sized as a power of 2; prior to that it used 2^N+1, which I believe is prime for N <= 32. This does not speed up array indexing per se, but does allow the array index to be computed with a bitwise AND rather than a division, as noted by Neil Coffey. Personally, I'd question this as premature optimization, but given the list of authors on HashMap, I'll assume there is some real benefit.
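The bitwise-AND trick from point 2 can be checked with a few lines. The sketch below, with hypothetical helper names, shows that for a table size that is a power of two, masking with size - 1 gives the same index as a non-negative modulo. It only illustrates the arithmetic and does not reproduce HashMap's internals.

```java
class PowerOfTwoIndexDemo {
    // Index via bit mask; only correct when tableSize is a power of two.
    static int indexByMask(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    // Index via non-negative modulo; works for any positive tableSize.
    static int indexByMod(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }

    public static void main(String[] args) {
        int tableSize = 16; // a power of two
        int[] hashes = { "hello".hashCode(), 0, -17, Integer.MIN_VALUE, Integer.MAX_VALUE };
        for (int h : hashes) {
            // Prints true for every hash: both computations pick the same bucket.
            System.out.println(indexByMask(h, tableSize) == indexByMod(h, tableSize));
        }
    }
}
```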
In general the hash code cannot be unique, as there are more values than possible hash codes (integers).
A good hash code distributes the values well over the integers.
A bad one could always give the same value and still be logically correct, it would just lead to unacceptably inefficient hash tables.
Equal values must have the same hash value for hash tables to work correctly.
Otherwise you could add a key to a hash table, then try to look it up via an equal value with a different hash code and not find it.
Or you could put an equal value with a different hash code and have two equal values at different places in the hash table.
In practice you usually select a subset of the fields to be taken into account in both the hashCode() and the equals() method.
I think you misunderstood it. The hashcode does not have to be unique to each object (after all, it is a hash code) though you obviously don't want it to be identical for all objects. You do, however, need it to be identical to all objects that are equal, otherwise things like the standard collections would not work (e.g., you'd look up something in the hash set but would not find it).
For straightforward attributes, some IDEs have hashcode function builders.
If you don't use IDEs, consider using Apache Commons and the class HashCodeBuilder
The only contractual obligation for hashCode is for it to be consistent. The fields used in creating the hashCode value must be the same or a subset of the fields used in the equals method. This means returning 0 for all values is valid, although not efficient.
One can check if hashCode is consistent via a unit test. I have written an abstract class called EqualityTestCase, which does a handful of hashCode checks. One simply has to extend the test case and implement two or three factory methods. The test does a very crude job of testing if the hashCode is efficient.
This is what the documentation (Javadoc) tells us about the hashCode method:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
There is a notion of a business key, which determines the uniqueness of separate instances of the same type. Each specific type (class) that models a separate entity from the target domain (e.g. a vehicle in a fleet system) should have a business key, which is represented by one or more class fields. The methods equals() and hashCode() should both be implemented using the fields that make up the business key. This ensures that both methods are consistent with each other.
