Why does Object.hashCode() have conflicts in Java?

I ran the code below on HotSpot JDK 1.6 on Windows XP,
twice, and got the results shown below.
So basically it seems that Object.hashCode() can also have conflicts?
It looks like it is not returning the memory address in the VM.
However, a comment in the JDK says the values should be distinct; can anyone explain?
As much as is reasonably practical, the hashCode method defined by
class Object does return distinct integers for distinct
objects. (This is typically implemented by converting the internal
address of the object into an integer, but this implementation
technique is not required by the
Java™ programming language.)
@return a hash code value for this object.
@see java.lang.Object#equals(java.lang.Object)
@see java.util.Hashtable
This is the first result:
i,hashcode(): 361,9578500
i,hashcode(): 1886,9578500
conflict:1886, 361
i,hashcode(): 1905,14850080
i,hashcode(): 2185,14850080
conflict:2185, 1905
9998
This is the 2nd result:
i,hashcode(): 361,5462872
i,hashcode(): 1886,29705835
conflict:1887, 362
i,hashcode(): 1905,9949222
i,hashcode(): 2185,2081190
conflict:2186, 1906
9998
10000
My code:
@Test
public void testAddr()
{
    Set<Integer> s = new TreeSet<Integer>();
    Map<Integer, Integer> m = new TreeMap<Integer, Integer>();
    Set<Object> os = new HashSet<Object>();
    for (int i = 0; i < 10000; ++i)
    {
        Object o = new Object();
        os.add(o);
        Integer h = o.hashCode();
        if ((i == 361) || (i == 1886) || (i == 2185) || (i == 1905))
        {
            System.out.println("i,hashcode(): " + i + "," + h);
        }
        if (s.contains(h))
        {
            System.out.println("conflict:" + i + ", " + m.get(h));
        }
        else
        {
            s.add(h);
            m.put(h, i);
        }
    }
    System.out.println(s.size());
    int c = 0;
    for (Object o : os)
    {
        c++;
    }
    System.out.println(c);
}

hashCode() is supposed to be used for placing objects in hash tables. The rule for hashCode is not that hashCode should never generate conflicts, although that is a desirable property, but that equal objects must have equal hash codes. This does not preclude non-equal objects from having equal hash codes.
You have found a case where the default Object.hashCode() implementation does generate equal hash codes for non-equal objects. The hash code of an object is required not to change unless some field affecting that object's equality with others changes. One possible cause is that the garbage collector rearranged memory so that a later instantiation of o landed at the same location as an earlier instantiation of o: you allocated two objects in the loop, and the garbage collector moved the old o out of one memory location between the two allocations, so the new o was then allocated at that location. Even though the hash code of the old o cannot change, the hash code of the new o is the address where it is stored in memory, which happens to equal the hash code of the old o.
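This effect can be observed (though a duplicate is not guaranteed on any particular run) with a sketch like the following; the class name and object count are mine, not from the original post:

```java
import java.util.HashSet;
import java.util.Set;

public class IdentityHashDemo {
    public static void main(String[] args) {
        Set<Integer> seen = new HashSet<>();
        int duplicates = 0;
        for (int i = 0; i < 100_000; i++) {
            // Each Object becomes unreachable immediately, so the GC is
            // free to reuse its memory; whether that yields repeated
            // identity hash codes depends on the JVM's hashing scheme.
            if (!seen.add(new Object().hashCode())) {
                duplicates++;
            }
        }
        // Every allocation either adds a new hash code or is a duplicate.
        System.out.println("distinct: " + seen.size()
                + ", duplicates: " + duplicates);
    }
}
```

Note that on modern JVMs the identity hash is typically a pseudo-random number rather than an address, so any duplicates seen here come from the birthday effect rather than from memory reuse.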

It's an unfortunately common misinterpretation of the API docs. From a still-unfixed bug report (1 vote) filed some time ago:
(spec) System.identityHashCode doc inadequate, Object.hashCode default
implementation docs mislead
[...]
From Usenet discussions and Open Source Software it appears that
many, perhaps majority, of programmers take this to mean that the
default implementation, and hence System.identityHashCode, will
produce unique hashcodes.
The suggested implementation technique is not even appropriate to
modern handleless JVMs, and should go the same way as JVM Spec Chapter
9.
The qualification "As much as is reasonably practical," is, in
practice, insufficient to make clear that hashcodes are not, in
practice, distinct.

It is possible that a long-running program may create, call hashCode() upon, and abandon many billions of objects during the time that it runs. Thus, it would be mathematically impossible to ensure that once some object's hashCode() returned a particular number, no other object would ever return that same number for the life of the program. Even if hashCode() somehow managed to return unique values for the first 4,294,967,296 objects, it would have no choice but to return an already-used value for the next one (since the previous call would have used the last remaining formerly-unused value).
The fact that hashCode() clearly cannot guarantee that hash values won't get reused for the life of the program does not mean it couldn't guarantee that hash codes won't get reused during the lifetime of the objects in question. Indeed, for some memory-management schemes, such a guarantee could be made relatively cheaply. For example, the 1984 Macintosh split the heap into two parts, one of which held fixed-sized object descriptors, and one of which held variable-sized object data. The object descriptors, once created, would never move; if any objects were deleted, the space used by their descriptors would get reused when new objects were created. Under such a scheme, the address of an object descriptor would represent a unique and unchanging representation of its identity for as long as the object existed, and could thus be used as a hashCode() value. Unfortunately, such schemes tend to have more overhead than some other approaches in which objects have no fixed address associated with them.

The comment does not say that it is distinct.
It says that it is distinct as much as is reasonably practical.
Apparently, you found a case where it wasn't practical.

Hash codes do not have to be unique, just consistent, although more often than not they are fairly distinct.
In addition to your excerpt above, the Object documentation has the following to say.
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the Java™ programming language.)
Object Doc
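The contract allows distinct, non-equal objects to share a hash code. String gives a deterministic illustration (the values below follow from String's documented 31-based hash formula; the class name is mine):

```java
public class HashContractDemo {
    public static void main(String[] args) {
        String a = "Aa";
        String b = "BB";
        // Two non-equal strings...
        System.out.println(a.equals(b));   // false
        // ...with the same hash code: 'A'*31 + 'a' == 'B'*31 + 'B' == 2112
        System.out.println(a.hashCode());  // 2112
        System.out.println(b.hashCode());  // 2112
    }
}
```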

Related

Can I use identityHashCode to produce a compareTo between Objects respecting same-ness?

I want to implement a simple comparator between two Objects, whose only requirements are that
it is a valid comparator (i.e. defines a linear order on all objects) and
.compare will return 0 if and only if the objects are the same.
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
Motivation:
I want to build a collection that will allow me to store time-stamped messages in a thread-safe collection, which will support queries like "get me all the messages whose timestamp lies in [a,b)".
It seems that Guava's TreeMultimap uses a global lock (edit: if wrapped with the synchronizedSortedSetMultimap wrapper), and ConcurrentSkipListMap seems to support only one entry per time (it is a map, not a multi map). So I thought of using just a set of pairs:
ConcurrentSkipListSet<ImmutablePair<Float,Message>> db,
where the pairs are lexically ordered, first by the times (using Float.compareTo) and then by something like Comparator.nullsFirst(Comparator.comparing(System::identityHashCode)).
The nullsFirst is there just so db.subSet(ImmutablePair.of(a,null), ImmutablePair.of(b,null)) queries the half-open time interval [a,b).
You see why I care about the comparator preserving sameness: if the message comparator returns zero for non-same messages, messages may be deleted.
You also see why I don't need much else from the comparator: it's just there so I can use the storage mechanism of ConcurrentSkipListSet. I certainly don't want to impose on the user (well, just me :-) to implement a comparator for Message.
Another possible solution is to use a ConcurrentSkipListMap<Float, Set<Message>> (with thread-safe Set<> instances) but it seems a bit wasteful in terms of memory, and I will need to remove emptySet's myself to save memory once messages are deleted.
EDIT: As several people noted, identityHashCode may produce collisions, and in fact I've now confirmed that such collisions exist in my setup (which is roughly equivalent to having 4K collections as above, each populated with 4K messages per time bin). This is most likely the reason I see some messages dropped. So I'm now even more interested than ever in finding some way to have an "agnostic" comparison operator, that truly respects sameness. Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice.
While it's not guaranteed, I suspect the chances of this causing a problem are vanishingly small.
System.identityHashCode returns the value that Object.hashCode would return if not overridden, including this in the documentation:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects.
So is "as much as is reasonably practical" sufficient? While it's not guaranteed, I would be very surprised if you ever ran into a situation where it causes a problem. You'd have to have two messages with exactly the same timestamp and where the JVM's Object.hashCode implementation returns the same value for the two messages.
If the result of that coincidence were to be "nuclear power plant explodes" then I wouldn't risk it. If the result of that coincidence were to be "we fail to bill a customer" - or even "we bill a customer twice, and might get sued" I'd probably accept that chance, if no better alternatives are suggested.
As @StuartMarks noted in his comment, Guava supports
Ordering.arbitrary(), which provides thread-safe collision handling. The implementation makes efficient use of identityHashCode:
@Override
public int compare(Object left, Object right) {
    if (left == right) {
        return 0;
    } else if (left == null) {
        return -1;
    } else if (right == null) {
        return 1;
    }
    int leftCode = identityHashCode(left);
    int rightCode = identityHashCode(right);
    if (leftCode != rightCode) {
        return leftCode < rightCode ? -1 : 1;
    }
    // identityHashCode collision (rare, but not as rare as you'd think)
    int result = getUid(left).compareTo(getUid(right));
    if (result == 0) {
        throw new AssertionError(); // extremely, extremely unlikely.
    }
    return result;
}
so getUid (which uses a memoized AtomicInteger counter to allocate uids) is invoked only when there is a hash collision.
It's also quite easy to write (perhaps less easy to read?) the desired timestamped message container in "one" line:
db = new ConcurrentSkipListSet<>(
    (Ordering.<Float>natural().<ImmutablePair<Float,Message>>onResultOf(x -> x.left))
        .compound(Ordering.arbitrary().nullsFirst().<ImmutablePair<Float,Message>>onResultOf(x -> x.right)));
Will Comparator.comparing(System::identityHashCode) work? Is there another way?
As mentioned, identityHashCode is not unique.
Actually, a 64 bit hash value (instead of the 32bit value provided by identityHashCode) would probably suffice
I think this would just reduce the chances of overlap, not remove them. Hash algorithms are designed to limit overlaps, but they typically come with no guarantee of avoiding them entirely. For example, MD5 is 128-bit and still has collisions.
How about just assigning a unique number to each message with an AtomicLong? Then your comparison function would:
Compare by time. I would use long if possible instead of float.
If the times are the same, compare by the unique value.
If you have multiple systems ingesting these messages, then you will need to record a unique system id along with the message number to ensure uniqueness.
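A minimal sketch of that scheme (the StampedMessage class and its field names are illustrative, not from the post):

```java
import java.util.Comparator;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical message wrapper: a per-process counter breaks ties exactly.
final class StampedMessage {
    private static final AtomicLong SEQ = new AtomicLong();
    final long time;                         // long timestamp, not float
    final long uid = SEQ.getAndIncrement();  // unique within this process
    final String payload;

    StampedMessage(long time, String payload) {
        this.time = time;
        this.payload = payload;
    }
}

public class UniqueOrderDemo {
    // Order by time first, then by the unique sequence number.
    static final Comparator<StampedMessage> ORDER =
            Comparator.comparingLong((StampedMessage m) -> m.time)
                      .thenComparingLong(m -> m.uid);

    public static void main(String[] args) {
        StampedMessage a = new StampedMessage(42, "x");
        StampedMessage b = new StampedMessage(42, "x");
        // compare() is 0 only for the very same instance:
        System.out.println(ORDER.compare(a, a) == 0);  // true
        System.out.println(ORDER.compare(a, b) != 0);  // true
    }
}
```

Unlike identityHashCode, the 64-bit counter cannot collide within one process, so compare returns 0 exactly when the two references are the same message.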

Value-based Classes confusion

I'm seeking some clarification of the definition of value-based classes. I can't imagine how the last bullet point (6) is supposed to work together with the first one:
(1) they are final and immutable (though may contain references to mutable objects)
(6) they are freely substitutable when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior.
Optional is such a class.
Optional a = Optional.of(new ArrayList<String>());
Optional b = Optional.of(new ArrayList<String>());
assertEquals(a, b); // passes as `equals` delegated to the lists
b.get().add("a");
// now bite the last bullet
assertTrue(a.get().isEmpty()); // passes
assertTrue(b.get().isEmpty()); // throws
Am I reading it incorrectly, or would it need to get more precise?
Update
The answer by Eran makes sense (they are no longer equal), but let me move the target:
...
assertEquals(a, b); // now, they are still equal
assertEquals(m(a, b), m(a, a)); // this will throw
assertEquals(a, b); // now, they are equal, too
Let's define a funny method m, which does some mutation and undoes it again:
int m(Optional<ArrayList<String>> x, Optional<ArrayList<String>> y) {
    x.get().add("");
    int result = x.get().size() + y.get().size();
    x.get().remove(x.get().size() - 1);
    return result;
}
It's a strange method, I know. But I guess it qualifies as "any computation or method invocation", doesn't it?
they are freely substitutable when equal, meaning that interchanging any two instances x and y that are equal according to equals() in any computation or method invocation should produce no visible change in behavior
Once b.get().add("a"); is executed, a is no longer equal to b, so you have no reason to expect assertTrue(a.get().isEmpty()); and assertTrue(b.get().isEmpty()); to produce the same result.
The fact that a value based class is immutable doesn't mean you can't mutate the values stored in instances of such classes (as stated in though may contain references to mutable objects). It only means that once you create an Optional instance with Optional a = Optional.of(new ArrayList<String>()), you can't mutate a to hold a reference to a different ArrayList.
You can derive the invalidity of your actions from the specification you’re referring to:
A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism. Use of such identity-sensitive operations on instances of value-based classes may have unpredictable effects and should be avoided.
(emphasis mine)
Modifying an object is an identity-sensitive operation, as it only affects the object with the specific identity represented by the reference you are using for the modification.
When you call x.get().add("");, you are performing an operation that makes it possible to recognize whether x and y represent the same instance; in other words, you are performing an identity-sensitive operation.
Still, I expect that if a future JVM truly tries to substitute value-based instances, it will have to exclude instances referring to mutable objects, to ensure compatibility. If you perform an operation that produces an Optional followed by extracting the Optional, e.g. … stream.findAny().get(), it would be disastrous/unacceptable if the intermediate operation allowed the element to be substituted with another object that happened to be equal at the point of the intermediate Optional use (if the element is not itself a value type)…
I think a more interesting example is as follows:
void foo() {
List<String> list = new ArrayList<>();
Optional<List<String>> a = Optional.of(list);
Optional<List<String>> b = Optional.of(list);
bar(a, b);
}
It's clear that a.equals(b) is true. Furthermore, since Optional is final (cannot be subclassed), immutable, and both a and b refer to the same list, a.equals(b) will always be true. (Well, almost always, subject to race conditions where another thread is modifying the list while this one is comparing them.) Thus, this seems like it would be a case where it would be possible for the JVM to substitute b for a or vice-versa.
As things stand today (Java 8 and 9 and 10) we can write a == b and the result will be false. The reason is that we know that Optional is an instance of an ordinary reference type, and the way things are currently implemented, Optional.of(x) will always return a new instance, and two new instances are never == to each other.
However, the paragraph at the bottom of the value-based classes definition says:
A program may produce unpredictable results if it attempts to distinguish two references to equal values of a value-based class, whether directly via reference equality or indirectly via an appeal to synchronization, identity hashing, serialization, or any other identity-sensitive mechanism. Use of such identity-sensitive operations on instances of value-based classes may have unpredictable effects and should be avoided.
In other words, "don't do that," or at least, don't rely on the result. The reason is that tomorrow the semantics of the == operation might change. In a hypothetical future value-typed world, == might be redefined for value types to be the same as equals, and Optional might change from being a value-based class to being a value type. If this happens, then a == b will be true instead of false.
One of the main ideas about value types is that they have no notion of identity (or perhaps their identity isn't detectable to Java programs). In such a world, how could we tell whether a and b "really" are the same or different?
Suppose we were to instrument the bar method via some means (say, a debugger) such that we can inspect the attributes of the parameter values in a way that can't be done through the programming language, such as by looking at machine addresses. Even if a == b is true (remember, in a value-typed world, == is the same as equals) we might be able to ascertain that a and b reside at different addresses in memory.
Now suppose the JIT compiler compiles foo and inlines the calls to Optional.of. Seeing that there are now two hunks of code that return two results that are always equals, the compiler eliminates one of the hunks and then uses the same result wherever a or b is used. Now, in our instrumented version of bar, we might observe that the two parameter values are the same. The JIT compiler is allowed to do this because of the sixth bullet item, which allows substitution of values that are equals.
Note that we're only able to observe this difference because we're using an extra-linguistic mechanism such as a debugger. Within the Java programming language, we can't tell the difference at all, and thus this substitution can't affect the result of any Java program. This lets the JVM choose any implementation strategy it sees fit. The JVM is free to allocate a and b on the heap, on the stack, one on each, as distinct instances, or as the same instances, as long as Java programs can't tell the difference. When the JVM is granted freedom of implementation choices, it can make programs go a lot faster.
That's the point of the sixth bullet item.
When you execute the lines:
Optional a = Optional.of(new ArrayList<String>());
Optional b = Optional.of(new ArrayList<String>());
assertEquals(a, b); // passes as `equals` delegated to the lists
In the assertEquals(a, b), according to the API, the two arguments are considered equal if:
the params a and b are both Optional instances, and
both have no value present, or
the present values are "equal to" each other via equals() (in your example this equals is the one from ArrayList).
So, when you change one of the ArrayLists an Optional instance is pointing to, the assertion will fail on the third point.
Point 6 says that if a and b are equal, they can be used interchangeably: if a method expects two instances of class A, and you have created instances a and b that satisfy point 6, then passing (a, a), (b, b), or (a, b) will all give the same output.

How do I prove that Object.hashCode() can produce same hash code for two different objects in Java?

Had a discussion with an interviewer regarding internal implementation of Java Hashmaps and how it would behave if we override equals() but not the HashCode() method for an Employee<Emp_ID, Emp_Name> object.
I was told that hashCode for two different objects would never be the same for the default object.hashCode() implementation, unless we overrode the hashCode() ourselves.
From what I remembered, I told him that Java Hashcode contracts says that two different objects "may" have the same hashcode() not that it "must".
According to my interviewer, the default object.hashcode() never returns the same hashcode() for two different objects, Is this true?
Is it even remotely possible to write code that demonstrates this? From what I understand, Object.hashCode() can produce 2^30 unique values; with such a low probability of collision, how does one produce a collision to demonstrate that two different objects can get the same hashCode() from the Object class's method?
Or is he right that with the default Object.hashCode() implementation we will never have a collision, i.e. two different objects can never have the same hash code? If so, why don't the Java manuals explicitly say so?
How can I write some code to demonstrate this? By demonstrating this, I can also prove that a bucket in a HashMap can contain entries with different hash codes (I tried to show him the debugger with the HashMap expanded, but he told me that this is just the logical implementation and not the internal algorithm).
2^30 unique values sounds like a lot, but the birthday problem means we don't need many objects to get a collision.
The following program works for me in about a second and gives a collision between objects 196 and 121949. I suspect it will heavily depend on your system configuration, compiler version etc.
As you can see from the implementation of the Hashable class, every instance is guaranteed to be unique (each gets a distinct id), and yet there are still hash collisions.
class HashCollider
{
    static class Hashable
    {
        private static int curr_id = 0;
        public final int id;

        Hashable()
        {
            id = curr_id++;
        }
    }

    public static void main(String[] args)
    {
        final int NUM_OBJS = 200000; // birthday problem suggests
                                     // this will be plenty
        Hashable[] objs = new Hashable[NUM_OBJS];
        for (int i = 0; i < NUM_OBJS; ++i) objs[i] = new Hashable();

        for (int i = 0; i < NUM_OBJS; ++i)
        {
            for (int j = i + 1; j < NUM_OBJS; ++j)
            {
                if (objs[i].hashCode() == objs[j].hashCode())
                {
                    System.out.println("Objects with IDs " + objs[i].id
                            + " and " + objs[j].id + " collided.");
                    System.exit(0);
                }
            }
        }
        System.out.println("No collision");
    }
}
If you have a large enough heap (assuming 64 bit address space) and objects are small enough (the smallest object size on a 64 bit JVM is 8 bytes), then you will be able to represent more than 2^32 objects that are reachable at the same time. At that point, the objects' identity hashcodes cannot be unique.
However, you don't need a monstrous heap. If you create a large enough pool of objects (e.g. in a large array) and randomly delete and recreate them, it is (I think) guaranteed that you will get a hashcode collision ... if you continue doing this long enough.
The default algorithm for hashcode in older versions of Java is based on the address of the object when hashcode is first called. If the garbage collector moves an object, and another one is created at the original address of the first one, and identityHashCode is called, then the two objects will have the same identity hashcode.
The current (Java 8) default algorithm uses a PRNG. The "birthday paradox" formula will tell you the probability that one object's identity hash code is the same as one or more of the others'.
The -XX:hashCode=n option that @BastianJ mentioned has the following behavior:
hashCode == 0: Returns a freshly generated pseudo-random number.
hashCode == 1: XORs the object address with a pseudo-random number that changes occasionally.
hashCode == 2: The hash code is the constant 1! (Hence @BastianJ's "cheat" answer.)
hashCode == 3: The hash code is an ascending sequence number.
hashCode == 4: The bottom 32 bits of the object address.
hashCode >= 5: This is the default algorithm for Java 8. It uses Marsaglia's xorshift PRNG with a thread-specific seed.
If you have downloaded the OpenJDK Java 8 source code, you will find the implementation in hotspot/src/share/vm/runtime/synchronizer.cpp. Look for the get_next_hash() function.
So that is another way to prove it. Show him the source code!
Use the Oracle JVM and set -XX:hashCode=2. If I remember correctly, this sets the default implementation to return the constant 1. Just for the purpose of proving you're right.
I have little to add to Michael's answer (+1) except a bit of code golfing and statistics.
The Wikipedia article on the Birthday problem that Michael linked to has a nice table of the number of events necessary to get a collision, with a desired probability, given a value space of a particular size. For example, Java's hashCode has 32 bits, giving a value space of 4 billion. To get a collision with a probability of 50%, about 77,000 events are necessary.
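The 77,000 figure can be checked directly with the standard birthday formula; this sketch (class and method names are mine) evaluates the exact product form:

```java
public class BirthdayOdds {
    // Probability that n hash codes drawn uniformly at random from a
    // space of 2^32 values contain at least one collision.
    static double collisionProbability(int n) {
        double space = 4294967296.0; // 2^32
        double pNoCollision = 1.0;
        for (int k = 0; k < n; k++) {
            // k values already drawn are distinct; the next draw must
            // avoid all of them to keep the no-collision streak alive.
            pNoCollision *= (space - k) / space;
        }
        return 1.0 - pNoCollision;
    }

    public static void main(String[] args) {
        // Prints a value close to 0.5, matching the Wikipedia table.
        System.out.printf("n=77000: p = %.3f%n", collisionProbability(77_000));
    }
}
```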
Here's a simple way to find two instances of Object that have the same hashCode:
static int findCollision() {
    Map<Integer,Object> map = new HashMap<>();
    Object n, o;
    do {
        n = new Object();
        o = map.put(n.hashCode(), n);
    } while (o == null);
    assert n != o && n.hashCode() == o.hashCode();
    return map.size() + 1;
}
This returns the number of attempts it took to get a collision. I ran this a bunch of times and generated some statistics:
System.out.println(
    IntStream.generate(HashCollisions::findCollision)
             .limit(1000)
             .summaryStatistics());
IntSummaryStatistics{count=1000, sum=59023718, min=635, average=59023.718000, max=167347}
This seems quite in line with the numbers from the Wikipedia table. Incidentally, this took only about 10 seconds to run on my laptop, so this is far from a pathological case.
You were right in the first place, but it bears repeating: hash codes are not unique!

Why are two AtomicIntegers never equal?

I stumbled across the source of AtomicInteger and realized that
new AtomicInteger(0).equals(new AtomicInteger(0))
evaluates to false.
Why is this? Is it some "defensive" design choice related to concurrency issues? If so, what could go wrong if it was implemented differently?
(I do realize I could use get and == instead.)
This is partly because an AtomicInteger is not a general purpose replacement for an Integer.
The java.util.concurrent.atomic package summary states:
Atomic classes are not general purpose replacements for
java.lang.Integer and related classes. They do not define methods
such as hashCode and compareTo. (Because atomic variables are
expected to be mutated, they are poor choices for hash table keys.)
hashCode is not implemented, and so is the case with equals. This is in part due to a far larger rationale that is discussed in the mailing list archives, on whether AtomicInteger should extend Number or not.
One of the reasons why an AtomicXXX class is not a drop-in replacement for a primitive, and why it does not implement the Comparable interface, is that it is pointless to compare two instances of an AtomicXXX class in most scenarios. If two threads can access and mutate the value of an AtomicInteger, then a comparison result may already be invalid before you use it, because a thread may have mutated one of the values in the meantime. The same rationale holds for the equals method: the result of an equality test (which depends on the values of the AtomicIntegers) is only valid until a thread mutates one of the AtomicIntegers in question.
On the face of it, it seems like a simple omission, but maybe it does make some sense to just use the identity equals provided by Object.equals.
For instance:
AtomicInteger a = new AtomicInteger(0);
AtomicInteger b = new AtomicInteger(0);
assert a.equals(b);
seems reasonable, but b isn't really a, it is designed to be a mutable holder for a value and therefore can't really replace a in a program.
also:
assert a.equals(b);
assert a.hashCode() == b.hashCode();
should hold, but what would happen if b's value changed in between?
If this is the reason it's a shame it wasn't documented in the source for AtomicInteger.
As an aside: A nice feature might also have been to allow AtomicInteger to be equal to an Integer.
AtomicInteger a = new AtomicInteger(25);
if (a.equals(25)) {
    // woot
}
The trouble is that, for equality to be symmetric in this case, Integer would have to accept AtomicInteger in its equals too.
I would argue that because the point of an AtomicInteger is that operations can be done atomically, it would be hard to ensure that the two values are compared atomically, and because AtomicIntegers are generally counters, you'd get some odd behaviour.
So without ensuring that the equals method is synchronised you wouldn't be sure that the value of the atomic integer hasn't changed by the time equals returns. However, as the whole point of an atomic integer is not to use synchronisation, you'd end up with little benefit.
I suspect that comparing the values is a no-go since there's no way to do it atomically in a portable fashion (without locks, that is).
And if there's no atomicity then the variables could compare equal even they never contained the same value at the same time (e.g. if a changed from 0 to 1 at exactly the same time as b changed from 1 to 0).
AtomicInteger inherits from Object and not Integer, and it uses standard reference equality check.
If you google, you will find this discussion of this exact case.
Imagine if equals were overridden and you put an AtomicInteger in a HashMap and then changed its value. Bad things would happen. :)
equals is not only used for equality but also to meet its contract with hashCode, i.e. in hash collections. The only safe approach in hash collections is for mutable objects not to depend on their contents; i.e. for mutable keys, a HashMap behaves the same as an IdentityMap. That way the hashCode, and whether two objects are equal, does not change when a key's contents change.
So new StringBuilder().equals(new StringBuilder()) is also false.
To compare the contents of two AtomicIntegers, you need ai.get() == ai2.get() or ai.intValue() == ai2.intValue().
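A short sketch of the difference (the class name is mine):

```java
import java.util.concurrent.atomic.AtomicInteger;

public class AtomicCompareDemo {
    public static void main(String[] args) {
        AtomicInteger a = new AtomicInteger(0);
        AtomicInteger b = new AtomicInteger(0);
        // equals() is the identity-based Object.equals:
        System.out.println(a.equals(b));         // false
        // Compare the wrapped values explicitly instead:
        System.out.println(a.get() == b.get());  // true
    }
}
```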
Let's say that you had a mutable key whose hashCode and equals changed based on its contents:
static class BadKey {
    int num;

    @Override
    public int hashCode() {
        return num;
    }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof BadKey && num == ((BadKey) obj).num;
    }

    @Override
    public String toString() {
        return "Bad Key " + num;
    }
}

public static void main(String... args) {
    Map<BadKey, Integer> map = new LinkedHashMap<BadKey, Integer>();
    for (int i = 0; i < 10; i++) {
        BadKey bk1 = new BadKey();
        bk1.num = i;
        map.put(bk1, i);
        bk1.num = 0;
    }
    System.out.println(map);
}
prints
{Bad Key 0=0, Bad Key 0=1, Bad Key 0=2, Bad Key 0=3, Bad Key 0=4, Bad Key 0=5, Bad Key 0=6, Bad Key 0=7, Bad Key 0=8, Bad Key 0=9}
As you can see we now have 10 keys, all equal and with the same hashCode!
equals is correctly implemented: an AtomicInteger instance can only equal itself, as only that very same instance will provably store the same sequence of values over time.
Please recall that the Atomic* classes act as reference types (just like java.lang.ref.*), meant to wrap an actual, "useful" value. Unlike in functional languages (see e.g. Clojure's Atom or Haskell's IORef), the distinction between references and values is rather blurry in Java (blame mutability), but it is still there.
Considering the current wrapped value of an Atomic class as the criterion for equality is quite clearly a misconception, as it would imply that new AtomicInteger(1).equals(1).
One limitation with Java is that there is no means of distinguishing a mutable-class instance which can and will be mutated, from a mutable-class instance which will never be exposed to anything that might mutate it(*). References to things of the former type should only be considered equal if they refer to the same object, while references to things of the latter type should often be considered equal if the refer to objects with equivalent state. Because Java only allows one override of the virtual equals(object) method, designers of mutable classes have to guess whether enough instances will meet the latter pattern (i.e. be held in such a way that they'll never be mutated) to justify having equals() and hashCode() behave in a fashion suitable for such usage.
In the case of something like Date, there are a lot of classes which encapsulate a reference to a Date that is never going to be modified, and which want to have their own equivalence relation incorporate the value-equivalence of the encapsulated Date. As such, it makes sense for Date to override equals and hashCode to test value equivalence. On the other hand, holding a reference to an AtomicInteger that is never going to be modified would be silly, since the whole purpose of that type centers around mutability. An AtomicInteger instance which is never going to be mutated may, for all practical purposes, simply be an Integer.
(*) Any requirement that a particular instance never mutate is only binding as long as either (1) information about its identity hash value exists somewhere, or (2) more than one reference to the object exists somewhere in the universe. If neither condition applies to the instance referred to by Foo, replacing Foo with a reference to a clone of Foo would have no observable effect. Consequently, one would be able to mutate the instance without violating a requirement that it "never mutate" by pretending to replace Foo with a clone and mutating the "clone".
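The contrast between the two styles of equality can be seen directly. A minimal sketch (the class name EqualityContrast is invented for illustration):

```java
import java.util.Date;
import java.util.concurrent.atomic.AtomicInteger;

public class EqualityContrast {
    public static void main(String[] args) {
        // Date overrides equals/hashCode, so two distinct instances
        // with the same timestamp compare equal (value semantics).
        System.out.println(new Date(1000L).equals(new Date(1000L)));   // true

        // AtomicInteger inherits Object.equals, so two distinct instances
        // are never equal, even when they currently hold the same value
        // (reference semantics).
        System.out.println(new AtomicInteger(1).equals(new AtomicInteger(1)));  // false
    }
}
```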

Why does hashCode() return the same value for an object in all consecutive executions?

I am experimenting with some code around object equality in Java. I have read somewhere that:
hashCode() is a number which is generated by applying a hash function. The hash function can be different for each object, but can also be the same. At the object level, it returns the memory address of the object.
Now, I have a sample program, which I have run 10 times, consecutively. Every time I run the program I get the same value as the hash code.
If the hashCode() function returns the memory location of the object, how does the JVM store the object at the same memory address across consecutive runs?
Can you please give me some insight and your view on this issue?
The program I am running to test this behavior is below:
public class EqualityIndex {
    private int index;

    public EqualityIndex(int initialIndex) {
        this.index = initialIndex;
    }

    public static void main(String[] args) {
        EqualityIndex ei = new EqualityIndex(2);
        System.out.println(ei.hashCode());
    }
}
Every time I run this program the hash code value returned is 4072869.
how does the JVM store the object at the same memory address across consecutive runs?
Why wouldn't it? Non-kernel programs never work with absolute memory addresses, they use virtual memory where each process gets its own address space. So it's no surprise at all that a deterministic program would place stuff at the same location in each run.
Well, the objects may very well end up in the same location in the virtual memory. Nothing contradictory with that. Bottom line is that you shouldn't care anyway. If the hashCode is implemented to return something related to the internal storage address (there is no guarantee at all!) there is nothing useful you could do with that information anyway.
The hashCode() function should return the same value for the same object within the same execution. It need not return the same value across different executions. See the following excerpt from the documentation of the Object class.
The general contract of hashCode is:
Whenever it is invoked on the same object more than once during
an execution of a Java application, the hashCode method
must consistently return the same integer, provided no information
used in equals comparisons on the object is modified.
This integer need not remain consistent from one execution of an
application to another execution of the same application.
I have executed your code a few times. Here are my results:
1935465023
1425840452
1935465023
1935465023
1935465023
1925529038
1935465023
1935465023
The same number is repeated multiple times, but that doesn't mean it will always be the same. Let's assume a particular JVM implementation looks for the first free slot in memory; then, if you don't run any other applications, the chances of allocating the object in the same slot are high.
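The within-a-single-execution part of the contract quoted above is easy to check directly. A minimal sketch (the class name HashCodeConsistency is made up for illustration):

```java
public class HashCodeConsistency {
    public static void main(String[] args) {
        Object o = new Object();
        int first = o.hashCode();

        // Repeated calls on the same object within one execution
        // must consistently return the same integer.
        for (int i = 0; i < 1000; i++) {
            if (o.hashCode() != first) {
                throw new AssertionError("hashCode contract violated");
            }
        }
        System.out.println("hashCode stable within this run: " + first);
    }
}
```

Nothing in the contract, however, constrains what value `first` is from one run to the next.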
What kind of object are you creating? Is it a specific kind, like String? If so, the class may already override the hashCode() method, in which case the value you receive is not a memory address at all.
Also, what values are you receiving? Are you sure they are memory addresses?
Please post more code for more details.
As I have read somewhere: " ... At the object level, it returns the memory address of the object."
That statement is incorrect, or at best an oversimplification.
The statement is referring to the default implementation of Object.hashCode(), which returns the same value as System.identityHashCode(Object).
The Javadoc for Object.hashCode() says this:
As much as is reasonably practical, the hashCode method defined by class Object does return distinct integers for distinct objects. (This is typically implemented by converting the internal address of the object into an integer, but this implementation technique is not required by the JavaTM programming language.)
and this:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified.
In fact, the identity hashcode value is typically based on the Object's machine address when the method is first called for the object. The value is stored in a hidden field in the Object header. This allows the same hashcode to be returned on subsequent calls ... even if the GC has relocated the Object in the meantime.
Note that if the identity hashcode value did change over the lifetime of an Object, it would violate the contract for hashcode(), and would be useless as a hashcode.
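This stability can be observed even when the heap is churned. In the sketch below (class name IdentityHashAcrossGc is invented for illustration), note that System.gc() is only a hint and the object may or may not actually be relocated, but the identity hash must not change either way:

```java
public class IdentityHashAcrossGc {
    public static void main(String[] args) {
        Object o = new Object();
        int before = System.identityHashCode(o);

        // Allocate garbage and request a collection; a moving collector
        // may relocate o, but its stored identity hash survives.
        for (int i = 0; i < 100_000; i++) {
            byte[] junk = new byte[1024];
        }
        System.gc();

        int after = System.identityHashCode(o);
        System.out.println(before == after);  // true
    }
}
```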
hashCode() can be, and often is, overridden by the class you are using. The JavaDoc for Object.hashCode() states that it is '... typically implemented by converting the internal address of the object into an integer ...', so the actual implementation may be system-dependent. Note also the third point in the JavaDoc: two objects that are not equal are not required to return different hash codes.
for (int i = 0; i < 10; i++) {
    System.out.println("i: " + i + ", hashCode: " + new Object().hashCode());
}
prints:
i: 0, hashCode: 1476323068
i: 1, hashCode: 535746438
i: 2, hashCode: 2038935242
i: 3, hashCode: 988057115
i: 4, hashCode: 1932373201
i: 5, hashCode: 1001195626
i: 6, hashCode: 1560511937
i: 7, hashCode: 306344348
i: 8, hashCode: 1211154977
i: 9, hashCode: 2031692173
If two objects are equal, they must have the same hash code. So, as long as you are not creating plain Object instances, every time you create an object with the same value you'll get the same hash code.
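String is a convenient class to see this with, since it overrides both equals() and hashCode(). A minimal sketch (class name EqualValuesSameHash is invented for illustration):

```java
public class EqualValuesSameHash {
    public static void main(String[] args) {
        // Two distinct String instances with the same contents:
        String a = new String("hello");
        String b = new String("hello");

        System.out.println(a == b);                        // false: distinct objects
        System.out.println(a.equals(b));                   // true:  equal values
        System.out.println(a.hashCode() == b.hashCode());  // true:  equal objects must share a hash code
    }
}
```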
