Java HashMap or IdentityHashMap - java

In some cases the key objects used in a map do not override hashCode() and equals() from Object, for example when a socket Connection or a java.lang.Class is used as the key.
Is there any potential problem with using these objects as keys in a HashMap?
Should I use IdentityHashMap in these cases?

If equals() and hashCode() are not overridden on key objects, HashMap and IdentityHashMap should have the same semantics. The default equals() implementation uses reference semantics, and the default hashCode() is the system identity hash code of the object.
Identity semantics are only harmful in cases where different instances of a class can be considered logically equal. For example, you would not want to use IdentityHashMap if your keys were:
new Integer(1)
and
new Integer(1)
Since these are technically different instances of the Integer class. (You should really be using Integer.valueOf(1), but that's getting off-topic.)
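As a minimal sketch of that difference: the example below uses new String("key") instead of new Integer(1), because the Integer constructor is deprecated in current JDKs. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class EqualKeysDemo {
    // Puts two equal-but-distinct keys into the given map and returns its size.
    static int sizeAfterTwoEqualKeys(Map<String, String> map) {
        String k1 = new String("key"); // new String(...) forces a distinct instance
        String k2 = new String("key"); // k1.equals(k2) is true, but k1 != k2
        map.put(k1, "first");
        map.put(k2, "second");
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeAfterTwoEqualKeys(new HashMap<>()));         // 1
        System.out.println(sizeAfterTwoEqualKeys(new IdentityHashMap<>())); // 2
    }
}
```

HashMap treats the two keys as one entry and overwrites the value, while IdentityHashMap keeps both.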
Class objects as keys should be okay, except in very special circumstances (for example, the Hibernate ORM library generates subclasses of your classes at runtime in order to implement a proxy). As a developer I would be skeptical of code which stores Connection objects in a Map as keys (maybe you should be using a connection pool, if you are managing database connections?). Whether or not they will work depends on the implementation (since Connection is just an interface).
Also, it's important to note that HashMap expects the equals() and hashCode() determination to remain constant. In particular, if you implement some custom hashCode() which uses mutable fields on the key object, changing a key field may make the key get 'lost' in the wrong hashtable bucket of the HashMap. In these cases, you may be able to use IdentityHashMap (depending on the object and your particular use case), or you might just need a different equals()/hashCode() implementation.
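The "lost key" effect can be reproduced in a few lines. This is only a sketch: Key is a hypothetical class invented here, with a hashCode() that depends on a mutable field.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

class MutableKeyDemo {
    // Hypothetical key type whose hashCode() depends on a mutable field.
    static final class Key {
        String name;
        Key(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Key && Objects.equals(name, ((Key) o).name);
        }
        @Override public int hashCode() { return Objects.hashCode(name); }
    }

    // Returns true if the key can still be found after being mutated in place.
    static boolean foundAfterMutation() {
        Map<Key, String> map = new HashMap<>();
        Key key = new Key("alpha");
        map.put(key, "value");
        key.name = "beta";           // hash code changes while the key is in the map
        return map.containsKey(key); // the lookup now uses the new hash code
    }

    public static void main(String[] args) {
        System.out.println(foundAfterMutation()); // false
    }
}
```

The entry is still in the map (its size stays 1), but it was filed under the old hash code, so a lookup with the very same object no longer finds it.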

From a mobile code security point of view, there are situations where using IdentityHashMap or similar becomes necessary. Malicious implementations of non-final key classes can override hashCode and equals to misbehave. They can, for instance, claim equality to different instances, acquire a reference to other instances they are compared to, etc. I suggest breaking with standard practice by staying safe and using IdentityHashMap where you want identity semantics. There is rarely a good reason to change the meaning of equality in a subclass where the superclass is already being compared. I guess the most likely scenario is a broken, non-symmetric proxy.
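A small sketch of the kind of misbehaviour meant here. Token and HostileToken are hypothetical classes made up for this example; the hostile subclass simply claims to be equal to every Token.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class HostileKeyDemo {
    // Hypothetical non-final key type that keeps Object's identity semantics.
    static class Token { }

    // A hostile subclass that claims to be equal to every other Token.
    static class HostileToken extends Token {
        @Override public boolean equals(Object o) { return o instanceof Token; }
        @Override public int hashCode() { return 0; }
    }

    // Stores a value under one hostile key, then looks it up with a different one.
    static String lookupWithImpostor(Map<Token, String> map) {
        map.put(new HostileToken(), "secret");
        return map.get(new HostileToken());
    }

    public static void main(String[] args) {
        System.out.println(lookupWithImpostor(new HashMap<>()));         // secret
        System.out.println(lookupWithImpostor(new IdentityHashMap<>())); // null
    }
}
```

With HashMap the second instance retrieves a value it was never given; IdentityHashMap never calls the overridden methods, so the impostor gets nothing.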
The implementation of IdentityHashMap is quite different than HashMap. It uses linear probing rather than Entry objects as links in a chain. This leads to a slight reduction in the number of objects, although a pretty small difference in total memory use. I don't have any good performance statistics I can cite. There used to be a performance difference between using (non-overridden) Object.hashCode and System.identityHashCode, but that got cleared up a few years ago.
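To illustrate what linear probing without Entry objects looks like, here is a deliberately tiny sketch. It is not the JDK code: TinyIdentityTable is a made-up class with a fixed capacity, no resizing, no removal, and no support for null keys. It only borrows the general idea of keeping keys and values side by side in one array.

```java
class TinyIdentityTable {
    // Keys and values alternate in one array: [k0, v0, k1, v1, ...].
    private final Object[] table;
    private final int slots;

    TinyIdentityTable(int slots) {
        this.slots = slots;
        this.table = new Object[2 * slots];
    }

    // Home slot for a key, based on its identity hash code.
    private int indexFor(Object key) {
        return Math.floorMod(System.identityHashCode(key), slots);
    }

    // Stores value under a non-null key, probing forward until the same
    // reference or an empty slot is found.
    void put(Object key, Object value) {
        int i = indexFor(key);
        for (int probes = 0; probes < slots; probes++) {
            Object k = table[2 * i];
            if (k == null || k == key) {
                table[2 * i] = key;
                table[2 * i + 1] = value;
                return;
            }
            i = (i + 1) % slots; // linear probing: try the next slot
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the value stored under this exact reference, or null.
    Object get(Object key) {
        int i = indexFor(key);
        for (int probes = 0; probes < slots; probes++) {
            Object k = table[2 * i];
            if (k == key) return table[2 * i + 1];
            if (k == null) return null; // an empty slot ends the probe sequence
            i = (i + 1) % slots;
        }
        return null;
    }
}
```

No per-entry object is allocated on put; a collision just moves on to the next slot of the same array.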

In the situation you describe, the behavior of HashMap and IdentityHashMap is identical.
By contrast, if the keys override equals() and hashCode(), the two maps behave differently.
See the Javadoc of java.util.IdentityHashMap below.
This class implements the Map interface with a hash table, using reference-equality in place of object-equality when comparing keys (and values). In other words, in an IdentityHashMap, two keys k1 and k2 are considered equal if and only if (k1==k2). (In normal Map implementations (like HashMap) two keys k1 and k2 are considered equal if and only if (k1==null ? k2==null : k1.equals(k2)).)
In summary, my answer is:
Is there any potential problem with using these objects as keys in a HashMap? --> No
Should I use IdentityHashMap in these cases? --> No

While there's no theoretical problem, you should avoid IdentityHashMap unless you have an explicit reason to use it. It provides no appreciable performance or other benefit in the general case, and when you inevitably start introducing objects into the map that do override equals() and hashCode(), you'll end up with subtle, hard-to-diagnose bugs.
If you think you need IdentityHashMap for performance reasons, use a profiler to confirm that suspicion before you make the switch. My guess is you'll find plenty of other opportunities for optimization that are both safer and make a bigger difference.

As far as I know, the only problem with a HashMap with bad keys would show up in very big maps: if your keys hash very badly, you get O(n) retrieval time instead of O(1). If it does break anything else, I would be interested to hear about it though :)

Related

When is hashcode useful if I am never using hashtable?

Let's say I am implementing a class called Car, with two member variables: int numDoors and String color.
In a hypothetical case, I am never going to use such a car in a Hashtable, a HashMap, or any other structure that needs a hash.
Now, why is it still required to override hashCode along with equals?
Note: all the answers I have checked involve use in a Hashtable / HashMap. I have searched extensively for this answer, so please don't mark it as a duplicate. Thanks
It's the general convention:
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
However, it's not entirely enforceable.
There are times in which you would believe that you don't need to have hashCode defined and implemented for your object, and if you don't use any structure that relies on a hash to store or reference it, you'd be correct.
But there are third-party libraries with which your object may come into contact, and they may very well be using a Map or Set to do their work, and they'd have the expectation that you followed conventions.
It's up to you to not implement hashCode along with equals - you're certainly not forced to (although many would argue that this is a bug), but beware that your object may not work as well with a third party library for this reason.
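For the Car from the question, following the convention takes only a few lines. A minimal sketch, assuming both fields take part in equality; the HashSet in main merely stands in for whatever a library might do internally.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Car {
    private final int numDoors;
    private final String color;

    Car(int numDoors, String color) {
        this.numDoors = numDoors;
        this.color = color;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Car)) return false;
        Car other = (Car) o;
        return numDoors == other.numDoors && Objects.equals(color, other.color);
    }

    // Derived from the same fields as equals(), so equal cars get equal hash codes.
    @Override public int hashCode() {
        return Objects.hash(numDoors, color);
    }

    public static void main(String[] args) {
        Set<Car> seen = new HashSet<>(); // stands in for a library's internal Set
        seen.add(new Car(4, "red"));
        System.out.println(seen.contains(new Car(4, "red"))); // true
    }
}
```

Without the hashCode() override, the contains call would almost certainly return false even though equals() says the two cars are equal.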
The only conceivable types which would not be able to override the hashCode method in a fashion consistent with the hashCode and equals contract would be those which are unable to override hashCode at all [e.g. because a base class declared it final]. There is thus almost never any reason for a type not to legitimately implement hashCode(). Even if a type cannot guarantee that instances which are unequal won't spontaneously become equal, the author of the type may still legitimately implement hashCode() by picking a 32-bit int value [e.g. 8675309] and implementing hashCode() as @Override public int hashCode() { return 8675309; }. Doing this will allow all of the hash-table-based collection types to work correctly. Storing very many such items into a hash table will severely degrade performance, but hash tables with just a few items will work just fine and generally perform decently. By contrast, if one overrides equals but doesn't override hashCode, then even a hash table will likely work incorrectly if even a single item is stored into it.
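A sketch of the constant-hash approach, using a hypothetical Box class whose equality depends on a mutable field:

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical type whose equals() depends on mutable state.
    static final class Box {
        int value;
        Box(int value) { this.value = value; }
        @Override public boolean equals(Object o) {
            return o instanceof Box && ((Box) o).value == value;
        }
        // Always the same, so it can never disagree with equals().
        @Override public int hashCode() { return 8675309; }
    }

    static String lookup() {
        Map<Box, String> map = new HashMap<>();
        map.put(new Box(1), "one");
        map.put(new Box(2), "two");
        return map.get(new Box(2)); // found via equals() within the single shared bucket
    }

    public static void main(String[] args) {
        System.out.println(lookup()); // two
    }
}
```

Every key lands in the same bucket, so lookups degrade to comparing against each stored key in turn, but the results stay correct.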
Incidentally, in some cases there may be advantages to implementing hashCode even when not using hashed collections. For example, some immutable collection types which support deep comparison might call hashCode() on the items stored therein. If a collection is large, and/or comparison operations on the items stored within it are expensive, the efficiency of testing two collections for equality ("do they contain equal items") may be enhanced by using a three-step process:
1. Compare the aggregate hash codes of the two collections. If they're not equal, there is no reason to look any further. This will often yield instant results, no matter the size of the collections.
2. Compare the cached hash codes of all the items. If the collections' contents match except for the last couple of items, and if comparisons between items may be expensive (e.g. the items are thousand-character strings), this will often avoid the need to compare all but one of the items for equality [note that if all but one of the items matched, and its hash code differed, then the aggregate hash code would differ and we wouldn't have gotten this far].
3. If all the hash codes match, then call equals on each pair of items that don't compare reference-equal.
Note that if two collections contain distinct items with equal content, a comparison is going to need to deeply examine all of the items; hashCode can't do anything to help with that case. On the other hand, in most cases where things are compared they are not equal, and using cached hashCode() values may facilitate orders-of-magnitude speedups in those cases.
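The three-step process could look roughly like this. HashedList is a hypothetical immutable wrapper invented for the sketch; it assumes non-null String items and order-sensitive comparison, and it uses Arrays.hashCode as one possible way of forming the aggregate hash.

```java
import java.util.Arrays;
import java.util.List;

class HashedList {
    private final String[] items;
    private final int[] itemHashes;  // cached per-item hash codes
    private final int aggregateHash; // cached hash over all item hashes

    HashedList(List<String> source) {
        items = source.toArray(new String[0]);
        itemHashes = new int[items.length];
        for (int i = 0; i < items.length; i++) {
            itemHashes[i] = items[i].hashCode();
        }
        aggregateHash = Arrays.hashCode(itemHashes);
    }

    boolean contentEquals(HashedList other) {
        // Step 1: different aggregate hashes mean the contents differ.
        if (aggregateHash != other.aggregateHash) return false;
        // Step 2: compare the cached per-item hashes (this also catches different lengths).
        if (!Arrays.equals(itemHashes, other.itemHashes)) return false;
        // Step 3: only now call equals(), skipping pairs that are the same reference.
        for (int i = 0; i < items.length; i++) {
            if (items[i] != other.items[i] && !items[i].equals(other.items[i])) return false;
        }
        return true;
    }
}
```

Steps 1 and 2 only ever touch ints, so the expensive equals() calls in step 3 run only when the two collections are very likely equal anyway.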

Why there is no Hashable interface in Java

Object in Java has a hashCode method; however, it is used only in associative containers like HashSet or HashMap. Why was it designed like that? A Hashable interface with a hashCode method looks like a much more elegant solution.
The major argument seems to me that there is a well-defined default hashCode that can be calculated for any Java object, along with an equally well-defined equals. There is simply no good reason to withhold this function from all objects, and of course there are plenty of reasons not to withhold it. So it's a no-brainer in my book.
This question is claimed as a duplicate of another which asks why there's no interface which behaves like Comparator (as distinct from Comparable) but for hashing. .NET includes such an interface, called IEqualityComparer, and it would seem like Java could as well. As it is, if someone wants a Java collection which, for example, maps strings to other objects in case-insensitive fashion (perhaps the most common use of IEqualityComparer), one must wrap the strings in objects whose hashCode and equals methods act on a case-insensitive basis.
I suspect the big issue is that while an "equalityComparer" interface could be convenient, in many cases efficiently testing an equivalence relation would require caching information. For example, while a case-insensitive string-hashing function could make an uppercase-only copy of the passed-in string and call hashCode on that, it would be difficult to avoid having every request for the hashcode of a particular string repeat the conversion to uppercase and the hashing of that uppercase value. By contrast, a "case-insensitive string" object could include fields for an uppercase-only copy of the string, which would then only have to be generated once for the instance.
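A wrapper of the kind described might look like the sketch below. CaseInsensitiveKey is a made-up name, and folding with toUpperCase(Locale.ROOT) is just one simple choice; it is not a full treatment of Unicode case folding.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class CaseInsensitiveKey {
    private final String original;
    private final String folded; // uppercase copy, computed once per instance

    CaseInsensitiveKey(String s) {
        this.original = s;
        this.folded = s.toUpperCase(Locale.ROOT);
    }

    @Override public boolean equals(Object o) {
        return o instanceof CaseInsensitiveKey
                && folded.equals(((CaseInsensitiveKey) o).folded);
    }

    // String caches its own hash code, so repeated calls are cheap.
    @Override public int hashCode() { return folded.hashCode(); }

    @Override public String toString() { return original; }

    public static void main(String[] args) {
        Map<CaseInsensitiveKey, Integer> ages = new HashMap<>();
        ages.put(new CaseInsensitiveKey("Alice"), 30);
        System.out.println(ages.get(new CaseInsensitiveKey("ALICE"))); // 30
    }
}
```

The uppercase conversion happens once in the constructor, which is exactly the caching that a stateless comparator could not do without extra bookkeeping.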
An EqualityComparer could achieve reasonable performance if it included something like a WeakHashMap<String,String> to convert raw strings to uppercase-only strings, but such a design would either require different threads to use different EqualityComparer instances despite the lack of externally visible state, or else require performance-robbing locking and synchronization code even in single-threaded scenarios.
Incidentally, a second issue that arises with comparator-style interfaces is that when a collection type uses an externally-supplied comparator (whether it compares for rank or equality), the comparator itself becomes part of the state of the collection which uses it. If two hash tables use different EqualityComparer instances, there may be no way to know that they can safely be considered equivalent, even if the two comparators would behave identically in all circumstances.

Is Java hashCode() method a reliable measure of object equality?

I am currently working on comparing two complex objects of the same type, with multiple fields consisting of data structures of custom object types. Assuming that none of the custom objects has overridden the hashCode() method, if I compare the hash codes of every field in the objects and they turn out to be the same, can I be 100% confident that the content of the compared objects is the same? If not, which method would you recommend to compare two objects, assuming I can't use any external libraries?
Absolutely not. You should only use hashCode() as a first pass - if the hash codes are different, you can assume the objects are unequal. If the hash codes are the same, you should then call equals() to check for full equality.
Think about it this way: there are only 2^32 possible hash codes. How many possible different objects are there of type String, as an example? Far more than that. Therefore at least two non-equal strings must share the same hash code.
Eric Lippert writes well about hash codes - admittedly from a .NET viewpoint, but the principles are the same.
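To make the collision argument concrete: the strings "Aa" and "BB" are a well-known pair with the same hash code. The helper below (a made-up name, for illustration only) shows hashCode() used as a first pass with equals() as the deciding check.

```java
class CollisionDemo {
    // hashCode() as a cheap first pass, equals() as the deciding check.
    static boolean sameContent(Object a, Object b) {
        if (a.hashCode() != b.hashCode()) return false; // definitely unequal
        return a.equals(b);                             // equal hashes alone prove nothing
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() == "BB".hashCode()); // true: both are 2112
        System.out.println(sameContent("Aa", "BB"));            // false
        System.out.println(sameContent("Aa", "Aa"));            // true
    }
}
```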
No, matching hashCode() values only mean that the objects could be equal; it's never a guarantee.
The only guarantee is that if the hashCode() values are different (and the hashCode()/equals() implementations are correct), then the objects will not be equal.
Additionally if your custom types don't have a hashCode() implementation, then that value is entirely useless for comparing the content of the object, because it will be the identityHashCode().
If you haven't overridden hashCode() (and equals()), two distinct objects will never be considered equal, whatever their content. By overriding these methods you provide the logic of the comparison. Remember, if you override hashCode(), you definitely should override equals().
EDIT:
There can still be a collision, of course, but if you didn't override equals(), your objects will be compared by reference (an object is equal only to itself).
Object.hashCode() is commonly described as returning the memory address of the object in some format, which might suggest that it could technically be used for what you want.
However, the actual specification of Object.hashCode() makes no such guarantees (two distinct objects may share a hash code), so it should not be used for this purpose in any sensible or well-written piece of code.
I would suggest using the hashCode and equals builders available in the Apache commons library, or if you really can't use free external libs, just have a look at them for inspiration. The best method to use depends entirely on what "equal" actually means in the context of your application domain though.

Can hashCode() have dynamically changeable content?

In my implementation, I have a class A which overrides equals(Object) and hashCode(). But I have a small doubt: when an instance of A is added to a HashSet/HashMap the value of its hashCode() is x, and some time later the value of the same hashCode() changes to y. Will it affect anything?
The hash code mustn't change after it's been added to a map / set. It's okay for it to change before that, although it generally makes the type easier to work with if it doesn't change.
If the hash code changes, the key won't be found in the map / set: even if it ends up in the same bucket, the hash code will be checked first.
When the return value of hashCode() or equals() changes while the object is contained in HashMap/HashSet etc., the behavior is undefined (you could get all kinds of strange behavior). So one must avoid such mutation of keys while the object is contained in such collections etc.
It is considered best to use only immutable objects for keys (or for elements placed in a HashSet etc.). In fact Python, for example, does not allow its built-in mutable containers, such as lists, to be used as dictionary keys. It is permitted and common to use mutable objects as keys in Java, but in that case it is advisable to make such objects "effectively immutable", i.e. do not change the state of such objects at all after instantiation.
To give an example, using a list as a key in a Map is usually considered okay, but you should avoid mutating such lists at any point of your application to avoid getting bitten by nasty bugs.
As long as you don't change the return value of hashCode() and equals() while the objects are in the container, you should be ok on paper. But one could easily introduce nasty, hard to find bugs by mistake so it's better to avoid the situation altogether.
Yes, the hash code of an object must not change during its lifetime. If it does, you need to notify the container (if that's possible); otherwise you can get wrong results.
Edit: As pointed out, it depends on the container. Obviously, if the container never uses your hashCode or equals methods, nothing will go wrong. But as soon as it tries to compare things for equality (all maps and sets), you'll get yourself in trouble.
Yes. Many people have answered the question here; I just want to add an analogy. A hash code is something like an address in a hash-based collection:
Imagine you check in at a hotel under the name "Mike", and afterwards you change your name to "GreatMike" on the check-in form. Then when someone looks for you under the name "Mike", they cannot find you anymore.
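Staying with the hotel picture, one practical way to "notify the container" is to check the guest out, change the name, and check back in. HashSet has no notification hook, so the sketch below simply removes and re-adds the element; Guest and rename are hypothetical names invented for this example.

```java
import java.util.HashSet;
import java.util.Set;

class RekeyDemo {
    // Hypothetical element type whose hash code depends on a mutable field.
    static final class Guest {
        String name;
        Guest(String name) { this.name = name; }
        @Override public boolean equals(Object o) {
            return o instanceof Guest && ((Guest) o).name.equals(name);
        }
        @Override public int hashCode() { return name.hashCode(); }
    }

    // Renames a guest without leaving it filed under its old hash code.
    static void rename(Set<Guest> set, Guest guest, String newName) {
        boolean wasPresent = set.remove(guest); // remove while the old hash is still valid
        guest.name = newName;
        if (wasPresent) set.add(guest);         // re-add so it is stored under the new hash
    }

    public static void main(String[] args) {
        Set<Guest> hotel = new HashSet<>();
        Guest mike = new Guest("Mike");
        hotel.add(mike);
        rename(hotel, mike, "GreatMike");
        System.out.println(hotel.contains(new Guest("GreatMike"))); // true
    }
}
```

This only works if every mutation goes through such a helper, which is why immutable keys remain the safer default.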

Use cases for IdentityHashMap

Could anyone please tell what are the important use cases of IdentityHashMap?
Whenever you want your keys not to be compared by equals but by == you would use an IdentityHashMap. This can be very useful if you're doing a lot of reference-handling but it's limited to very special cases only.
The documentation says:
A typical use of this class is topology-preserving object graph transformations, such as serialization or deep-copying. To perform such a transformation, a program must maintain a "node table" that keeps track of all the object references that have already been processed. The node table must not equate distinct objects even if they happen to be equal. Another typical use of this class is to maintain proxy objects. For example, a debugging facility might wish to maintain a proxy object for each object in the program being debugged.
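The "node table" idea from the documentation can be sketched as a small deep copy. Node and GraphCopy are hypothetical names, and the recursion is only suitable for small graphs; the point is that the IdentityHashMap maps each original node to its copy by reference, so shared nodes and cycles are preserved.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class GraphCopy {
    // Hypothetical graph node. The technique also works when nodes override
    // equals(), because the node table never calls it.
    static final class Node {
        final String label;
        final List<Node> neighbors = new ArrayList<>();
        Node(String label) { this.label = label; }
    }

    static Node deepCopy(Node root) {
        return copy(root, new IdentityHashMap<>());
    }

    // table is the "node table": original node -> its copy, keyed by reference.
    private static Node copy(Node original, Map<Node, Node> table) {
        Node existing = table.get(original);
        if (existing != null) return existing; // already copied: reuse it (this also ends cycles)
        Node clone = new Node(original.label);
        table.put(original, clone);            // register before recursing into neighbors
        for (Node n : original.neighbors) {
            clone.neighbors.add(copy(n, table));
        }
        return clone;
    }
}
```

With a HashMap and nodes that compare equal by label, two distinct nodes with the same label would collapse into one copy and the topology would be lost.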
One case where you can use IdentityHashMap is if your keys are Class objects. This is about 33% faster than HashMap for gets! It probably uses less memory too.
You can also use the IdentityHashMap as a general purpose map if you can make sure the objects you use as keys will be equal if and only if their references are equal.
To what gain? Obviously it will be faster and will use less memory than using implementations like HashMap or TreeMap.
Actually, there are quite a lot of cases when this stands. For example:
Enums. Although for enums there is even a better alternative: EnumMap
Class objects. They are also comparable by reference.
Interned Strings. Either by specifying them as literals or calling String.intern() on them.
Cached instances. Some classes provide caching of their instances. For example quoting from the javadoc of Integer.valueOf(int):
This method will always cache values in the range -128 to 127, inclusive...
Certain libraries/frameworks will manage exactly one instance of certain types, for example Spring beans.
Singleton types. If you use instances of types that are built with the Singleton pattern, you can also be sure that (at most) one instance of them exists, and therefore a reference equality test is sufficient as an equality test.
Any other type where you explicitly take care to access values using only the same references that were used to put the values into the map.
To demonstrate the last point:
Map<Object, String> m = new IdentityHashMap<>();
// Any keys, we keep their references
Object[] keys = { "strkey", new Object(), new Integer(1234567) };
for (int i = 0; i < keys.length; i++)
    m.put(keys[i], "Key #" + i);
// We query values from map by the same references:
for (Object key : keys)
    System.out.println(key + ": " + m.get(key));
Output will be, as expected (because we used the same Object references to query values from the map):
strkey: Key #0
java.lang.Object@1c29bfd: Key #1
1234567: Key #2
HashMap creates Entry objects every time you add an object, which can put a lot of stress on the GC when you've got lots of objects. In a HashMap with 1,000 objects or more, you'll end up using a good portion of your CPU just having the GC clean up entries (in situations like pathfinding or other one-shot collections that are created and then cleaned up). IdentityHashMap doesn't have this problem, so will end up being significantly faster.
See a benchmark here: http://www.javagaming.org/index.php/topic,21395.0/topicseen.html
From my practical experience:
IdentityHashMap leaves a much smaller memory footprint compared to HashMap for large cardinalities.
One important case is where you are dealing with reference types (as opposed to values) and you really want the correct result. Malicious objects can have overridden hashCode and equals methods getting up to all sorts of mischief. Unfortunately, it's not used as often as it should be. If the interface types you are dealing with don't override hashCode and equals, you should typically go for IdentityHashMap.
