Java HashMap remove item after its hash changed


I have a HashMap whose keys are mutable complex objects, so their hash changes over their lifetime. I know exactly which objects have changed, but only after the fact, so removing them with map.remove(object) does not work because the hash has already changed. The number of objects in the map is roughly in the range [10, 10 000]; the issue is rather the number of changes and accesses.
It would be demanding to do a "would you change" check on each object before changing it - double the work, not to mention the mess of a code necessary for it.
I do iterate entries in the map later on, so I figured I could simply mark objects for removal and get rid of them using iterator.remove(), but unfortunately HashMap$HashIterator#remove calls hash(key).
The one option that comes to my mind is to throw away the original map and rehash all objects that are not marked for removal into a new map, but that would cost a lot of extra time and generate memory garbage, which I would like to avoid.
Another option would be writing my own HashMap that keeps track of exactly where every element is stored (say, a map backed by a 2D object array, so two int coordinates). This would be more efficient, but also a lot more to write and test.
Is there any easier way to do this that I have missed?
Edit:
I use wrappers over the complex object that supply different hash/equals pairs depending on subset of properties. Each object may be in multiple maps. Say I look for red object in map that uses wrappers with hash/equals over color, create red dummy object, and do map.get(dummy).
Implementations of hash/equals and specific properties they touch are not part of my code.
All maps are objects mapped onto themselves (like Set implementation, but I do need map access methods). I can store hashes in those wrappers, and then they will adhere to the contract from hash perspective, but equals will still fail me.
I do understand that changing the hash/equals output of a key leads to undefined behavior, but in theory it really should not matter: I change the object, and then I do not use the map until the changed object is gone from it. A hash map should not really need to call equals() or hash() for an object its iterator is already pointing at.
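The rebuild option mentioned in the question could look roughly like the sketch below. It is only an illustration under assumptions: `Item`, its `state` field and the `markedForRemoval` flag are hypothetical stand-ins for the wrapped complex objects, and the map stores each object mapped onto itself as described above. Iterating `values()` does not consult hashes, so stale entries are still visited, and `put` into the fresh map uses the current hash.

```java
import java.util.HashMap;
import java.util.Map;

class RebuildDemo {
    // Hypothetical mutable object whose hashCode/equals depend on mutable state.
    static final class Item {
        int state;
        boolean markedForRemoval;
        Item(int state) { this.state = state; }
        @Override public int hashCode() { return state; }
        @Override public boolean equals(Object o) {
            return o instanceof Item && ((Item) o).state == state;
        }
    }

    // Copies every unmarked object into a fresh map, re-hashing it with its current state.
    static Map<Item, Item> rebuild(Map<Item, Item> stale) {
        Map<Item, Item> fresh = new HashMap<>();
        for (Item item : stale.values()) {
            if (!item.markedForRemoval) {
                fresh.put(item, item);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        Map<Item, Item> map = new HashMap<>();
        Item a = new Item(1), b = new Item(2);
        map.put(a, a);
        map.put(b, b);
        a.state = 10;                 // changed and no longer wanted
        a.markedForRemoval = true;
        b.state = 20;                 // changed but kept
        Map<Item, Item> fresh = rebuild(map);
        System.out.println(fresh.size());          // 1
        System.out.println(fresh.containsKey(b));  // true
    }
}
```

This trades extra allocation for simplicity, which is exactly the cost the question wants to avoid, so it is a baseline rather than a recommendation.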

All maps are objects mapped onto themselves (like Set implementation, but I do need map access methods). I can store hashes in those wrappers, and then they will adhere to the contract from hash perspective, but equals will still fail me.
As others said, the general recommendations without seeing the code are either to find an immutable key (e.g. a generated one, or a subset of some immutable properties) or to have a look at other data structures.
I didn't quite understand why you can "store hashes in those wrappers" but still have trouble with the equals method. (I guess the stored hashes would not be unique so they could be checked in the equals method?)
But if you have immutable hashes and if you have only one instance per "equal" object (not one instance stored in the map and another but equal instance used for lookup), you could have a look at the IdentityHashMap class.
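A minimal sketch of the IdentityHashMap suggestion, assuming there really is only one instance per logical object. `MutableKey` is a hypothetical class made up for the illustration. IdentityHashMap compares keys with == and hashes them with System.identityHashCode, so a change in the key's own hashCode()/equals() does not affect it.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityMapDemo {
    // Hypothetical mutable key whose hashCode/equals depend on mutable state.
    static final class MutableKey {
        int color;
        MutableKey(int color) { this.color = color; }
        @Override public int hashCode() { return color; }
        @Override public boolean equals(Object o) {
            return o instanceof MutableKey && ((MutableKey) o).color == color;
        }
    }

    // Puts a key, mutates it, then tries to remove it through the same reference.
    // Returns true if the removal succeeded.
    static boolean removeAfterMutation(Map<MutableKey, String> map) {
        MutableKey key = new MutableKey(1);
        map.put(key, "value");
        key.color = 2; // hashCode() and equals() now give different results
        return map.remove(key) != null;
    }

    public static void main(String[] args) {
        System.out.println(removeAfterMutation(new HashMap<>()));         // false
        System.out.println(removeAfterMutation(new IdentityHashMap<>())); // true
    }
}
```

The price is that a lookup with a separate but equal "dummy" object, as described in the question, no longer finds anything, so this only fits maps that are always accessed through the stored instances.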

Previous state:
The user supplies equals/hash lambdas that work over the complex object to place it in the correct place in each map (so objects with similar properties can be looked up in constant time).
The complex object changed at inconvenient times, which caused issues with reinsertion: the object changes, it has to be pulled out, and then returned with its new hash.
Current solution:
In theory could be solved with custom implementation of hash map (note NOT hash map interface, would not uphold its contract). This map would cache hashes for its contents for rehash purposes, and maintain coordinates in underlying structure so equals is not necessary for removal with values iterator. May implement it later to reduce memory footprint.
Used solution was forcing user to supply key that wraps all used properties and adds hash/equals that considers those properties. Now even though complex object changes, its key stays the same until prompted for update (not inside of the map at the time of the update).
public class Node {
    public HashMap<Key, Node> map;
    public Data data;
    public Key key;
    public Node parent;

    // Re-keys this node in its parent's map: remove it while the key is still
    // in its old state, refresh the key from the data, then re-insert it.
    public void update() {
        if (parent != null) parent.map.remove(key);
        key.update(data);
        if (parent != null) parent.map.put(key, this);
    }
}

public abstract class Key {
    public abstract void update(Data data);
    public abstract int hashCode();
    public abstract boolean equals(Object obj);
}

public class MyKey extends Key {
    private Object value = null;

    public final void update(Data data) {
        value = data.value;
    }

    public final boolean equals(Object obj) {
        if (!(obj instanceof MyKey)) return false;
        MyKey that = (MyKey) obj;
        return this.value == that.value; // compares the snapshotted value by identity
    }

    public final int hashCode() {
        return value == null ? 0 : value.hashCode();
    }
}
This requires a lot of primitive Key implementations, but at least it works. Will probably look for something better.

Related

HashMap.get() doesn't return the proper value, thanks to "hashCode()"

I'm currently working on a TD game with a map editor. Now obviously, you can save and load these maps (or should be able to, at least).
Problem is: at some point I'm calling .get() on a HashMap. Unfortunately, the keys that should be the same (logic-wise) are not the same object (in terms of reference), and, according to my previous google research, overriding their .equals method isn't sufficient, since they still return different hashes with .hashCode() (I verified that, they DO return different hashes, while .equals does return true).
(On a side note, that's rather confusing, since the javadoc of HashMap.get(key) only states that they have to be equal)
More specifically, the HashMap contains instances of a class Path of mine as keys, and should return the corresponding list of enemies (= value).
short version of Path (without getters etc.):
public class Path
{
    private List<Tile> tiles = new ArrayList<>();

    @Override
    public boolean equals(Object obj) {
        //code comparing the two paths
    }

    @Override
    public int hashCode() {
        //what I still need to implement. ATM, it returns super.hashCode()
    }
}

public class Tile
{
    private int x;
    private int y;
    //constructor
    //overrides equals
    //getters & some convenience methods
}
Now if two Paths are equal, I'd like them to return the same hash code, so that the HashMap returns the correct list of enemies. (I'll make sure not two identical paths can be added).
Now my question:
Do you suggest
using some external library to generate a hash,
that I write my own implementation of calculating a hash, or
something else?
Note that I'd prefer to avoid changing the HashMap to some other type of map, if that would even help solve the problem.

You definitely do need to implement hashCode consistently with equals. IDEs often do a decent job of generating hashCode and equals. Also consider Objects.equals(...) and Objects.hash(...).
One warning about using Path as a key in the HashMap: you will have to make the class immutable for it to work reliably, or at least make sure that the hashCode of a key does not change while it is in the map. Otherwise you may not be able to get your data back, even with the same or an equal key.

List has a useful method, conveniently also named hashCode(), which computes a hash code from all the elements inside the list. So you also have to implement hashCode for Tile, which probably consists of some primitive fields or such.
e.g.
@Override
public int hashCode() {
    return tiles != null ? tiles.hashCode() : 0;
}
See the docs here
int hashCode()
Returns the hash code value for this list. The hash code of a list is defined to be the result of the following calculation:
int hashCode = 1;
for (E e : list)
    hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
This ensures that list1.equals(list2) implies that list1.hashCode()==list2.hashCode() for any two lists, list1 and list2, as required by the general contract of Object.hashCode().
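Putting the two answers together, a sketch of what Path and Tile might look like follows. The constructors, the immutability of both classes and the `lookupWithEqualKey` helper are assumptions for the illustration, since the question does not show them; the point is only that equals and hashCode are derived from the same fields.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class PathHashDemo {
    static final class Tile {
        private final int x;
        private final int y;
        Tile(int x, int y) { this.x = x; this.y = y; }
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Tile)) return false;
            Tile t = (Tile) o;
            return x == t.x && y == t.y;
        }
        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    static final class Path {
        private final List<Tile> tiles;
        // Defensive copy, so later changes to the caller's list cannot alter the hash.
        Path(List<Tile> tiles) { this.tiles = new ArrayList<>(tiles); }
        @Override public boolean equals(Object o) {
            return o instanceof Path && tiles.equals(((Path) o).tiles);
        }
        @Override public int hashCode() { return tiles.hashCode(); }
    }

    // Stores a value under one Path and looks it up with a separate, equal Path.
    static String lookupWithEqualKey() {
        Map<Path, String> enemies = new HashMap<>();
        enemies.put(new Path(Arrays.asList(new Tile(0, 0), new Tile(0, 1))), "wave 1");
        return enemies.get(new Path(Arrays.asList(new Tile(0, 0), new Tile(0, 1))));
    }

    public static void main(String[] args) {
        System.out.println(lookupWithEqualKey()); // wave 1
    }
}
```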

Can I change the inner structure of objects in a HashTable while iterating over it?

Like the title says: can I change the inner structure of objects in a HashTable while iterating over its keys? I know I can't change the Map itself, or at least that it is risky to do so, but despite Google searches I haven't found any clear or simple answer as to whether or not it is OK to change the attributes of the objects themselves in the hash map. My gut feeling says no, since this would probably change the hash, but it would be good to know for certain. I am also interested in replacing the value for the keys while iterating over them. Is this possible?
Apologies if this has been answered a lot of times before.
To be short, will these two methods work as expected?
public class Manager {
    private Hashtable<MyClassA, BufferedImage> ht1;
    private Hashtable<MyClassB, JSlider> ht2;

    private Image method1() {
        for (MyClassB mcb : ht2.keySet()) {
            mcb.calculateStuff(ht2.get(mcb).getValue());
            //calculateStuff() doesn't change anything, but if it takes long, the JSliders might be
            //changed by the user or a timer, resulting in a new hashCode(), and potentially problems.
        }
    }

    private void method2() {
        for (MyClassA mca : ht1.keySet()) {
            mca.changeInnerStructureOfA(); //Changes the fields of the object mca.
            ht1.put(mca, mca.calculateNewImage());
        }
    }
}

It is not allowed to mutate keys of a hash-based container in any situation, not only while iterating over the container. The reason for this is that any mutation that changes the value of hash function leaves your container in an invalid state, when the hashed key is sitting in the hash bucket that does not correspond to the key's hash value.
This is the reason behind a strong recommendation of using only immutable classes as keys in hash-based containers.
I am also interested in replacing the value for the keys while iterating over them. Is this possible?
No, this is not possible. In order to replace a key in a container with another key you need to remove the item first, and then re-insert it with the new key. Doing this while iterating, however, would trigger a ConcurrentModificationException.
If you need to replace a significant number of keys, the best approach would be to make a new hash container and populate it with key-value pairs as you iterate over the original container.
If you need to replace only a small number of keys, make a list of objects describing the changes (old key, new key, value), populate the list as you iterate over the original container, and then walk the list of changes to make the alterations to the original container.
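The "list of changes" approach for a small number of keys might be sketched like this. The `Change` class and the rule for picking the new key (stripping a hypothetical "tmp_" prefix from String keys) are made up for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DeferredRekeyDemo {
    // One pending change: move `value` from `oldKey` to `newKey`.
    static final class Change<K, V> {
        final K oldKey;
        final K newKey;
        final V value;
        Change(K oldKey, K newKey, V value) {
            this.oldKey = oldKey;
            this.newKey = newKey;
            this.value = value;
        }
    }

    // Renames every key starting with "tmp_" by stripping that prefix.
    static void rekey(Map<String, Integer> map) {
        // Pass 1: only read the map while iterating, and record what should change.
        List<Change<String, Integer>> changes = new ArrayList<>();
        for (Map.Entry<String, Integer> e : map.entrySet()) {
            if (e.getKey().startsWith("tmp_")) {
                changes.add(new Change<>(e.getKey(), e.getKey().substring(4), e.getValue()));
            }
        }
        // Pass 2: the iteration is over, so structural changes are safe now.
        for (Change<String, Integer> c : changes) {
            map.remove(c.oldKey);
            map.put(c.newKey, c.value);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("tmp_a", 1);
        map.put("b", 2);
        rekey(map);
        System.out.println(map.get("a"));             // 1
        System.out.println(map.containsKey("tmp_a")); // false
    }
}
```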

Do HashMap values need to be immutable?

I understand that keys in a HashMap need to be immutable, or at least ensure that their hash code (hashCode()) will not change or conflict with another object with different state.
However, do the values stored in HashMap need to be the same as above*? Why or why not?
* The idea is to be able to mutate the values (such as calling setters on it) without affecting the ability to retrieve them later from the HashMap using immutable keys. If values are mutated, can that break their association to the keys?
My question is mainly concerning Java, however, answers for other languages are welcome.

No. In general the characteristics of a hash map data structure do not depend upon the value. Having keys be immutable is important because the underlying data structures are built using the hash of the key at the time of insertion. The underlying data structures are designed to provide certain properties (relatively fast lookup, fast removal, etc.), all based upon this hash. If this hash were to change, then the data structure with these nice properties, based upon a hash which has since changed, would be invalidated. If you need to "modify" a key, one general approach is to remove the old key and re-insert the new key.

The values of a HashMap do not need to be immutable. The map generally does not care what happens to the value (unless it is a ConcurrentHashMap), only where in memory that value is located.
In the case of ConcurrentHashMap, the mutability of the value is not affected, but it would be overly broad to say that the map does not "care" what happens to the value. Even though concurrency is allowed on updates to the map, the values that the map points to can be manipulated with no effect on the immutable keys.

RE: If values are mutated, can that break their association to the keys?.
No.
A Map returns an object reference given a key. That key will always point to the same object reference. Changing the object in some way (i.e. changing its instance variables) will not affect the ability to retrieve that object.

Not really. Having a mutable key will bring significant issues, but what values you put in the map does not really matter. If the value is a mutable object and one modifies it, then of course the value seen through the map is updated as well (but that has nothing to do with HashMap).

No, values do not need to be immutable, but making them immutable can be very good practice. This of course depends on your use case.
Here is a use-case where immutability was important: I recently ran into a bug because of this. An entry was put in a cache (backed by a HashMap). Later, this entry was retrieved and altered. Because the value was mutable (i.e. allowed changes), the next retrieve of the entry still had the edits made by the previous retriever. This was a problem because in my use case, the cache data was not supposed to change.
Consider this example:
class Foo {
    int a;
    public Foo(int a) { this.a = a; }
    public void setA(int x) { this.a = x; }
}
Map<String, Foo> data = getFooMap();
Foo foo = new Foo(17);
data.put("entry1", foo);
Foo entry1 = data.get("entry1");
System.out.println(entry1.a); // prints "17"
entry1.setA(18);
...
Foo entry1Again = data.get("entry1");
System.out.println(entry1Again.a); // prints "18"
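One way to avoid this kind of bug is to make the cached value immutable, so that an "edit" yields a new object instead of changing the cached one. The sketch below is a hypothetical variant of Foo; the `withA` method is an invented name, not something from the original code.

```java
import java.util.HashMap;
import java.util.Map;

class ImmutableValueDemo {
    // Hypothetical immutable variant of Foo: no setter, changes produce a new object.
    static final class Foo {
        final int a;
        Foo(int a) { this.a = a; }
        Foo withA(int newA) { return new Foo(newA); }
    }

    public static void main(String[] args) {
        Map<String, Foo> cache = new HashMap<>();
        cache.put("entry1", new Foo(17));
        Foo edited = cache.get("entry1").withA(18); // the caller gets its own modified object
        System.out.println(edited.a);               // 18
        System.out.println(cache.get("entry1").a);  // 17, the cached object is untouched
    }
}
```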

What happens to the lookup in a Hashmap or Hashset when the objects Hashcode changes

In a Hashmap the hash code of the key provided is used to place the value in the hashtable. In a Hashset the object's hashcode is used to place the value in the underlying hashtable. That is, the advantage of the Hashmap is that you have the flexibility of deciding what you want as the key, so you can do nice things like this.
Map<String,Player> players = new HashMap<String,Player>();
This can map a string such as the player's name to the player itself.
My question is what happens to the lookup when the key's hashcode changes.
I expect this isn't such a major concern for a Hashmap, as I wouldn't expect nor want the key to change. In the previous example, if the player's name changes he is no longer that player. However, I can look a player up using the key, change other fields that aren't the name, and future lookups will still work.
However, in a Hashset, since the entire object's hashcode is used to place the item, if someone slightly changes an object, future lookups of that object will no longer resolve to the same position in the hashtable, since it relies on the entire object's hashcode. Does this mean that once data is in a Hashset it shouldn't be changed? Or does it need to be rehashed? Or is it done automatically, etc.? What is going on?

In your example, a String is immutable so its hashcode cannot change. But hypothetically, if the hashcode of an object did change while it was a key in a hash table, then it would probably disappear as far as hashtable lookups were concerned. I went into more detail in this Answer to a related question: https://stackoverflow.com/a/13114376/139985 . (The original question is about a HashSet, but a HashSet is really a HashMap under the covers, so the answer covers this case too.)
It is safe to say that if the keys of either a HashMap or a TreeMap are mutated in a way that affects their respective hashcode() / equals(Object) or compare(...) or compareTo(...) contracts, then the data structure will "break".
Does this mean that once data is in a Hashset it shouldn't be changed.
Yes.
Or does it need to be rehashed? or is it done automatically etc?
It won't be automatically rehashed. The HashMap won't notice that the hashcode of a key has changed. Indeed, you won't even get recomputation of the hashcode when the HashMap resizes. The data structure remembers the original hashcode value to avoid having to recalculate all of the hashcodes when the hash table resizes.
If you know that the hashcode of a key is going to change you need to remove the entry from the table BEFORE you mutate the key, and add it back afterwards. (If you try to remove / put it after mutating the key, the chances are that the remove will fail to find the entry.)
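The remove-before-mutate order described above can be sketched as follows. This `Player` class, with a hash code derived from a mutable `name`, and the `rename` helper are assumptions made for the example.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class RemoveBeforeMutateDemo {
    // Hypothetical element whose hash code depends on a mutable field.
    static final class Player {
        String name;
        Player(String name) { this.name = name; }
        @Override public int hashCode() { return Objects.hashCode(name); }
        @Override public boolean equals(Object o) {
            return o instanceof Player && Objects.equals(((Player) o).name, name);
        }
    }

    // Safe order: remove while the old hash code is still valid, mutate, then re-add.
    static void rename(Set<Player> set, Player p, String newName) {
        boolean wasPresent = set.remove(p); // located via the old hash code
        p.name = newName;
        if (wasPresent) {
            set.add(p);                     // stored under the new hash code
        }
    }

    public static void main(String[] args) {
        Set<Player> set = new HashSet<>();
        Player p = new Player("Jones");
        set.add(p);
        rename(set, p, "Doe");
        System.out.println(set.contains(p)); // true
        System.out.println(set.size());      // 1
    }
}
```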
What is going on?
What is going on is that you violated the contract. Don't do that!
The contract consists of two things:
The standard hashcode / equals contract as specified in the javadoc for Object.
An additional constraint that an object's hashcode must not change while it is a key in a hash table.
The latter constraint is not stated specifically in the HashMap javadoc, but the javadoc for Map says this:
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map.
A change that affects equality (typically) also affects the hashcode. At the implementation level, if a HashMap entry's key's hashcode changes, the entry will typically now be in the wrong hash bucket and will be invisible to HashMap methods that perform lookups.

In your example, the keys are String which are immutable. So the hashcode of the keys won't change. What happens when the hashcode of the keys changes is undefined and leads to "weird" behaviour. See the example below, which prints 1, false and 2. The object remains in the set, but the set looks like it is broken (contains returns false).
Extract from Set's javadoc:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
public static void main(String args[]) {
    Set<MyObject> set = new HashSet<>();
    MyObject o1 = new MyObject(1);
    set.add(o1);
    o1.i = 2;
    System.out.println(set.size()); //1
    System.out.println(set.contains(o1)); //false
    for (MyObject o : set) {
        System.out.println(o.i); //2
    }
}

private static class MyObject {
    private int i;

    public MyObject(int i) {
        this.i = i;
    }

    @Override
    public int hashCode() {
        return i;
    }

    @Override
    public boolean equals(Object obj) {
        if (obj == null) return false;
        if (getClass() != obj.getClass()) return false;
        final MyObject other = (MyObject) obj;
        if (this.i != other.i) return false;
        return true;
    }
}

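With Java's hash-based collections, the original reference is simply not found: it is searched for in the bucket corresponding to the current hashcode, and it is not there.
To recover from this after the fact, the keySet must be iterated over, and any key which is not found by the contains method must be removed through the iterator. (Be aware that HashMap's own iterator remove() re-hashes the key, as noted in the first question above, so this can silently fail for HashMap and HashSet.) It is preferable to remove the key from the map before changing it, and then store the value under the new key.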

The HashSet is backed up by a HashMap.
From the javadocs.
This class implements the Set interface, backed by a hash table
(actually a HashMap instance).
So if you change the hashcode, I doubt whether you can access the object.
Internal Implementation Details
The add implementation of HashSet is
public boolean add(E e) {
return map.put(e, PRESENT)==null;
}
The key is the elem and value is just a dummy Object called PRESENT
and the contains implementation is
public boolean contains(Object o) {
return map.containsKey(o);
}

Why are immutable objects in hashmaps so effective?

So I read about HashMap. At one point it was noted:
"Immutability also allows caching the hashcode of different keys which makes the overall retrieval process very fast and suggest that String and various wrapper classes (e.g., Integer) provided by Java Collection API are very good HashMap keys."
I don't quite understand... why?

String#hashCode:
private int hash;
...
public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        int off = offset;
        char val[] = value;
        int len = count;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
Since the contents of a String never change, the makers of the class chose to cache the hash after it had been calculated once. This way, time is not wasted recalculating the same value.

Quoting the linked blog entry:
final object with proper equals () and hashcode () implementation would act as perfect Java HashMap keys and improve performance of Java hashMap by reducing collision.
I fail to see how both final and equals() have anything to do with hash collisions. This sentence raises my suspicion about the credibility of the article. It seems to be a collection of dogmatic Java "wisdoms".
Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast and suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.
I see two possible interpretations of this sentence, both of which are wrong:
HashMap caches hash codes of immutable objects. This is not correct. The map doesn't have the possibility to find out if an object is "immutable".
Immutability is required for an object to cache its own hash code. Ideally, an object's hash value should always just rely on non-mutating state of the object, otherwise the object couldn't be sensibly used as a key. So in this case, too, the author fails to make a point: If we assume that our object is not changing its state, we also don't have to recompute the hash value every time, even if our object is mutable!
Example
So if we are really crazy and actually decide to use a List as a key for a HashMap and make the hash value dependent on the contents, rather than the identity of the list, we could just decide to invalidate the cached hash value on every modification, thus limiting the number of hash computations to the number of modifications to the list.
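A rough sketch of that idea: a hypothetical list wrapper (`HashCachingList` is an invented name) that caches its content-based hash and invalidates the cache on every modification. The `computations` counter exists only to make the effect visible.

```java
import java.util.ArrayList;
import java.util.List;

class HashCachingList<E> {
    private final List<E> items = new ArrayList<>();
    private int cachedHash;
    private boolean hashValid = false;
    int computations = 0; // for illustration only: counts real hash computations

    void add(E e) {
        items.add(e);
        hashValid = false; // any modification invalidates the cached hash
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = items.hashCode(); // content-based hash, as defined by List
            hashValid = true;
            computations++;
        }
        return cachedHash;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof HashCachingList
                && items.equals(((HashCachingList<?>) o).items);
    }

    public static void main(String[] args) {
        HashCachingList<String> list = new HashCachingList<>();
        list.add("a");
        list.add("b");
        list.hashCode();
        list.hashCode();
        System.out.println(list.computations); // 1
        list.add("c");
        list.hashCode();
        System.out.println(list.computations); // 2
    }
}
```

Caching does not make such an object safe to modify while it is a key in a map; it only bounds the number of hash computations by the number of modifications.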

It's very simple. Since an immutable object doesn't change over time, it only needs to perform the calculation of the hash code once. Calculating it again will yield the same value. Therefore it is common to calculate the hash code in the constructor (or lazily) and store it in a field. The hashcode function then returns just the value of the field, which is indeed very fast.

Basically immutability is achieved in Java by making the class not extendable and all the operations in the object will ideally not change the state of the object. If you see the operations of String like replace(), it does not change the state of the current object with which you are manipulating rather it gives you a new String object with the replaced string. So ideally if you maintain such objects as keys the state doesn't change and hence the hash code also remains unchanged. So caching the hash code will be performance effective during retrievals.

Think of the hashmap as a big array of numbered boxes. The number is the hashcode, and the boxes are ordered by number.
Now if the object can't change, the hash function will always reproduce the same value. Therefore the object will always stay in its box.
Now suppose a changeable object. It is changed after being added to the hash map, so now it is sitting in the wrong box, like a Mrs. Jones who happened to marry Mister Doe and is now named Doe too, but is still listed as Jones in many registers.

Immutable classes are unmodifiable, that's why those are used as keys in a Map.
For an example -
StringBuilder key1=new StringBuilder("K1");
StringBuilder key2=new StringBuilder("K2");
Map<StringBuilder, String> map = new HashMap<>();
map.put(key1, "Hello");
map.put(key2, "World");
key1.append("00");
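System.out.println(map); // This line prints - {K100=Hello, K2=World} (entry order may vary)
You see that the key inserted as K1 (an object of the mutable class StringBuilder) no longer exists in that form, due to an inadvertent change to it. (StringBuilder does not override equals() or hashCode(), so the entry is still reachable through the same key1 reference; with a mutable key class whose hashCode() depends on its content, the entry would also become unreachable by lookup.) This won't happen if you use immutable classes as keys for the Map family members.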

Hash tables will only work if the hash code of an object can never change while it is stored in the table. This implies that the hash code cannot take into account any aspect of the object which could change while it's in the table. If the most interesting aspects of an object are mutable, that implies that either:
The hash code will have to ignore most of the interesting aspects of the object, thus causing many hash collisions, or...
The code which owns the hash table will have to ensure that the objects therein are not exposed to anything that might change them while they are stored in the hash table.
If Java hash tables allowed clients to supply an EqualityComparer (the way .NET dictionaries do), code which knows that certain aspects of the objects in a hash table won't unexpectedly change could use a hash code which took those aspects into account, but the only way to accomplish that in Java would be to wrap each item stored in the hash table in a wrapper. Such wrapping may not be the most evil thing in the world, however, since the wrapper would be able to cache hash values in a way which an EqualityComparer could not, and could also cache further equality-related information [e.g. if the things being stored were nested collections, it might be worthwhile to compute multiple hash codes, and confirm that all hash codes match before doing any detailed inspection of the elements].
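Such a wrapper might be sketched as below. `Keyed` and `byIgnoringCase` are hypothetical names, and case-insensitive String comparison merely stands in for "a subset of properties". The wrapper takes the hash function and the equality test as parameters and caches the hash at construction time.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

class WrapperDemo {
    // Hypothetical wrapper: supplies hashCode/equals for the wrapped item from
    // caller-provided functions and caches the hash when it is created.
    static final class Keyed<T> {
        final T item;
        private final BiPredicate<T, T> equality;
        private final int hash;

        Keyed(T item, ToIntFunction<T> hasher, BiPredicate<T, T> equality) {
            this.item = item;
            this.equality = equality;
            this.hash = hasher.applyAsInt(item); // computed once, never refreshed
        }

        @Override
        public int hashCode() { return hash; }

        // Sketch assumption: all wrappers in one map wrap the same item type.
        @SuppressWarnings("unchecked")
        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Keyed)) return false;
            Keyed<T> other = (Keyed<T>) o;
            // Cheap cached-hash check first, detailed comparison only on a match.
            return hash == other.hash && equality.test(item, other.item);
        }
    }

    static Keyed<String> byIgnoringCase(String s) {
        return new Keyed<String>(s,
                v -> v.toLowerCase(Locale.ROOT).hashCode(),
                String::equalsIgnoreCase);
    }

    public static void main(String[] args) {
        Map<Keyed<String>, Integer> map = new HashMap<>();
        map.put(byIgnoringCase("Red"), 1);
        System.out.println(map.get(byIgnoringCase("RED")));  // 1
        System.out.println(map.get(byIgnoringCase("blue"))); // null
    }
}
```

Because the hash is frozen at construction, the wrapper stays findable in its bucket even if the wrapped item changes later, but its equals result can still drift, which is the remaining problem the first question describes.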
