Do HashMap values need to be immutable? - java

I understand that keys in a HashMap need to be immutable, or at least ensure that their hash code (hashCode()) will not change or conflict with another object with different state.
However, do the values stored in a HashMap need to meet the same requirement*? Why or why not?
* The idea is to be able to mutate the values (such as calling setters on them) without affecting the ability to retrieve them later from the HashMap using immutable keys. If values are mutated, can that break their association to the keys?
My question is mainly concerning Java, however, answers for other languages are welcome.

No. In general, the characteristics of a hash map data structure do not depend upon the values. Having keys be immutable is important because the underlying data structure is built using the hash of each key at the time of insertion. That structure is designed to provide certain properties (relatively fast lookup, fast insertion, fast removal, etc.) all based upon this hash. If the hash were to change, those properties would no longer hold for the affected entry. If you need to "modify" a key, one general approach is to remove the entry under the old key and re-insert it under the new key, as sketched below.
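A minimal sketch of that remove-and-reinsert approach (the map contents here are invented for illustration):

import java.util.HashMap;
import java.util.Map;

public class RekeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("old-key", 42);

        // To "modify" a key, remove the entry under the old key
        // and re-insert the value under the new key.
        Integer value = map.remove("old-key");
        map.put("new-key", value);

        System.out.println(map); // {new-key=42}
    }
}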

The values of a HashMap do not need to be immutable. The map generally does not care what happens to the value (unless it is a ConcurrentHashMap), only where in memory that value is located.
In the case of ConcurrentHashMap, values may still be mutable, but it would be overly broad to say the map does not "care" what happens to them. Even though the map supports concurrent updates, the values it points to can be manipulated with no effect on the immutable keys.

RE: If values are mutated, can that break their association to the keys?
No.
A Map returns an object reference given a key. That key will always point to the same object reference. Changing the object in some way (e.g. changing its instance variables) will not affect the ability to retrieve that object.

Not really. Having a mutable key will bring significant issues, but what values you put in does not really matter. If the value is a mutable object and someone modifies it, then of course the stored value reflects the change as well (but that has nothing to do with HashMap).

No, values do not need to be immutable, but immutability can be very good practice. This of course depends on your use case.
Here is a use case where immutability was important: I recently ran into a bug because of this. An entry was put in a cache (backed by a HashMap). Later, this entry was retrieved and altered. Because the value was mutable (i.e. allowed changes), the next retrieval of the entry still carried the edits made by the previous caller. This was a problem because, in my use case, the cache data was not supposed to change.
Consider this example:
class Foo {
    int a;
    public Foo(int a) { this.a = a; }
    public void setA(int x) { this.a = x; }
}
Map<String, Foo> data = getFooMap();
Foo foo = new Foo(17);
data.put("entry1", foo);
Foo entry1 = data.get("entry1");
System.out.println(entry1.a); // prints "17"
entry1.setA(18);
...
entry1 = data.get("entry1");
System.out.println(entry1.a); // prints "18"

Related

Java HashMap remove item after its hash changed

I have a HashMap where the keys are mutable complex objects whose hash changes over their lifetime. I know exactly which objects are changed, but only after the fact - removing them with map.remove(object) will not work because the hash has changed. The number of objects in the map is roughly in the range [10, 10 000]; the issue is rather the number of changes and accesses.
It would be demanding to do a "would you change" check on each object before changing it - double the work, not to mention the mess of code necessary for it.
I do iterate entries in the map later on, so I figured I could simply mark objects for removal and get rid of them using iterator.remove(), but unfortunately HashMap$HashIterator#remove calls hash(key).
The one option that comes to my mind is to throw away the original map and rehash all objects that are not marked for removal into a new map, but that would cost a lot of extra time and generate memory garbage - I would like to avoid it.
Another option would be writing my own HashMap that keeps track of exactly where every element is stored (say, a map formed by a 2D object array with two int coordinates). This would be more efficient, but also a lot more to write and test.
Is there any easier way to do this that I have missed?
Edit:
I use wrappers over the complex object that supply different hash/equals pairs depending on a subset of its properties. Each object may be in multiple maps. Say I look for a red object in a map that uses wrappers with hash/equals over color: I create a red dummy object and do map.get(dummy).
Implementations of hash/equals and specific properties they touch are not part of my code.
All maps are objects mapped onto themselves (like a Set implementation, but I do need the map access methods). I can store hashes in those wrappers, and then they will adhere to the contract from the hash perspective, but equals will still fail me.
I do understand that changing hash/equals makes the behavior undefined, but it really should not matter in theory - I change the object, and then I do not want to use the map until the changed object is gone from it. A hash map should not really need to call equals() or hashCode() for an object it is already pointing at with an iterator.
All maps are objects mapped onto themselves (like a Set implementation, but I do need the map access methods). I can store hashes in those wrappers, and then they will adhere to the contract from the hash perspective, but equals will still fail me.
As others said, either try to find an immutable key (e.g. generated, or a subset of some immutable properties) or have a look at other data structures; those are the general recommendations without seeing the code.
I didn't quite understand why you can "store hashes in those wrappers" but still have trouble with the equals method. (I guess the stored hashes would not be unique, so they could be checked in the equals method?)
But if you have immutable hashes and if you have only one instance per "equal" object (not one instance stored in the map and another but equal instance used for lookup), you could have a look at the IdentityHashMap class.
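For illustration, here is a minimal sketch of that behavior (the List key is just a stand-in for any mutable object): because IdentityHashMap compares keys with ==, a mutated key can still be found and removed through the same instance:

import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

public class IdentityLookupDemo {
    public static void main(String[] args) {
        Map<List<String>, String> map = new IdentityHashMap<>();
        List<String> key = new ArrayList<>();
        key.add("red");
        map.put(key, "payload");

        key.add("blue"); // mutate the key; its content-based hashCode() changes

        System.out.println(map.get(key));    // still "payload": lookup is by identity
        System.out.println(map.remove(key)); // removal works too
    }
}

The catch, as noted above, is that a merely equal copy of the key will not match; you must hold the exact instance.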
Previous state:
The user supplies equals/hash lambdas that work over the complex object to place it in each map in the correct place (looking up objects with similar properties in constant time).
The complex object changed at inconvenient times, causing issues with reinsertion: the object changes, you pull it out, and you return it with a new hash.
Current solution:
In theory this could be solved with a custom implementation of a hash map (note: NOT the Map interface; it would not uphold its contract). This map would cache the hashes of its contents for rehash purposes, and maintain coordinates into the underlying structure so that equals is not necessary for removal via a values iterator. I may implement it later to reduce the memory footprint.
The solution actually used was forcing the user to supply a key that wraps all used properties and adds hash/equals over those properties. Now, even though the complex object changes, its key stays the same until explicitly prompted for an update (with the key not inside the map at the time of the update).
public class Node {
    public HashMap<Key, Node> map;
    public Data data;
    public Key key;
    public Node parent;

    public void update() {
        if (parent != null) parent.map.remove(key);
        key.update(data);
        if (parent != null) parent.map.put(key, this);
    }
}

public abstract class Key {
    public abstract void update(Data data);
    public abstract int hashCode();
    public abstract boolean equals(Object obj);
}

public class MyKey extends Key {
    private Object value = null;

    public final void update(Data data) {
        value = data.value;
    }

    public final boolean equals(Object obj) {
        MyKey that = (MyKey) obj;
        return this.value == that.value;
    }

    public final int hashCode() {
        return value == null ? 0 : value.hashCode();
    }
}
This requires a lot of primitive Key implementations, but at least it works. Will probably look for something better.

Can I change the inner structure of objects in a HashTable while iterating over it?

Like the title says: can I change the inner structure of objects in a Hashtable while iterating over its keys? I know I can't change the Map itself, or at least that it is risky to do so, but despite Google searches I haven't found any clear or simple answer as to whether or not it is OK to change the attributes of the objects themselves in the hash table. My gut feeling says no, since this would probably change the hash, but it would be good to know for certain. I am also interested in replacing the value for the keys while iterating over them. Is this possible?
Apologies if this has been answered a lot of times before.
To be short, will these two methods work as expected?
public class Manager {
    private Hashtable<MyClassA, BufferedImage> ht1;
    private Hashtable<MyClassB, JSlider> ht2;

    private void method1() {
        for (MyClassB mcb : ht2.keySet()) {
            mcb.calculateStuff(ht2.get(mcb).getValue());
            // calculateStuff() doesn't change anything, but if it takes long, the JSliders might be
            // changed by the user or a timer, resulting in a new hashCode(), and potentially problems.
        }
    }

    private void method2() {
        for (MyClassA mca : ht1.keySet()) {
            mca.changeInnerStructureOfA(); // Changes the fields of the object mca.
            ht1.put(mca, mca.calculateNewImage());
        }
    }
}
It is not allowed to mutate keys of a hash-based container in any situation, not only while iterating over the container. The reason for this is that any mutation that changes the value of hash function leaves your container in an invalid state, when the hashed key is sitting in the hash bucket that does not correspond to the key's hash value.
This is the reason behind a strong recommendation of using only immutable classes as keys in hash-based containers.
I am also interested in replacing the value for the keys while iterating over them. Is this possible?
No, this is not possible. In order to replace a key in a container with another key you need to remove the item first, and then re-insert it with the new key. Doing that while iterating, however, would trigger a ConcurrentModificationException.
If you need to replace a significant number of keys, the best approach is to make a new hash container and populate it with key-value pairs as you iterate the original container (see the sketch below).
If you need to replace only a small number of keys, make a list of objects describing the change (old key, new key, value), populate the list as you iterate the original container, and then walk the list of changes to make the alterations to the original container.
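A minimal sketch of the first approach, rebuilding into a fresh container (transformKey is a hypothetical stand-in for whatever produces the corrected key):

import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class RekeyAll {
    // Iterate the original map and populate a new one, so no entry is ever
    // looked up under a stale hash.
    static <K, V> Map<K, V> rebuild(Map<K, V> original, UnaryOperator<K> transformKey) {
        Map<K, V> fresh = new HashMap<>();
        for (Map.Entry<K, V> e : original.entrySet()) {
            fresh.put(transformKey.apply(e.getKey()), e.getValue());
        }
        return fresh;
    }
}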

Why are immutable objects in hashmaps so effective?

So I read about HashMap. At one point it was noted:
"Immutability also allows caching the hashcode of different keys which makes the overall retrieval process very fast and suggest that String and various wrapper classes (e.g., Integer) provided by Java Collection API are very good HashMap keys."
I don't quite understand... why?
String#hashCode:
private int hash;
...
public int hashCode() {
    int h = hash;
    if (h == 0 && count > 0) {
        int off = offset;
        char val[] = value;
        int len = count;
        for (int i = 0; i < len; i++) {
            h = 31*h + val[off++];
        }
        hash = h;
    }
    return h;
}
Since the contents of a String never change, the makers of the class chose to cache the hash after it had been calculated once. This way, time is not wasted recalculating the same value.
Quoting the linked blog entry:
final object with proper equals () and hashcode () implementation would act as perfect Java HashMap keys and improve performance of Java hashMap by reducing collision.
I fail to see how both final and equals() have anything to do with hash collisions. This sentence raises my suspicion about the credibility of the article. It seems to be a collection of dogmatic Java "wisdoms".
Immutability also allows caching there hashcode of different keys which makes overall retrieval process very fast and suggest that String and various wrapper classes e.g Integer provided by Java Collection API are very good HashMap keys.
I see two possible interpretations of this sentence, both of which are wrong:
HashMap caches hash codes of immutable objects. This is not correct. The map has no way to find out whether an object is "immutable".
Immutability is required for an object to cache its own hash code. Ideally, an object's hash value should always just rely on non-mutating state of the object, otherwise the object couldn't be sensibly used as a key. So in this case, too, the author fails to make a point: If we assume that our object is not changing its state, we also don't have to recompute the hash value every time, even if our object is mutable!
Example
So if we are really crazy and actually decide to use a List as a key for a HashMap and make the hash value dependent on the contents, rather than the identity of the list, we could just decide to invalidate the cached hash value on every modification, thus limiting the number of hash computations to the number of modifications to the list.
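A minimal sketch of that idea, assuming for brevity that add() is the only mutator we care about (a real version would override every mutating method):

import java.util.ArrayList;

// An ArrayList that caches its content-based hash and invalidates
// the cache whenever the list is modified.
class HashCachingList<E> extends ArrayList<E> {
    private int cachedHash;
    private boolean hashValid = false;

    @Override
    public boolean add(E e) {
        hashValid = false; // modification invalidates the cached hash
        return super.add(e);
    }

    @Override
    public int hashCode() {
        if (!hashValid) {
            cachedHash = super.hashCode(); // recompute only after changes
            hashValid = true;
        }
        return cachedHash;
    }
}

This limits the number of hash computations to at most one per modification, as described above.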
It's very simple. Since an immutable object doesn't change over time, it only needs to perform the calculation of the hash code once. Calculating it again will yield the same value. Therefore it is common to calculate the hash code in the constructor (or lazily) and store it in a field. The hashcode function then returns just the value of the field, which is indeed very fast.
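For example, a minimal sketch of an immutable class (a hypothetical Point) that computes its hash once in the constructor:

// The fields never change, so the hash computed at construction stays valid.
final class Point {
    private final int x;
    private final int y;
    private final int hash;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
        this.hash = 31 * x + y; // computed exactly once
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point)) return false;
        Point p = (Point) o;
        return x == p.x && y == p.y;
    }

    @Override
    public int hashCode() {
        return hash; // constant time, no recomputation
    }
}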
Basically, immutability is achieved in Java by making the class non-extendable and having all operations leave the state of the object unchanged. If you look at an operation of String like replace(), it does not change the state of the current object you are manipulating; rather, it gives you a new String object with the replaced content. So if you use such objects as keys, the state doesn't change, and hence the hash code remains unchanged as well; caching the hash code is therefore effective for retrieval performance.
Think of the hashmap as a big array of numbered boxes. The number is the hashcode, and the boxes are ordered by number.
Now if the object can't change, the hash function will always reproduce the same value. Therefore the object will always stay in its box.
Now suppose a changeable object. It is changed after being added to the map, so now it is sitting in the wrong box, like a Mrs. Jones who happened to marry a Mister Doe and is now named Doe too, but in many registers is still listed as Jones.
Immutable classes are unmodifiable, that's why those are used as keys in a Map.
For an example -
StringBuilder key1=new StringBuilder("K1");
StringBuilder key2=new StringBuilder("K2");
Map<StringBuilder, String> map = new HashMap<>();
map.put(key1, "Hello");
map.put(key2, "World");
key1.append("00");
System.out.println(map); // This line prints - {K100=Hello, K2=World}
You can see that the key K1 (an object of the mutable class StringBuilder) now prints as K100 after an inadvertent change. (Strictly speaking this particular entry is not lost, because StringBuilder does not override hashCode()/equals() and therefore uses object identity; but with a key class whose hash depends on its mutable content, the entry really would become unreachable, as the sketch below shows.) This won't happen if you use immutable classes as keys for the Map family members.
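To see an entry genuinely become unreachable, use a key whose hashCode depends on its mutable content; a minimal sketch with a hypothetical MutableKey class:

import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key class whose hashCode/equals depend on mutable content.
class MutableKey {
    String name;
    MutableKey(String name) { this.name = name; }
    @Override public boolean equals(Object o) {
        return o instanceof MutableKey && Objects.equals(name, ((MutableKey) o).name);
    }
    @Override public int hashCode() { return Objects.hashCode(name); }
}

public class LostEntryDemo {
    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey k = new MutableKey("K1");
        map.put(k, "Hello");

        k.name = "K100"; // hash changes while the key sits in the map

        System.out.println(map.get(k));                    // null: wrong bucket now
        System.out.println(map.get(new MutableKey("K1"))); // null: right bucket, but equals fails
        System.out.println(map.size());                    // 1: the entry still occupies memory
    }
}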
Hash tables will only work if the hash code of an object can never change while it is stored in the table. This implies that the hash code cannot take into account any aspect of the object which could change while it's in the table. If the most interesting aspects of an object are mutable, that implies that either:
The hash code will have to ignore most of the interesting aspects of the object, thus causing many hash collisions, or...
The code which owns the hash table will have to ensure that the objects therein are not exposed to anything that might change them while they are stored in the hash table.
If Java hash tables allowed clients to supply an EqualityComparer (the way .NET dictionaries do), code which knows that certain aspects of the objects in a hash table won't unexpectedly change could use a hash code which took those aspects into account, but the only way to accomplish that in Java would be to wrap each item stored in the hash table in a wrapper. Such wrapping may not be the most evil thing in the world, however, since the wrapper would be able to cache hash values in a way which an EqualityComparer could not, and could also cache further equality-related information [e.g. if the things being stored were nested collections, it might be worthwhile to compute multiple hash codes, and confirm that all hash codes match before doing any detailed inspection of the elements].
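A minimal sketch of such a wrapper, with invented names (the hash and equality logic are supplied by the caller, and the hash is cached once per wrapped item):

import java.util.function.BiPredicate;
import java.util.function.ToIntFunction;

// Wraps an item with caller-supplied hash/equality logic (an EqualityComparer
// of sorts) and caches the computed hash, as the answer above suggests.
final class Wrapped<T> {
    final T item;
    private final int cachedHash;
    private final BiPredicate<T, T> equality;

    Wrapped(T item, ToIntFunction<T> hasher, BiPredicate<T, T> equality) {
        this.item = item;
        this.equality = equality;
        this.cachedHash = hasher.applyAsInt(item); // computed once, cached
    }

    @Override public int hashCode() { return cachedHash; }

    @SuppressWarnings("unchecked")
    @Override public boolean equals(Object o) {
        return o instanceof Wrapped && equality.test(item, ((Wrapped<T>) o).item);
    }
}

A HashMap<Wrapped<Thing>, V> then hashes and compares keys by whatever aspects the supplied functions consider, without Thing itself having to override hashCode/equals.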

In Java, what precautions should be taken when using a Set as a key in a Map?

I'm not sure what the prevailing opinions are about using dynamic objects, such as Sets, as keys in Maps.
I know that typical Map implementations (for example, HashMap) use a hashcode to decide what bucket to put the entry in, and that if that hashcode should change somehow (perhaps because the contents of the Set change), that could mess up the HashMap by causing the bucket to be incorrectly computed (compared to how the Set was initially inserted into the HashMap).
However, if I ensure that the Set contents do not change at all, does that make this a viable option? Even so, is this approach generally considered error-prone because of the inherently volatile nature of Sets (even if precautions are taken to ensure that they are not modified)?
It looks like Java allows one to designate function arguments as final; this is perhaps one minor precaution that could be taken?
Do people even do stuff like this in commercial/open-source practice? (put List, Set, Map, or the like as keys in Maps?)
I guess I should describe sort of what I'm trying to accomplish with this, so that the motivation will become more clear and perhaps alternate implementations could be suggested.
What I am trying to accomplish is to have something of this sort:
class TaggedMap<T, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
...essentially, to be able to "tag" certain data (V) with certain keys (T) and write other auxiliary functions to access/modify the data and do other fancy stuff with it (ie. return a list of all entries satisfying some criteria of keys). The function of the _keys is to serve as a sort of index, to facilitate looking up the values without having to cycle through all of _map's entries.
In my case, I intend to specifically use T = String, V = Integer. Someone I talked to about this had suggested substituting a String for the Set, viz, something like:
class TaggedMap<V> {
    Map<String, V> _map;
    Map<String, Set<String>> _keys;
}
where the key in _map is of the sort "key1;key2;key3" with keys separated by delimiter. But I was wondering if I could accomplish a more generalised version of this rather than having to enforce a String with delimiters between the keys.
Another thing I was wondering was whether there was some way to make this as a Map extension. I was envisioning something like:
class TaggedMap<Set<T>, V> implements Map<Set<T>, V> {
    Map<Set<T>, V> _map;
    Map<T, Set<Set<T>>> _keys;
}
However, I was not able to get this to compile, probably due to my inferior understanding of generics. With this as a goal, can anyone fix the above declaration so that it works according to the spirit of what I had described, or suggest some slight structural modifications? In particular, I am wondering about the "implements Map<Set<T>, V>" clause, whether it is possible to declare such a complex interface implementation.
You are correct that if you ensure that
1. the Set contents are not modified, and
2. the Sets themselves are not modified,
then it is perfectly safe to use them as keys in a Map.
It's difficult to ensure that (1) is not violated accidentally. One option might be to specifically design the class being stored inside the Set so that all instances of that class are immutable. This would prevent anyone from accidentally changing one of the Set keys, so (1) would not be possible. For example, if you use a Set<String> as a key, you don't need to worry about the Strings inside the Set changing due to external modification.
You can ensure (2) quite easily by using the Collections.unmodifiableSet method, which returns a wrapped view of a Set that cannot be modified. This can be done to any Set, which means that it's probably a very good idea to use something like this for your keys.
Hope this helps! And if your user name means what I think it does, good luck learning every language! :-)
As you mention, sets can change, and even if you prevent the set from changing (i.e., the elements it contains), the elements themselves may change. Those factor into the hashcode.
Can you describe what you are trying to do in higher-level terms?
@templatetypedef's answer is basically correct. You can only safely use a Set as a key in some data structure if the set's state cannot change while it is a key. If the set's state changes, the invariants of the data structure are violated and operations on it will give incorrect results.
The wrappers created using Collections.unmodifiableSet can help, but there is a hidden gotcha. If the original set is still directly reachable, the application could modify it; e.g.
public void addToMap(Set key, Object value) {
    someMap.put(Collections.unmodifiableSet(key), value);
}
// but ...
Set someKey = ...
addToMap(someKey, "Hi mum");
...
someKey.add("something"); // Ooops ...
To guarantee that this can't happen, you need to make a deep copy of the set before you wrap it. That could be expensive.
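A minimal sketch combining the defensive (shallow) copy with the unmodifiable wrapper; the names are invented for illustration, and the copy is only safe if the elements themselves are immutable (e.g. String):

import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SafeSetKeyDemo {
    // Copy the caller's set before wrapping it, so later mutations of the
    // original cannot reach the stored key.
    static <T, V> void addToMap(Map<Set<T>, V> map, Set<T> key, V value) {
        Set<T> frozen = Collections.unmodifiableSet(new HashSet<>(key));
        map.put(frozen, value);
    }

    public static void main(String[] args) {
        Map<Set<String>, String> map = new HashMap<>();
        Set<String> someKey = new HashSet<>();
        someKey.add("a");
        addToMap(map, someKey, "Hi mum");

        someKey.add("something"); // no longer "Ooops": the stored key is a copy
        System.out.println(map.get(Collections.singleton("a"))); // "Hi mum"
    }
}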
Another problem with using a Set as a key is that it can be expensive. There are two general approaches to implementing key/value mappings: using a hashCode method, or using a compareTo method that implements an ordering. Both of these are expensive for sets.

Java sets: why there is no T get(Object o)? [duplicate]

I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.
Any ideas/theories as to why it was designed like this?
I usually have to hack around this by using a Map and making the key and the value same, or something like that.
EDIT: I don't think people understand my question so far. I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true.
As to why I would want this behavior, typically .equals() does not take into account all the properties of the object. I want to provide some dummy lookup object and get back the actual object instance in the Set.
While the purity argument does make the method get(Object) suspect, the underlying intent is not moot.
There are various class and interface families that slightly redefine equals(Object). One need look no further than the collections interfaces. For example, an ArrayList and a LinkedList can be equal; their respective contents merely need to be the same and in the same order.
Consequently, there are very good reasons for finding the matching element in a set. Perhaps a clearer way of indicating intent is to have a method like
public interface Collection<E> extends ... {
    ...
    public E findMatch(Object o) throws UnsupportedOperationException;
    ...
}
Note that this API has value broader than just within Set.
As to the question itself, I don't have any theory as to why such an operation was omitted. I will say that the minimal spanning set argument does not hold, because many operations defined in the collections APIs are motivated by convenience and efficiency.
The problem is: a Set is not for "getting" objects; it is for adding objects and testing for presence.
I understand what you are looking for; I had a similar situation and ended up using a map with the same object as both key and value.
EDIT: Just to clarify: http://en.wikipedia.org/wiki/Set_(abstract_data_type)
I had the same question on a Java forum years ago. They told me that the Set interface is fixed: it cannot be changed because that would break the current implementations of the Set interface. Then they started to claim bullshit, like you see here: "Set does not need the get method", and started to drill me that a Map must always be used to get elements from a set.
If you use the set only for mathematical operations, like intersection or union, then maybe contains() is sufficient. However, Set is defined in the collections framework to store data. I explained the need for get() in Set using the relational data model.
In what follows, an SQL table is like a class. The columns define attributes (known as fields in Java) and records represent instances of the class. So an object is a vector of fields. Some of the fields are primary keys. They define the uniqueness of the object. This is what you do for contains() in Java:
class Element {
    public int hashCode() { return sumOfKeyFields(); }
    public boolean equals(Object e) { return keyField1.equals(e) && keyField2.equals(e) && ...; }
}
I'm not aware of DB internals. But you specify key fields only once, when you define a table. You just annotate key fields with @primary. You do not specify the keys a second time when you add a record to the table. You do not separate keys from data, as you do in mapping. SQL tables are sets. They are not maps. Yet, they provide get() in addition to maintaining uniqueness and the contains() check.
In "Art of Computer Programming", introducing the search, D. Knuth says the same:
Most of this chapter is devoted to the study of a very simple search
problem: how to find the data that has been stored with a given
identification.
You see, data is store with identification. Not identification pointing to data but data with identification. He continues:
For example, in a numerical application we might want
to find f(x), given x and a table of the values of f; in a
nonnumerical application, we might want to find the English
translation of a given Russian word.
It looks like he starts to speak about mapping. However,
In general, we shall suppose that a set of N records has been stored,
and the problem is to locate the appropriate one. We generally require
the N keys to be distinct, so that each key uniquely identifies its
record. The collection of all records is called a table or file,
where the word "table" is usually used to indicate a small file, and
"file" is usually used to indicate a large table. A large file or a
group of files is frequently called a database.
Algorithms for searching are presented with a so-called argument, K,
and the problem is to find which record has K as its key. Although the
goal of searching is to find the information stored in the record
associated with K, the algorithms in this chapter generally ignore
everything but the keys themselves. In practice we can find the
associated data once we have located K; for example, if K appears in
location TABLE + i, the associated data (or a pointer to it) might be
in location TABLE + i + 1
That is, the search locates the key field of the record, and it should not "map" the key to the data. Both are located in the same record, as fields of a Java object. That is, the search algorithm examines the key fields of the record, as it does in the set, rather than some remote key, as it does in the map.
We are given N items to be sorted; we shall call them records, and
the entire collection of N records will be called a file. Each
record Rj has a key Kj, which governs the sorting process. Additional
data, besides the key, is usually also present; this extra "satellite
information" has no effect on sorting except that it must be carried
along as part of each record.
Likewise, I see no need to duplicate the keys in an extra "key set" in his discussion of sorting.
... ["The Art of Computer Programming", Chapter 6, Introduction]
An entity set is a collection or set of all entities of a particular entity type
[http://wiki.answers.com/Q/What_is_entity_and_entity_set_in_dbms]
The objects of a single class share their class attributes. Similarly, records in a DB share column attributes.
A special case of a collection is a class extent, which is the
collection of all objects belonging to the class. Class extents allow
classes to be treated like relations
... ["Database System Concepts", 6th Edition]
Basically, a class describes the attributes common to all its instances. A table in a relational DB does the same. "The easiest mapping you will ever have is a property mapping of a single attribute to a single column." This is the case I'm talking about.
I'm so verbose on proving the analogy (isomorphism) between objects and DB records because there are stupid people who do not accept it (to prove that their Set must not have the get method)
You see in the replies how people who do not understand this say that a Set with get would be redundant? It is because their abused map, which they impose in place of the set, introduces the redundancy. Their call to put(obj.getKey(), obj) stores two keys: the original key as part of the object and a copy of it in the key set of the map. The duplication is the redundancy. It also involves more bloat in the code and wastes memory at runtime. I do not know about DB internals, but principles of good design and database normalization say that such duplication is a bad idea - there must be only one source of truth. Redundancy means that inconsistency may happen: the key may map to an object that has a different key. Inconsistency is a manifestation of redundancy. Edgar F. Codd proposed DB normalization precisely to get rid of redundancies and the inconsistencies they imply. The teachers are explicit on normalization: normalization will never generate two tables with a one-to-one relationship between them; there is no theoretical reason to separate a single entity like this, with some fields in a single record of one table and others in a single record of another table.
So, we have 4 arguments why using a map to implement get in a set is bad:
the map is unnecessary when we have a set of unique objects
the map introduces redundancy in runtime storage
the map introduces bloat in the code (in the Collections)
using a map contradicts data storage normalization
Even if you are not aware of the record set idea and data normalization, playing with collections, you may discover this data structure and algorithm yourself, as we, org.eclipse.KeyedHashSet and C++ STL designers did.
I was banned from Sun forum for pointing out these ideas. The bigotry is the only argument against the reason and this world is dominated by bigots. They do not want to see concepts and how things can be different/improved. They see only actual world and cannot imagine that design of Java Collections may have deficiencies and could be improved. It is dangerous to remind rationale things to such people. They teach you their blindness and punish if you do not obey.
Added Dec 2013: SICP also says that DB is a set with keyed records rather than a map:
A typical data-management system spends a large amount of time
accessing or modifying the data in the records and therefore requires
an efficient method for accessing records. This is done by identifying
a part of each record to serve as an identifying key. Now we represent
the data base as a set of records.
Well, if you've already "got" the thing from the set, you don't need to get() it, do you? ;-)
Your approach of using a Map is The Right Thing, I think. It sounds like you're trying to "canonicalize" objects via their equals() method, which I've always accomplished using a Map as you suggest.
I'm not sure if you're looking for an explanation of why Sets behave this way, or for a simple solution to the problem it poses. Other answers dealt with the former, so here's a suggestion for the latter.
You can iterate over the Set's elements and test each one of them for equality using the equals() method. It's easy to implement and hardly error-prone. Obviously if you're not sure if the element is in the set or not, check with the contains() method beforehand.
This isn't efficient compared to, for example, HashSet's contains() method, which does "find" the stored element, but won't return it. If your sets may contain many elements it might even be a reason to use a "heavier" workaround like the map implementation you mentioned. However, if it's that important for you (and I do see the benefit of having this ability), it's probably worth it.
So I understand that you may have two equal objects but they are not the same instance.
Such as
Integer a = new Integer(3);
Integer b = new Integer(3);
In which case a.equals(b) because they refer to the same intrinsic value but a != b because they are two different objects.
There are other Set-like implementations, such as an identity-based set (for example, one created with Collections.newSetFromMap(new IdentityHashMap<>())), which do a different comparison between items.
However, I think that you are trying to apply a different philosophy to Java. If your objects are equal (a.equals(b)) although a and b have a different state or meaning, there is something wrong here. You may want to split that class into two or more semantic classes which implement a common interface - or maybe reconsider .equals and .hashCode.
If you have Joshua Bloch's Effective Java, have a look at the chapters called "Obey the general contract when overriding equals" and "Minimize mutability".
Just use the Map solution... a TreeSet and a HashSet do it too, since they are backed by a TreeMap and a HashMap, so there is no penalty in doing so (actually it should be a minimal gain).
You may also extend your favorite Set to add the get() method.
I think your only solution, given some Set implementation, is to iterate over its elements to find one that is equals() -- then you have the actual object in the Set that matched.
K target = ...;
Set<K> set = ...;
for (K element : set) {
    if (target.equals(element)) {
        return element;
    }
}
If you think about it as a mathematical set, you can derive a way to find the object.
Intersect the set with a collection of object containing only the object you want to find. If the intersection is not empty, the only item left in the set is the one you were looking for.
public <T> T findInSet(T findMe, Set<T> inHere) {
    // Note: retainAll mutates the set passed in, leaving at most one element.
    inHere.retainAll(Arrays.asList(findMe));
    if (!inHere.isEmpty()) {
        return inHere.iterator().next();
    }
    return null;
}
It's not the most efficient use of memory, but it's functionally and mathematically correct.
"I want the exact object instance that is already in the set, not a possibly different object instance where .equals() returns true."
This doesn't make sense. Say you do:
Set<Foo> s = new HashSet<Foo>();
s.add(new Foo(...));
...
Foo newFoo = ...;
You now do:
s.contains(newFoo)
If you want that to only be true if an object in the set is == newFoo, implement Foo's equals and hashCode with object identity. Or, if you're trying to map multiple equal objects to a canonical original, then a Map may be the right choice.
I think the expectation is that equals truly represents equality, not simply that the two objects have the same primary key, for example. And if equals represents two really equal objects, then a get would be redundant. The use case you want suggests a Map: use a different value for the key, something that represents a primary key rather than the whole object, and then implement equals and hashCode accordingly.
Functional Java has an implementation of a persistent Set (backed by a red/black tree) that incidentally includes a split method that seems to do kind of what you want. It returns a triplet of:
The set of all elements that appear before the found object.
An object of type Option that is either empty or contains the found object if it exists in the set.
The set of all elements that appear after the found object.
You would do something like this:
MyElementType found = hayStack.split(needle)._2().orSome(hay);
Object fromSet = set.tailSet(obj).first();
if (!obj.equals(fromSet)) fromSet = null;
does what you are looking for (given a SortedSet such as TreeSet, which is where tailSet is available). I don't know why Java hides it.
Say I have a User POJO with an ID and a name.
The ID keeps the contract between equals and hashCode.
The name is not part of object equality.
I want to update the name of the user based on input from somewhere, say, the UI.
As the Java Set doesn't provide a get method, I need to iterate over the set in my code and update the name when I find the equal object (i.e. when the IDs match).
If you had a get method, this code could have been shorter.
Java now comes with all kinds of stupid things like JavaDB and the enhanced for loop, so I don't understand why they are being purists in this particular case.
I had the same problem. I fixed it by converting my set to a Map, and then getting them from the map. I used this method:
public Map<MyObject, MyObject> convertSetToMap(Set<MyObject> set)
{
    Map<MyObject, MyObject> myObjectMap = new HashMap<MyObject, MyObject>();
    for (MyObject myObject : set) {
        myObjectMap.put(myObject, myObject);
    }
    return myObjectMap;
}
Now you can get items from your set by calling this method like this:
convertSetToMap(myset).get(myobject);
You can override equals in your class to have it check only certain properties, like the ID or name.
If you have made a request for this in the Java bug parade, list it here and we can vote it up. I think at the least there should be a convenience method in java.util.Collections that just takes a set and an object, implemented something like:
public static Object searchSet(Set ss, Object searchFor) {
    Iterator it = ss.iterator();
    while (it.hasNext()) {
        Object s = it.next();
        if (s != null && s.equals(searchFor)) {
            return s;
        }
    }
    return null;
}
This is obviously a shortcoming of the Set API.
Simply, I want to lookup an object in my Set and update its property.
And I HAVE TO loop through my (Hash)Set to get to my object... Sigh...
I agree that I'd like to see Set implementations provide a get() method.
As one option, in the case where your Objects implement (or can implement) java.lang.Comparable, you can use a TreeSet. Then the get() type function can be obtained by calling ceiling() or floor(), followed by a check for the result being non-null and equal to the comparison Object, such as:
TreeSet<MyObject> myTreeSet = new TreeSet<>();
// ...
// Equivalent of a get() and a null-check, except for the incorrect value sitting in
// returnedMyObject in the not-equal case.
MyObject returnedMyObject = myTreeSet.ceiling(comparisonMyObject);
if ((null != returnedMyObject) && returnedMyObject.equals(comparisonMyObject)) {
    // ...
}
The reason why there is no get is simple:
If you need to get the object X from the set, it is because you need something from X and you don't have the object.
If you do not have the object, then you need some means (a key) to locate it: its name, a number, whatever. That's what maps are for, right?
map.get("key") -> X!
Sets do not have keys; you need to traverse them to get the objects.
So, why not add a handy get(X) -> X?
That makes no sense, right? You already have X, the purists will say.
But now look at it as a non-purist, and see if you really want this:
Say I make an object Y which matches the equals of X, so that set.get(Y) -> X. Voila, then I can access the data of X that I didn't have. Say, for example, X has a method called getFlag() and I want the result of that.
Now look at this code.
X = set.get(Y);
// So Y.equals(X) is true!
// but..
// Y.getFlag() == X.getFlag() is false. (Weren't they equal?)
So, you see, if a set allowed you to get objects like that, it would surely break the basic semantics of equals. Later you are going to live with little clones of X, all claiming that they are the same when they are not.
You need a map, to store stuff and use a key to retrieve it.
I understand that only one instance of any object according to .equals() is allowed in a Set and that you shouldn't "need to" get an object from the Set if you already have an equivalent object, but I would still like to have a .get() method that returns the actual instance of the object in the Set (or null) given an equivalent object as a parameter.
The simple interface/API gives more freedom during implementation. For example, if the Set interface were reduced to just a single contains() method, we would get a set definition typical of functional programming: it is just a predicate; no objects are actually stored. The same is true for java.util.EnumSet - it contains only a bitmap for each possible value.
It's just an opinion. I believe we need to understand that there are several Java classes without fields/properties, i.e. with only methods. In that case equality cannot be measured by comparing fields; one such example is request handlers. See the below example of a JAX-RS application. In this context a Set makes more sense than any other data structure.
@ApplicationPath("/")
public class GlobalEventCollectorApplication extends Application {
    @Override
    public Set<Class<?>> getClasses() {
        Set<Class<?>> classes = new HashSet<Class<?>>();
        classes.add(EventReceiverService.class);
        classes.add(VirtualNetworkEventSerializer.class);
        return classes;
    }
}
To answer your question: if you have a shallow Employee object (i.e. only an EMPID, which is used in the equals method to determine uniqueness), and you want to get the deep object by doing a lookup in a set, a Set is not the right data structure, as its purpose is different.
A List is an ordered data structure, so it follows insertion order. Hence the data you put in will be available at the exact position at which you inserted it.
List<Integer> list = new ArrayList<>();
list.add(1);
list.add(2);
list.add(3);
list.get(0); // will return value 1
Remember this as simple array.
A Set is an unordered data structure, so it follows no order. The data you insert at a certain position may end up at any position.
Set<Integer> set = new HashSet<>();
set.add(1);
set.add(2);
set.add(3);
//assume it has get method
set.get(0); // what are you expecting this to return. 1?..
But it might return something else entirely. Hence it does not make sense to create an index-based get method on a Set.
Note: for the explanation I used the int type; the same applies to Object types as well.
I think you've answered your own question: it is redundant.
Set provides Set#contains(Object o), which performs the equivalent test to your desired Set#get(Object o) but returns a boolean, as would be expected.
