I made my own class called Region, and I store instances of Region in a HashSet. I use a HashSet so that there are no equal objects in the collection. The String name of a Region should be unique in the HashSet, so I have overridden the equals method.
My Question:
What happens if I store two regions with different names in the HashSet and then make the names equal (via a setter for the name)?
This is not a duplicate. The other question is about equal HashSets, not about equal objects within a HashSet.
The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.
-- the Set Javadoc
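To make the Javadoc statement concrete, here is a minimal sketch assuming a hypothetical Region whose equals() and hashCode() both depend only on the mutable name (the question only mentions equals(); RegionDemo, addThenRename and the accessors are illustrative names). On the standard JDK HashSet, renaming one element so it equals another is not detected: the set simply ends up holding two elements that are equal to each other.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class RegionDemo {

    // Hypothetical Region: equals() and hashCode() depend only on the mutable name.
    static final class Region {
        private String name;

        Region(String name) { this.name = name; }

        String getName() { return name; }

        void setName(String name) { this.name = name; }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Region)) return false;
            return Objects.equals(name, ((Region) o).name);
        }

        @Override
        public int hashCode() { return Objects.hashCode(name); }
    }

    // Adds "A" and "B", then renames "B" to "A" while it is still in the set.
    static Set<Region> addThenRename() {
        Set<Region> regions = new HashSet<>();
        Region a = new Region("A");
        Region b = new Region("B");
        regions.add(a);
        regions.add(b);
        b.setName("A"); // the set is not notified; b stays in the bucket chosen for "B"
        return regions;
    }

    public static void main(String[] args) {
        Set<Region> regions = addThenRename();
        System.out.println(regions.size()); // 2, although both elements are now equal
        for (Region r : regions) {
            System.out.println(r.getName()); // "A" twice
        }
    }
}
```

Beyond this, any later lookup, add or remove involving these elements falls under the "not specified" behavior the Javadoc warns about.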
Related
I want to maintain a list of objects such that each object in the list is unique. I also want to retrieve an object at some point. There are thousands of objects, and I can't modify their source to add a unique id. Also, hash codes are unreliable.
My approach was to utilize the key uniqueness of a map.
Say I maintain a map like:
HashMap<Object, Integer> uniqueObjectMap;
I will add each object to the map as a key and set a random int as its value. But how does Java determine whether an object is unique when it is used as a key?
Say,
List<Object> listOne = new ArrayList<>();
List<Object> listTwo = new ArrayList<>();
Object o = new Object();
listOne.add(o);
listTwo.add(o);
uniqueObjectMap.put(listOne.get(0), randomInt()); // --> line 1
uniqueObjectMap.put(listTwo.get(0), randomInt()); // --> line 2
Will line 2 give a unique key violation error, since both are referring to the same object o?
Edit
So will uniqueObjectMap.containsKey(listTwo.get(0)) return true? How are objects determined to be equal here? Is a field-by-field comparison done? Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as a key?
Will line 2 give a unique key violation error, since both are referring to the same object o?
- No. If a key is found to be already present, then its value will be overwritten with the new one.
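A minimal sketch of the question's scenario, with fixed values 1 and 2 standing in for the hypothetical randomInt() so the outcome is predictable (SameKeyDemo and putTwice are illustrative names). Map.put returns the previous value for the key, which shows the overwrite directly: no error is thrown, and the map keeps a single entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SameKeyDemo {

    // Puts the same object twice, reaching it through two different lists.
    static Map<Object, Integer> putTwice(Object o) {
        Map<Object, Integer> uniqueObjectMap = new HashMap<>();
        List<Object> listOne = new ArrayList<>();
        List<Object> listTwo = new ArrayList<>();
        listOne.add(o);
        listTwo.add(o);

        Integer before1 = uniqueObjectMap.put(listOne.get(0), 1); // no previous mapping
        Integer before2 = uniqueObjectMap.put(listTwo.get(0), 2); // same key: 1 is replaced
        System.out.println(before1 + " " + before2);             // null 1
        return uniqueObjectMap;
    }

    public static void main(String[] args) {
        Object o = new Object();
        Map<Object, Integer> map = putTwice(o);
        System.out.println(map.size());         // 1
        System.out.println(map.containsKey(o)); // true
        System.out.println(map.get(o));         // 2
    }
}
```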
Next, HashMap has a separate hash() method, which applies a supplemental hash function to the key's hash code to defend against poor-quality hash functions.
It obtains that hash code by calling the key object's hashCode() method.
The default implementation is roughly equivalent to the object's unique identifier (much like a memory address); however, some objects are compare-by-value. For a compare-by-value object, hashCode() is overridden to compute a number from its values, so that two objects with identical values yield the same hashCode().
As for hash-based collections, the put(...) operation is fine with putting something over the original location. In short, if two objects yield the same hashCode() and equals(...) returns true, then operations will assume that they are, for all practical purposes, the same object. Thus, put replaces the old value with the new one (and a set's add simply does nothing), as the object is considered the same.
It may not store two copies in the same "place" as it makes no sense to store two copies at the same location. Thus, sets will only contain one copy, as will map keys; however, lists will possibly contain two copies, depending on how you added the second copy.
How are objects determined to be equal here ?
By using the equals() and hashCode() methods of the key objects (inherited from the Object class unless overridden).
Is a field by field comparison done ?
No. If you don't override equals() and hashCode(), Java compares the references of your objects.
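The difference between reference comparison and value comparison can be sketched with two hypothetical classes (PlainPoint and ValuePoint are illustrative, not from the question): one inherits equals()/hashCode() from Object, the other overrides both on its field. Adding two instances with the same field value to a HashSet keeps both in the first case and only one in the second.

```java
import java.util.HashSet;
import java.util.Set;

public class EqualityDemo {

    // Does NOT override equals()/hashCode(): two instances are equal only if they are the same object.
    static final class PlainPoint {
        final int x;
        PlainPoint(int x) { this.x = x; }
    }

    // Overrides both, comparing by the value of x.
    static final class ValuePoint {
        final int x;
        ValuePoint(int x) { this.x = x; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValuePoint && ((ValuePoint) o).x == x;
        }

        @Override
        public int hashCode() { return Integer.hashCode(x); }
    }

    static int distinctPlain() {
        Set<PlainPoint> set = new HashSet<>();
        set.add(new PlainPoint(1));
        set.add(new PlainPoint(1));
        return set.size(); // 2: different references, so not equal
    }

    static int distinctValue() {
        Set<ValuePoint> set = new HashSet<>();
        set.add(new ValuePoint(1));
        set.add(new ValuePoint(1));
        return set.size(); // 1: equal values and equal hash codes
    }

    public static void main(String[] args) {
        System.out.println(distinctPlain()); // 2
        System.out.println(distinctValue()); // 1
    }
}
```

This is why the map cannot guarantee "one copy of ANY type of object": it depends entirely on how each key class defines equality.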
Can I rely on this to make sure only one copy of ANY type of object is maintained in the map as key ?
No.
Using a Set is a better approach than using a Map because it removes duplicates on its own, but in this case it won't work either, because a Set determines duplicates the same way a Map does with its keys.
If you refer to the same object, it will not throw an error, because when a HashMap gets the same key, the associated value is overwritten.
If the same key already exists in a HashMap, its value will be overwritten.
If you want to check whether a key or value already exists, you can use:
containsKey() and containsValue().
Example:
hashMap.containsKey(0);
This returns true if the key 0 already exists, otherwise false.
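As a small sketch of how containsKey() can guard against the overwrite described above, here is a hypothetical helper putIfNew (not a JDK method) that only inserts when the key is absent, alongside the containsKey()/containsValue() calls from the answer:

```java
import java.util.HashMap;
import java.util.Map;

public class ContainsCheckDemo {

    // Hypothetical helper: puts the value only if the key is absent; returns true if the map changed.
    static boolean putIfNew(Map<Integer, String> map, int key, String value) {
        if (map.containsKey(key)) {
            return false; // key already present: keep the existing value
        }
        map.put(key, value);
        return true;
    }

    public static void main(String[] args) {
        Map<Integer, String> hashMap = new HashMap<>();
        hashMap.put(0, "zero");
        System.out.println(hashMap.containsKey(0));        // true
        System.out.println(hashMap.containsValue("zero")); // true
        System.out.println(putIfNew(hashMap, 0, "other")); // false
        System.out.println(hashMap.get(0));                // zero
    }
}
```

Note that containsValue() scans all entries, so it is much slower than containsKey() on a large map.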
By computing the hash value with hash(key.hashCode()).
HashMap (up to Java 7) has an inner class Entry with the fields:
final K key;
V value;
Entry<K, V> next;
final int hash;
The hash value is used to calculate the index in the array where the Entry object is stored; two unequal objects may have the same hash value.
Entry objects are stored as a linked list: in case of a collision, all entries that land in the same bucket are stored in the same linked list, and the equals method tests for true equality. In this way, HashMap ensures the uniqueness of keys.
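The scheme described above (hash, bucket index, chain of Entry nodes, equals() as the final check) can be sketched as a toy map. This is a simplified illustration written for this answer, not the JDK source: ChainedMapSketch is a hypothetical name, the table has a fixed 16 buckets with no resizing, and only put/get are implemented.

```java
import java.util.Objects;

public class ChainedMapSketch<K, V> {

    // Simplified analogue of the Entry node listed above.
    private static final class Entry<K, V> {
        final K key;
        V value;
        Entry<K, V> next;
        final int hash;

        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash;
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16]; // fixed size: no resizing
    private int size;

    private static int hash(Object key) {
        int h = Objects.hashCode(key);
        return h ^ (h >>> 16); // spread high bits into the low bits used for indexing
    }

    private int indexFor(int hash) {
        return hash & (table.length - 1); // table length is a power of two
    }

    /** Returns the previous value for key, or null if the key was absent. */
    public V put(K key, V value) {
        int h = hash(key);
        int i = indexFor(h);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            // Landing in the same bucket is not enough: hash AND equals() must match.
            if (e.hash == h && Objects.equals(e.key, key)) {
                V old = e.value;
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(h, key, value, table[i]); // new or colliding key: chain a new entry
        size++;
        return null;
    }

    public V get(Object key) {
        int h = hash(key);
        for (Entry<K, V> e = table[indexFor(h)]; e != null; e = e.next) {
            if (e.hash == h && Objects.equals(e.key, key)) {
                return e.value;
            }
        }
        return null;
    }

    public int size() { return size; }

    public static void main(String[] args) {
        ChainedMapSketch<String, Integer> map = new ChainedMapSketch<>();
        map.put("Aa", 1);
        map.put("BB", 2); // "Aa" and "BB" have the same hashCode(), so they share a chain
        map.put("Aa", 3); // equal key: value replaced, size unchanged
        System.out.println(map.size());    // 2
        System.out.println(map.get("Aa")); // 3
        System.out.println(map.get("BB")); // 2
    }
}
```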
I know that objects with the same hash code need not be equal. My question is: if a HashSet encounters an object whose hash code matches that of an object already in the set, but the objects are not equal, will the HashSet add the new object?
HashSet internally uses a HashMap, with the set elements as keys and a constant as the value. Thus the behavior is the same: if the hash codes are equal but the objects are not, collision handling takes place and the object is put into a linked list for the resolved bucket.
The hash code doesn't even need to be the same; it just has to map to the same bucket. HashSet is based on HashMap, so its behaviour depends on HashMap.
Two keys/elements where equals() returns false are not the same.
HashMap in Java <= 7 uses a linked list of keys/elements for the same bucket (not a java.util.LinkedList as such). In Java 8 it can use a tree of keys/elements.
Yes it will, because the objects are actually not the same.
Yes it will add the new object. It will not replace since they are not equal.
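A quick way to check this is with two well-known String values whose hash codes collide, "Aa" and "BB" (both 2112). The sketch below (CollisionDemo and addColliding are illustrative names) shows that HashSet.add returns true for the second string and the set ends up with two elements:

```java
import java.util.HashSet;
import java.util.Set;

public class CollisionDemo {

    static Set<String> addColliding() {
        Set<String> set = new HashSet<>();
        set.add("Aa");
        boolean added = set.add("BB"); // same hashCode() as "Aa", but not equal
        System.out.println(added);     // true
        return set;
    }

    public static void main(String[] args) {
        System.out.println("Aa".hashCode() + " " + "BB".hashCode()); // 2112 2112
        System.out.println(addColliding().size());                   // 2
    }
}
```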
I used a structure like the following to get unique elements from an array of objects.
dataList.put(column, new LinkedList<Object>(new HashSet<Object>(Arrays.asList(entry.getValue()))));
The array from entry.getValue() is a 100 element array containing values from 1 to 99, with 1 being repeated twice.
The documentation says that Arrays.asList(arr[]) method returns a fixed-length list of the same length as the array.
I have observed that the set created also contains the duplicate values given by the original array.
Please explain this behaviour.
More details.
I have also tried set.addAll(Arrays.asList(entry.getValue())), where set is a HashSet, and got the same results.
The array returned by entry.getValue() is an array of type java.lang.Short
Most likely, you didn't override equals() in the class of the objects in the array returned by entry.getValue(). And especially since you are using a HashSet, you should override hashCode() too, so that it "agrees" with equals(), as per the javadoc of equals():
Note that it is generally necessary to override the hashCode method whenever this method is overridden, so as to maintain the general contract for the hashCode method, which states that equal objects must have equal hash codes.
If you don't override equals(), each instance will not be equal() to any other instance despite its "value" being the same, because that's the default implementation of equals(), so the Set will see both "1" objects as "different".
After running the above code fragment with arrays of various data types, I found that Arrays.asList() treats an array of a primitive type as a single object (the resulting list has one element, the array itself), whereas arrays of reference types such as String, Integer or Float are turned into a list of their elements.
In my case I passed an int[] array, wrapped by Arrays.asList(), to the HashSet, so the whole array was treated as a single object and no duplicates were filtered out. When the array contained Integer objects, the HashSet did filter out the duplicates.
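The reason is that Arrays.asList is declared with a generic varargs parameter (T... a), and T cannot be a primitive type, so for an int[] argument T is inferred as int[] itself. A minimal sketch (AsListDemo and its method names are illustrative; Short[] is used because the question mentions java.lang.Short) comparing the two cases, plus one way to box a primitive array explicitly with a stream:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class AsListDemo {

    static int sizeFromPrimitive(int[] values) {
        // asList(T...) cannot use T = int, so T = int[]: a one-element list holding the array.
        List<int[]> list = Arrays.asList(values);
        return new HashSet<>(list).size();
    }

    static int sizeFromBoxed(Short[] values) {
        // With a reference-type array, each element becomes a list element.
        Set<Short> set = new HashSet<>(Arrays.asList(values));
        return set.size();
    }

    static int sizeFromPrimitiveBoxed(int[] values) {
        // Box each int explicitly before collecting into a set.
        Set<Integer> set = Arrays.stream(values).boxed().collect(Collectors.toSet());
        return set.size();
    }

    public static void main(String[] args) {
        int[] primitives = {1, 1, 2, 3};
        Short[] boxed = {1, 1, 2, 3};
        System.out.println(sizeFromPrimitive(primitives));      // 1
        System.out.println(sizeFromBoxed(boxed));               // 3
        System.out.println(sizeFromPrimitiveBoxed(primitives)); // 3
    }
}
```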
From this link
Name objects are immutable. All other things being equal, immutable types are the way to go, especially for objects that will be used as elements in Sets or as keys in Maps. These collections will break if you modify their elements or keys while they're in the collection.
How do we know that the class "Name" is immutable? (The class is visible in the link mentioned above.)
What do they actually mean by "the collections will break if you modify their elements"?
Thanks in advance.
Because with mutable classes, you can change the properties on which the objects are organized/ordered in the collection, and the collection would not know about it.
Think that you could do:
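public class Name implements Comparable<Name> {
    private String firstName;

    public Name(String firstName) { this.firstName = firstName; }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    @Override
    public int compareTo(Name other) {
        // Compare based on firstName
        return firstName.compareTo(other.firstName);
    }
}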
And then:
Name name1 = new Name("John");
Name name2 = new Name("Mike");
SortedSet<Name> set = new TreeSet<Name>();
set.add(name1);
set.add(name2);
name1.setFirstName("Ralph");
Now, is set ordered or is it not?
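A runnable sketch of the same scenario (TreeSetMutationDemo and addThenRename are illustrative names; Name mirrors the class in this answer) shows the answer is "no". With the JDK's TreeSet, "John" is inserted first and becomes the root, with "Mike" to its right; after renaming the root to "Ralph", iteration yields Ralph before Mike, and a search for "Mike" compares it with "Ralph", goes left, and misses the element that is actually in the set.

```java
import java.util.TreeSet;

public class TreeSetMutationDemo {

    // Minimal mutable Name ordered by firstName.
    static final class Name implements Comparable<Name> {
        private String firstName;

        Name(String firstName) { this.firstName = firstName; }

        String getFirstName() { return firstName; }

        void setFirstName(String firstName) { this.firstName = firstName; }

        @Override
        public int compareTo(Name other) {
            return firstName.compareTo(other.firstName);
        }
    }

    static TreeSet<Name> addThenRename() {
        TreeSet<Name> set = new TreeSet<>();
        Name name1 = new Name("John");
        Name name2 = new Name("Mike");
        set.add(name1);
        set.add(name2);
        name1.setFirstName("Ralph"); // the tree is not re-sorted
        return set;
    }

    public static void main(String[] args) {
        TreeSet<Name> set = addThenRename();
        for (Name n : set) {
            System.out.println(n.getFirstName()); // Ralph, then Mike: no longer sorted
        }
        System.out.println(set.contains(new Name("Mike"))); // false, although Mike is in the set
    }
}
```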
In a similar way, changes that affect the hashCode of an instance break HashMap and similar classes, because the first thing these classes do when inserting or retrieving an object is to pick a specific bucket based on that value.
What they mean is that lookups based on the object will fail.
For example:
myMap.get(myObject);
will fail because the modified object no longer hashes (or compares) the way it did when it was inserted, so the collection looks for it in the wrong place.
HashSet and HashMap rely on the contract for equals() and hashCode() described in the Javadoc for java.lang.Object. That means that for two objects that are equal according to equals(), the calculated hashCode() must also be equal.
If the hashCode() of an object in a Set or Map changes while the object is in the Set or Map, the implementation will not find the object, as it is stored in the bucket for the old hashCode().
Therefore changing the hashCode() while an object is in a Set or Map is a really bad idea.
From the docs on Map
Note: great care must be exercised if mutable objects are used as map keys. The behavior of a map is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is a key in the map. A special case of this prohibition is that it is not permissible for a map to contain itself as a key. While it is permissible for a map to contain itself as a value, extreme caution is advised: the equals and hashCode methods are no longer well defined on such a map.
From the docs on Set
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Lookups are done with equals() on the keys (and, for hash-based collections, hashCode() first); if a key is mutated while it is in the collection, the lookup can fail.
I'm a little befuddled by some code:
for (AbstractItem item : mSetOfItems) {
if (item.equals(pPrimaryItem))
{
System.out.println("Contains? " + mSetOfItems.contains(pPrimaryItem));
}
}
How could it be possible that item.equals(pPrimaryItem) resolves as true, and mSetOfItems.contains(pPrimaryItem) resolves as false? Because that's what I'm seeing in my code.
In other words, if I iterate through my set, I can find an element equal to my test element. But if I use contains, my test element is not reported as being in the set. I'm baffled because I thought contains used equals. What could I be overlooking?
You didn't give the type of mSetOfItems, but I'm guessing that AbstractItem overrides equals() but not hashCode(). This is bad.
If mSetOfItems uses hashCode() for lookup, which it does if it is a hash-based set such as HashSet, you'll get the behavior you described.
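A minimal sketch of that guess, using hypothetical BrokenItem and FixedItem classes (not the question's AbstractItem): iterating and calling equals() finds the element, while HashSet.contains() first picks a bucket from hashCode() and therefore usually misses it. Because Object's identity hash codes can collide by chance, the broken contains() result is only "almost always" false, so the sketch does not rely on it.

```java
import java.util.HashSet;
import java.util.Set;

public class ContainsDemo {

    // Overrides equals() but NOT hashCode(): violates the equals/hashCode contract.
    static final class BrokenItem {
        final String id;
        BrokenItem(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenItem && ((BrokenItem) o).id.equals(id);
        }
        // hashCode() is inherited from Object and typically differs between instances.
    }

    // Same item with a hashCode() consistent with equals().
    static final class FixedItem {
        final String id;
        FixedItem(String id) { this.id = id; }

        @Override
        public boolean equals(Object o) {
            return o instanceof FixedItem && ((FixedItem) o).id.equals(id);
        }

        @Override
        public int hashCode() { return id.hashCode(); }
    }

    // What the question's loop does: linear scan with equals().
    static boolean iterationFinds(Set<?> set, Object probe) {
        for (Object item : set) {
            if (item.equals(probe)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<BrokenItem> broken = new HashSet<>();
        broken.add(new BrokenItem("x"));
        BrokenItem probe = new BrokenItem("x");
        System.out.println(iterationFinds(broken, probe)); // true
        System.out.println(broken.contains(probe));        // almost always false

        Set<FixedItem> fixed = new HashSet<>();
        fixed.add(new FixedItem("x"));
        System.out.println(fixed.contains(new FixedItem("x"))); // true
    }
}
```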
Your assumption is that contains() is implemented with iteration and equals(). No collection interface guarantees that.
What is the implementation of mSetOfItems?
If it's a tree, it could be that your comparison function returns inconsistent values.
If it's a hash, it could be that your equals() returns true for objects with different hash codes, or that the object's hashCode() has changed since it was inserted into the set.
If your set is a TreeSet or some other set where you're using a custom comparator, then you could see this if the comparator was broken, either by not returning a valid sorted order or by having objects that are actually equal compare unequal. When the set internally looks up an element and uses the comparator, it would make a wrong choice and not see the element.
If your set is a HashSet, your hash function could be broken and cause two objects that are equal to have different hash code. Internally as the HashSet uses the object's hash code to figure out where to look, it might end up looking in the wrong bucket.
Alternatively, if you store objects in a Set of any sort and then modify them, you might end up breaking some internal invariant of the Set. For instance, if you store something in a HashSet and then change its value, it will be in the wrong bucket, and if you have a TreeSet and change the value it may appear in the wrong spot in sorted order.
If you are concurrently modifying the set, it's possible that you might have added the element in another thread but not had any guarantees that the operation that made that change be visible in another thread. The second thread would then not see the element even if it were added.
Check the hashCode() method of your class.
If mSetOfItems is a java.util.HashSet (or a similar hash-based collection such as HashMap or Hashtable), then you must implement hashCode() as well. boolean contains(Object elem) first tries to find the passed object by computing its hash and looking it up in the collection. Once contains finds a candidate, it uses the equals method to verify that the two objects are the same according to your implementation.
If not overridden, hashCode() returns an identity-based int, often derived from the internal address of the object. It will usually differ for two distinct objects, no matter the values of their instance variables. So if hashCode() is not overridden, contains() will generally fail to find an object that is equal to, but not the same instance as, the stored one.
When implementing hashCode(), remember that:
Whenever it is invoked on the same object more than once during an execution of a Java application, the hashCode method must consistently return the same integer, provided no information used in equals comparisons on the object is modified. This integer need not remain consistent from one execution of an application to another execution of the same application.
If two objects are equal according to the equals(Object) method, then calling the hashCode method on each of the two objects must produce the same integer result.
It is not required that if two objects are unequal according to the equals(java.lang.Object) method, then calling the hashCode method on each of the two objects must produce distinct integer results. However, the programmer should be aware that producing distinct integer results for unequal objects may improve the performance of hashtables.
Also, make sure that you have properly overridden the equals method by respecting its signature:
public boolean equals(Object obj);
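The signature matters because an equals(AbstractItem) method would overload rather than override equals(Object). A direct call like item.equals(pPrimaryItem), with both expressions statically typed as the item class, picks the overload, while HashSet internally calls equals(Object) and gets Object's identity comparison. A minimal sketch with hypothetical OverloadedItem and OverriddenItem classes (both with a consistent hashCode(), so the difference comes from the signature alone):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

public class EqualsSignatureDemo {

    // equals() OVERLOADS instead of overriding Object.equals(Object).
    static final class OverloadedItem {
        final String id;
        OverloadedItem(String id) { this.id = id; }

        public boolean equals(OverloadedItem other) { // wrong signature: an overload
            return other != null && id.equals(other.id);
        }

        @Override
        public int hashCode() { return id.hashCode(); }
    }

    // Correct signature, so collections use it.
    static final class OverriddenItem {
        final String id;
        OverriddenItem(String id) { this.id = id; }

        @Override
        public boolean equals(Object obj) {
            if (this == obj) return true;
            if (!(obj instanceof OverriddenItem)) return false;
            return id.equals(((OverriddenItem) obj).id);
        }

        @Override
        public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        OverloadedItem a = new OverloadedItem("x");
        OverloadedItem b = new OverloadedItem("x");
        Set<OverloadedItem> set = new HashSet<>();
        set.add(a);
        System.out.println(a.equals(b));     // true: the compile-time types select the overload
        System.out.println(set.contains(b)); // false: HashSet calls equals(Object), i.e. identity

        Set<OverriddenItem> fixed = new HashSet<>();
        fixed.add(new OverriddenItem("x"));
        System.out.println(fixed.contains(new OverriddenItem("x"))); // true
    }
}
```

Adding @Override to equals makes the compiler reject the wrong signature, which is a cheap way to catch this mistake.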