HashSet internal storing same hash bucket

How are the following Person objects stored inside the same hash bucket? As a linked list? Also, as of Java 8, once a certain threshold is reached, the linked list is transformed into a tree. Is this also correct?
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;
class TestHashSet
{
    public static void main (String[] args) throws java.lang.Exception
    {
        Person p1 = new Person("Mike");
        Person p2 = new Person("Mike");
        Set<Person> persons = new HashSet<>();
        persons.add(p1);
        persons.add(p2);
        Iterator<Person> iterator = persons.iterator();
        while (iterator.hasNext()) {
            System.out.println("Value: " + iterator.next().getName() + " ");
        }
    }
}
class Person {
    String name;
    String getName(){
        return name;
    }
    Person(String name){
        this.name = name;
    }
    @Override
    public int hashCode(){
        return name.hashCode(); // same name, same hash code, same bucket
    }
    @Override
    public boolean equals(Object o){
        return false; // never equal to anything
    }
}

Yes. Colliding entries are first stored as a linked list and, once the bucket grows past a certain threshold, as a tree.

For your 1st question: the 2 objects are stored as 2 different entries in the hash set, because the equals method returns false.
When the second object is added, it is compared with the first. The hash code is checked first. Since the hash codes match, the equals method is called; it returns false, so the object is stored as a new entry.
The important thing to note here is that since the hash codes are the same, both objects land in the same bucket, but as 2 different entries (a collision).
For the 2nd question: as of Java 8, a balanced binary tree is used instead of a linked list once a bucket passes a certain threshold. In the OpenJDK 8 HashMap source the relevant constants are TREEIFY_THRESHOLD = 8 and MIN_TREEIFY_CAPACITY = 64 (below that table size the map resizes instead of building a tree). For reference, check https://www.nagarro.com/de/blog/post/24/performance-improvement-for-hashmap-in-java-8
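To make the first point concrete, here is a minimal sketch. The class names CollidingPerson and SameBucketDemo are made up for illustration; the hashCode and equals bodies mirror the question. It only shows that both objects are kept; the internal list or tree is not observable through the public API.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical stand-in for the Person class from the question.
class CollidingPerson {
    private final String name;

    CollidingPerson(String name) { this.name = name; }

    String getName() { return name; }

    @Override
    public int hashCode() { return name.hashCode(); } // same name -> same hash code

    @Override
    public boolean equals(Object o) { return false; } // deliberately never equal, as in the question
}

class SameBucketDemo {
    // Adds two objects with identical hash codes and reports how many the set kept.
    static int countStored() {
        Set<CollidingPerson> persons = new HashSet<>();
        persons.add(new CollidingPerson("Mike"));
        persons.add(new CollidingPerson("Mike"));
        return persons.size();
    }

    public static void main(String[] args) {
        System.out.println("Entries stored: " + countStored());
    }
}
```

Note that an equals method that always returns false breaks the equals contract (an object should equal itself), so this is only useful as an experiment.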

Related

In Java: What happens if I change a key in an HashMap to be equal to another key? [duplicate]

I know that I can not have 2 keys in a HashMap which are equal (by the equals()-method). And if I try to add a key-value-pair to the HashMap with the key already existing, the old value is just replaced by the new one.
But what if I change an already existing key to be equal to another existing key?
How will the map.get() method behave in this case (applied to one of these equal keys)?
Very simple example below.
public class Person{
    private int age;
    private String name;
    public Person(int a, String n){
        age = a;
        name = n;
    }
    public void setAge(int a){ age = a; }
    public int getAge(){ return age; }
    public String getName(){ return name; }
    @Override
    public boolean equals(Object o){
        if(!(o instanceof Person)){ return false; }
        Person p = (Person) o;
        return ((p.getName().equals(this.getName())) && (p.getAge() == this.getAge()));
    }
    @Override
    public int hashCode(){ return age; }
}
import java.util.HashMap;
public class MainClass{
    public static void main(String[] args){
        // the constructor takes (age, name)
        Person p1 = new Person(20, "Bill");
        Person p2 = new Person(21, "Bill");
        HashMap<Person, String> map = new HashMap<>();
        map.put(p1, "some value");
        map.put(p2, "another value");
        p1.setAge(21);
        String x = map.get(p1); // <-- What will this be??
        System.out.println(x);
    }
}

When you mutate a key that is already present in the HashMap, you break the HashMap. You are not supposed to mutate keys while they are in the map. If you must mutate such a key, remove it from the HashMap before the change and put it back after the change.
map.get(p1) will search for the key p1 according to its new hashCode, which is equal to the hash code of p2. Therefore it will search in the bucket that contains p2 and return the corresponding value, "another value". Even if both entries happened to share a bucket, the entry originally stored for p1 would not match, because HashMap compares the hash recorded at insertion time before it calls equals.
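The remove, change, re-insert advice can be sketched as follows. MutablePerson and RekeyDemo are hypothetical names; the equals and hashCode logic follows the question's Person class.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable key, modelled on the Person class in the question.
class MutablePerson {
    private int age;
    private final String name;

    MutablePerson(int age, String name) { this.age = age; this.name = name; }

    void setAge(int age) { this.age = age; }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutablePerson)) return false;
        MutablePerson p = (MutablePerson) o;
        return age == p.age && name.equals(p.name);
    }

    @Override
    public int hashCode() { return age; }
}

class RekeyDemo {
    // Changes a key's age safely: remove the entry, mutate the key, put it back.
    static String rekeyAndLookup() {
        Map<MutablePerson, String> map = new HashMap<>();
        MutablePerson p1 = new MutablePerson(20, "Bill");
        map.put(p1, "some value");

        String value = map.remove(p1); // remove while the hash code still matches the stored one
        p1.setAge(21);                 // mutate while the key is outside the map
        map.put(p1, value);            // re-insert under the new hash code

        return map.get(p1);
    }

    public static void main(String[] args) {
        System.out.println(rekeyAndLookup());
    }
}
```

If another key equal to the changed p1 were already in the map, the final put would replace that key's value, so this pattern still needs care.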

In short: the entry stored under p1 will not be reachable anymore.
In general, the map uses the hash function to distribute keys into buckets and then the equals method to locate the correct key-value pair. When you change p1, its hash value changes with it. A lookup then searches a different bucket, does not find the original entry, and the entry that was stored for p1 is no longer reachable.

What is hashmap collisioning ? and does it occurs in my code?

I have written code that has a Student class, and Student objects are used as keys
as follows,
public class ExampleMain01 {
private static class Student{
private int studentId;
private String studentName;
Student(int studentId,String studentName){
this.studentId = studentId;
this.studentName = studentName;
}
@Override
public int hashCode(){
return this.studentId * 31;
}
@Override
public boolean equals(Object obj){
boolean flag = false;
Student st = (Student) obj;
if(st.hashCode() == this.hashCode()){
flag = true;
}
return flag;
}
@Override
public String toString(){
StringBuffer strb = new StringBuffer();
strb.append("HASHCODE ").append(this.hashCode())
.append(", ID ").append(this.studentId)
.append(", NAME ").append(this.studentName);
return strb.toString();
}
public int getStudentId() {
return studentId;
}
public String getStudentName() {
return studentName;
}
} // end of class Student
private static void example02() throws Exception{
Set<Student> studentSet = new HashSet<Student>();
studentSet.add(new Student(12, "Arnold"));
studentSet.add(new Student(12, "Sam"));
studentSet.add(new Student(12, "Jupiter"));
studentSet.add(new Student(12, "Kaizam"));
studentSet.add(new Student(12, "Leny"));
for(Student s : studentSet){
System.out.println(s);
}
} // end of method example02
private static void example03() throws Exception{
Map<Student, Integer> map = new HashMap<Student,Integer>();
Student[] students = new Student [] {
new Student(12, "Arnold"),
new Student(12, "Jimmy"),
new Student(12, "Dan"),
new Student(12, "Kim"),
new Student(12, "Ubzil")
};
map.put(students[0], new Integer(23));
map.put(students[1], new Integer(123));
map.put(students[2], new Integer(13));
map.put(students[3], new Integer(25));
map.put(students[4], new Integer(2));
Set<Map.Entry<Student, Integer>> entrySet = map.entrySet();
for(Iterator<Map.Entry<Student, Integer>> itr = entrySet.iterator(); itr.hasNext(); ){
Map.Entry<Student, Integer> entry = itr.next();
StringBuffer strb = new StringBuffer();
strb.append("Key : [ ").append(entry.getKey()).append(" ], Value : [ ").append(entry.getValue()).append(" ] ");
System.out.println(strb.toString());
}
} // end of method example03
public static void main(String[] args) {
try{
example02();
example03();
}catch(Exception e){
e.printStackTrace();
}
}// end of main method
} // end of class ExampleMain01
In the above code in Student class the hashcode and equals are implemented as follows,
@Override
public int hashCode(){
return this.studentId * 31;
}
@Override
public boolean equals(Object obj){
boolean flag = false;
Student st = (Student) obj;
if(st.hashCode() == this.hashCode()){
flag = true;
}
return flag;
}
Now when I compile and run the code,
the method example02 prints
HASHCODE 372, ID 12, NAME Arnold
i.e. the Set holds only one object.
What I understood is that, since all the objects have the same hashcode, only a single object lies in bucket 372. Am I right?
Also, the method example03() prints
Key : [ HASHCODE 372, ID 12, NAME Arnold ], Value : [ 2 ]
From the above output we can see that, as the keys return the same hashcode,
the HashMap holds only a single key-value pair.
So my question is: where does the collision happen?
Can a key point to multiple values?
Where does the linked list concept come in while searching for the value of a given key?
Can anybody please explain the above with respect to the examples I have shared?

What is hashmap collisioning ?
There is no such thing as "hashmap collision".
There is such a thing as "hashcode collision". That happens when two objects have the same hashcode, but are not equal.
Hashcode collision is not a problem ... unless it happens frequently. A properly designed hash table data structure (including HashMap or HashSet) will cope with collision, though if the probability of collision is too high, performance will tend to suffer.
Does [hashcode collision] occur in my code?
No.
The problems with your code are not due to hashcode collision:
Your equals method is in effect saying that two Student objects are equal if-and-only-if they have the same hashcode. Since your hashcode is computed from only the ID, this means that any Student objects with the same ID are equal by definition.
Then you add lots of Student objects that have the same ID (12) and different names. Obviously, that means they are equal. And that means that the HashSet will only hold one of them ... at any given time.
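For contrast, here is a sketch of what a genuine hashcode collision would look like. StudentFixed and CollisionDemo are hypothetical names; the hashCode is the one from the question, but equals now compares the fields instead of the hash codes, so students with the same ID and different names collide without being equal.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical variant of the question's Student class with a field-based equals.
class StudentFixed {
    private final int studentId;
    private final String studentName;

    StudentFixed(int studentId, String studentName) {
        this.studentId = studentId;
        this.studentName = studentName;
    }

    @Override
    public int hashCode() { return studentId * 31; } // unchanged: same ID -> same hash code

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof StudentFixed)) return false;
        StudentFixed other = (StudentFixed) obj;
        return studentId == other.studentId && studentName.equals(other.studentName);
    }
}

class CollisionDemo {
    // All five students share one hash code, yet the set keeps each of them.
    static int distinctStudents() {
        Set<StudentFixed> set = new HashSet<>();
        for (String name : new String[] {"Arnold", "Sam", "Jupiter", "Kaizam", "Leny"}) {
            set.add(new StudentFixed(12, name));
        }
        return set.size();
    }

    public static void main(String[] args) {
        System.out.println("Students stored: " + distinctStudents());
    }
}
```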
So my question is: where does the collision happen?
The problem is that the Student objects are all equal.
Can a key point to multiple values?
This is a HashSet not a HashMap. There are no "keys". A Set is a set of unique values ... where unique means that the members are not equal.
Where does the linked list concept come in while searching for the value of a given key?
If you are talking about the LinkedList class, it doesn't come into it.
If you are talking about linked lists in general, the implementation of HashSet can use a form of linked list to represent the hash chains. But that is an implementation detail, and not something that makes any difference to your example.
(If you really want to know how HashSet works use Google to search for "java.util.HashSet source" and read the source code. Note that the implementations are different in different versions of Java.)
Can a key point to multiple values?
No. The Map API doesn't support that.
But of course you could use a map like this:
Map<Key, List<Value>> myMultimap = .....
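A minimal sketch of that map-of-lists idea, assuming Java 8 or later for Map.computeIfAbsent. The class and method names are made up for illustration, and the key and value types (Integer IDs, String names) are only an example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MultimapSketch {
    // Appends a name under an ID, creating the list the first time the ID is seen.
    static void add(Map<Integer, List<String>> namesById, int id, String name) {
        namesById.computeIfAbsent(id, k -> new ArrayList<>()).add(name);
    }

    // Builds a small example map where one key holds several values.
    static Map<Integer, List<String>> groupNamesById() {
        Map<Integer, List<String>> namesById = new HashMap<>();
        add(namesById, 12, "Arnold");
        add(namesById, 12, "Sam");
        add(namesById, 13, "Jupiter");
        return namesById;
    }

    public static void main(String[] args) {
        System.out.println(groupNamesById());
    }
}
```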

best way to find extra object among two lists

I have two custom lists say CompanyList such that
public class CompanyList<E> extends Collection<E> implements List<E> {}
Here I have list of CompanyList such that
public class CompanyMakeVO extends BaseVO {
private static final long serialVersionUID = 1L;
private String name;
public CompanyMakeVO() {
super();
}
public String getName() {
return this.name;
}
public void setName(String name) {
this.name = name;
}
// overrides equals
public boolean equals(Object obj) {
if (obj == null || !(obj.getClass() == this.getClass())) {
return false;
}
CompanyMakeVO make = (CompanyMakeVO) obj;
// NAME
String thisName = this.getName();
String thatName = make.getName();
if (null == thisName || null == thatName)
return false;
return thisName.equals(thatName);
}
// hashcode
public int hashCode() {
return getName().hashCode();
}
}
I have two such lists, say oldList and newList; both hold CompanyMakeVO objects, and each object represents a company name via its name attribute.
Let's say the old list has 3 objects named Audi, BMW and Aston Martin, while the new list has 5 objects named Audi, BMW, Aston Martin, Jaguar and Tesla. The lists will not have any duplicate items, i.e. a company name will not be repeated. I need to find the elements that are present in only one of the lists, together with the name of the list and the name of the element.
What's the best way to find them?

For small data sets, such as lists with a few elements, it is convenient to use List.removeAll().
For large data sets, such as lists with millions of items, you can use a hash-based collection (a HashMap or HashSet) to find those elements.
List.removeAll() on an ArrayList checks each item of the first list against the elements of the second list, which is O(NM) when the argument is itself a list. With a hash-based lookup, only O(N+M) work is needed on average, which is faster.
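A sketch of the hash-based variant, using plain Strings in place of CompanyMakeVO to keep it self-contained. ListDiffSketch and onlyIn are hypothetical names. With CompanyMakeVO the same code would rely on the equals and hashCode methods shown in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ListDiffSketch {
    // Returns the elements of 'source' that do not occur in 'other', in source order.
    static <T> List<T> onlyIn(List<T> source, List<T> other) {
        Set<T> seen = new HashSet<>(other); // one pass over 'other'
        List<T> result = new ArrayList<>();
        for (T item : source) {             // one pass over 'source', hash lookup per item
            if (!seen.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> oldList = Arrays.asList("Audi", "BMW", "Aston Martin");
        List<String> newList = Arrays.asList("Audi", "BMW", "Aston Martin", "Jaguar", "Tesla");
        System.out.println("Only in newList: " + onlyIn(newList, oldList));
        System.out.println("Only in oldList: " + onlyIn(oldList, newList));
    }
}
```

Calling onlyIn once in each direction gives both the extra elements and the list they belong to.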

You can use removeAll() method from ArrayList as given below:
List<CompanyMakeVO> companyMakeVOListOld = new ArrayList<>();
//add your items to the old list
List<CompanyMakeVO> companyMakeVOListNew = new ArrayList<>();
//add your items to new list
//now removeAll duplicate items from new list by passing the old list
companyMakeVOListNew.removeAll(companyMakeVOListOld);
ArrayList - removeAll method API:
public boolean removeAll(Collection c)
Removes from this list all of its elements that are contained in the
specified collection.
https://docs.oracle.com/javase/7/docs/api/java/util/ArrayList.html#removeAll(java.util.Collection)

Need help to understand behaviour of HashMap [duplicate]

Let's say I have a person class and equality is based on id attribute. Below is the implementation of Person class -
class Person {
private int id;
private String firstName;
public Person(int id, String firstName) {
super();
this.id = id;
this.firstName = firstName;
}
public int getId() {
return id;
}
public void setId(int id) {
this.id = id;
}
public String getFirstName() {
return firstName;
}
public void setFirstName(String firstName) {
this.firstName = firstName;
}
public int hashCode() {
return this.id;
}
public boolean equals(Object obj) {
return ((Person) obj).getId() == this.id;
}
}
I am using the Person class as the key of a HashMap. Now see the code below -
import java.util.HashMap;
public class TestReport {
public static void main(String[] args) {
Person person1 = new Person(1, "Person 1");
Person person2 = new Person(2, "Person 2");
HashMap<Person, String> testMap = new HashMap<Person, String>();
testMap.put(person1, "Person 1");
testMap.put(person2, "Person 2");
person1.setId(2);
System.out.println(testMap.get(person1));
System.out.println(testMap.get(person2));
}
}
Notice, though we have added two different person object as key to the HashMap, later we have changed the id of person1 object to 2 to make both the person object equal.
Now, I am getting output as -
Person 2
Person 2
I can see there are two key-value pairs in the HashMap, "person1/Person 1" and "person2/Person 2", yet I always get "Person 2" as output and can never access the value "Person 1". Also notice that we now have a duplicate key in the HashMap.
I can understand the behavior after looking at the source code, but doesn't it seem to be a problem? Can we take some precaution to prevent it?

It all depends on how hashCode() value is used by HashMap.
While it is required that two equal objects have the same hash code, the reverse is not necessarily true. Two unequal objects can have the same hash code (as int has only a finite set of possible values).
Every time you put an object in a HashMap, it stores the entry in a bucket identified by the key's hashCode(). So, depending on how hashCode() is implemented, you should get a fair distribution of entries across the buckets.
Now, when you try to retrieve a value, the HashMap will identify the bucket in which given key falls, and then will iterate through all keys in that bucket to pick the entry for given key - in this stage it will use equals() method to identify the entry.
In your case, person1 is sitting in bucket 1 and person2 is sitting in bucket 2.
However, when you changed the hashCode() value of person1 by updating its id, the HashMap was unaware of this change. Later, when you look up an entry using person1 as the key, the HashMap thinks it should be in bucket 2 (as person1.hashCode() is 2 now). When it then walks bucket 2 using the equals method, it finds the entry of person2 and treats it as the one you are interested in, because equals in your case is also based on the id attribute.
Above explanation is evident when one looks at implementation of HashMap#get method as shown below:
public V get(Object key) {
if (key == null)
return getForNullKey();
int hash = hash(key.hashCode());
for (Entry<K,V> e = table[indexFor(hash, table.length)];
e != null;
e = e.next) {
Object k;
if (e.hash == hash && ((k = e.key) == key || key.equals(k)))
return e.value;
}
return null;
}
PS Sometimes when you know an answer to question, you forget to lookup for duplicate questions, and jump right into answering the question before anyone can reply - that's what happened in this case. Should be careful next time :-)
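As for a precaution, one common option is to make the key immutable so its hash code cannot change while it sits in the map. A minimal sketch, with PersonKey and ImmutableKeyDemo as hypothetical names that are not part of the question's code:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable key: the id is final and there is no setter.
final class PersonKey {
    private final int id;

    PersonKey(int id) { this.id = id; }

    int getId() { return id; }

    @Override
    public boolean equals(Object obj) {
        return obj instanceof PersonKey && ((PersonKey) obj).id == id;
    }

    @Override
    public int hashCode() { return id; }
}

class ImmutableKeyDemo {
    // Looks up both entries with fresh, equal keys; neither entry can be orphaned by mutation.
    static String lookup() {
        Map<PersonKey, String> testMap = new HashMap<>();
        testMap.put(new PersonKey(1), "Person 1");
        testMap.put(new PersonKey(2), "Person 2");
        return testMap.get(new PersonKey(1)) + "," + testMap.get(new PersonKey(2));
    }

    public static void main(String[] args) {
        System.out.println(lookup());
    }
}
```

Mutable data such as firstName can then live in the value, or in a separate object that does not take part in equals and hashCode.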

This is happening because your equals() method compares only the id, the same field the hash code is based on. You should be comparing all your Person fields, like firstName.

how can I keep one and remove other duplicate arrayList objects and update another object property?

I need your help in Java (with some sample code if possible) regarding the following scenario:
I have a list of objects of a class, and I want to check whether one property has duplicates; if so, keep one of the objects and add the amounts of the others to the amount of the one that is kept. For example:
I have this class:
class Salary {
String names;
Double amount;
}
and the list say salary_list contains the following elements in it(for example):
[jony,john 300.96]
[fuse,norvi,newby 1000.55]
[john,jony 22.6]
[richard,ravi,navin 55.6]
[fuse,norvi,newby 200.6]
... ... ...
So my expected output is the same input list with the following revised result:
[jony,john 323.56]
[fuse,norvi,newby 1201.15]
[richard,ravi,navin 55.6]
N.B.: the order of the names is not important, and neither is the order of the elements after the duplicate elimination.
I am not good at english as well as in Java. So forgive me if any mistakes there.
Thanks in advance.

Enhance your Salary class as follows:
class Salary {
String names;
Double amount;
private String sortedNames = null;
@Override
public boolean equals(Object o)
{
if (o == null || ! (o instanceof Salary)) return false;
Salary othr = (Salary) o;
String thisNames = this.getSortedNames();
String othrNames = othr.getSortedNames();
return thisNames.equals(othrNames);
}
@Override
public int hashCode()
{
return getSortedNames().hashCode();
}
public String getSortedNames()
{
if (this.sortedNames == null)
{
String[] nameArr = this.names.split(",");
Arrays.sort(nameArr);
StringBuilder buf = new StringBuilder();
for (String n : nameArr)
buf.append(",").append(n);
this.sortedNames = buf.substring(buf.length()==0?0:1);
}
return this.sortedNames;
}
}
This assumes that names never changes after a Salary is created, because the sorted form is cached and used by equals and hashCode. You could then use this with a hash map to add up all the amounts having the same names.
Map<String,Salary> map = new HashMap<String,Salary>();
for (Salary s : list)
{
Salary e = map.get(s.getSortedNames());
if (e == null)
map.put(s.getSortedNames(), s);
else
e.amount += s.amount;
}
At this point the map contains all unique Salary objects with the total amount for each.

You have a few very helpful tools in Java. You can
split a String into an array by a separator: "jony,john".split(",") gives you the array {"jony","john"}
sort the data in an array with Arrays.sort(arrayToSort)
check whether arrays are equal (contain the same values in the same order) with Arrays.equals(array1, array2), or by comparing their String representations from Arrays.toString(array)
use Maps (like HashMap) to hold pairs [key -> value]
With these you can solve your problem.
Tip: You can use a Map like <String, Double> to sum your data. As the key (String) you can use the sorted representation of the names. If the Map already contains that sorted representation, increase the value stored under that key.
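The tip could be sketched like this, assuming Java 8 or later for Map.merge and String.join. SalaryTotalsSketch, normalize and totals are hypothetical names, and the input is passed as two parallel arrays only to keep the example short.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

class SalaryTotalsSketch {
    // Turns "jony,john" and "john,jony" into the same key by sorting the names.
    static String normalize(String names) {
        String[] parts = names.split(",");
        Arrays.sort(parts);
        return String.join(",", parts);
    }

    // Sums the amounts per normalized name list.
    static Map<String, Double> totals(String[] names, double[] amounts) {
        Map<String, Double> sums = new HashMap<>();
        for (int i = 0; i < names.length; i++) {
            // merge() stores the amount for a new key, or adds it to the existing sum
            sums.merge(normalize(names[i]), amounts[i], Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        String[] names = {"jony,john", "fuse,norvi,newby", "john,jony",
                          "richard,ravi,navin", "fuse,norvi,newby"};
        double[] amounts = {300.96, 1000.55, 22.6, 55.6, 200.6};
        System.out.println(totals(names, amounts));
    }
}
```

Because the amounts are doubles, the sums are subject to the usual floating-point rounding; BigDecimal would be the safer choice for real money.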
