How Set checks for duplicates? Java HashSet

The first code below outputs 1 and the second code outputs 2. I don't understand why this is happening. Is it because I am adding the same object? How can I get the desired output of 2 from the first code?
import java.util.*;
public class maptest {
public static void main(String[] args) {
Set<Integer[]> set = new HashSet<Integer[]>();
Integer[] t = new Integer[2];
t[0] = t[1] = 1;
set.add(t);
Integer[] t1 = new Integer[2];
t[0] = t[1] = 0;
set.add(t);
System.out.println(set.size());
}
}
Second Code:
import java.util.*;
public class maptest {
public static void main(String[] args) {
Set<Integer[]> set = new HashSet<Integer[]>();
Integer[] t = new Integer[2];
t[0] = t[1] = 1;
set.add(t);
Integer[] t1 = new Integer[2];
t1[0] = t1[1] = 1;
set.add(t1);
System.out.println(set.size());
}
}

HashSet calls t.hashCode(), and since arrays don't override the Object.hashCode method, the hash code depends on the array's identity, not on its contents. Changing the array's contents thus does not affect its hash code. To get a content-based hash code for an array, you should call Arrays.hashCode.
You shouldn't really put mutable things inside sets anyway, so I would suggest you put immutable lists into sets instead. If you want to stick with arrays, just create a new array, like you did with t1, and put it into the set.
EDIT:
For code 2, t and t1 are two different arrays, so their hash codes are different. Again, this is because the hashCode method is not overridden in arrays. The arrays' contents don't affect the hash code, whether or not they are the same.
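To make the difference between identity-based and content-based hashing concrete, here is a minimal sketch. The class name ArrayHashDemo and its helper methods are made up for illustration; Arrays.hashCode, Arrays.equals and Arrays.asList are standard java.util calls.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the question's code.
class ArrayHashDemo {
    // Counts distinct entries when the arrays themselves are stored.
    static int distinctAsArrays(Integer[] a, Integer[] b) {
        Set<Integer[]> set = new HashSet<>();
        set.add(a); // arrays use Object's identity-based hashCode()/equals()
        set.add(b);
        return set.size();
    }

    // Counts distinct entries when the arrays are stored as lists.
    static int distinctAsLists(Integer[] a, Integer[] b) {
        Set<List<Integer>> set = new HashSet<>();
        set.add(Arrays.asList(a)); // List hashCode()/equals() are content-based
        set.add(Arrays.asList(b));
        return set.size();
    }

    public static void main(String[] args) {
        Integer[] t = {1, 1};
        Integer[] t1 = {1, 1};
        System.out.println(Arrays.hashCode(t) == Arrays.hashCode(t1)); // true
        System.out.println(Arrays.equals(t, t1));                      // true
        System.out.println(t.equals(t1));                              // false
        System.out.println(distinctAsArrays(t, t1));                   // 2
        System.out.println(distinctAsLists(t, t1));                    // 1
    }
}
```

Note that Arrays.asList returns a view backed by the array, so writing to the array still changes the list. If the elements must not change while in the set, copy them into an unmodifiable list first.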

A Set contains only distinct elements (that is its nature). The basic implementation, HashSet, uses hashCode() to first find a bucket containing values, then equals(Object) to look for a matching value.
Arrays are simple: their hashCode() is the default one inherited from Object, and therefore depends on the reference. Their equals(Object) is also the one from Object: it checks only identity, that is, the references must be equal.
Defined as Java:
public boolean equals(Object other) {
return other == this;
}
If you want to store arrays with distinct contents, you'll have to either try your luck with TreeSet and a proper implementation of Comparator, or wrap your array, or use a List or another Set:
Set<List<Integer>> set = new HashSet<>();
Integer[] t = new Integer[]{1, 1};
set.add(Arrays.asList(t));
Integer[] t1 = new Integer[]{1, 1};
set.add(Arrays.asList(t1));
System.out.println(set.size());
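The TreeSet option mentioned above could look roughly like the following sketch. It assumes Java 9 or later, where Arrays.compare is available for arrays of Comparable elements; the class and method names are invented for the example.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class, assuming Java 9+ for Arrays.compare.
class ArrayTreeSetDemo {
    // Builds a TreeSet that treats arrays with equal contents as duplicates.
    static Set<Integer[]> newContentOrderedSet() {
        // Arrays.compare orders the arrays lexicographically, element by element.
        return new TreeSet<Integer[]>((a, b) -> Arrays.compare(a, b));
    }

    public static void main(String[] args) {
        Set<Integer[]> set = newContentOrderedSet();
        set.add(new Integer[]{1, 1});
        set.add(new Integer[]{1, 1}); // compares equal to the first array, so it is not added
        set.add(new Integer[]{0, 0});
        System.out.println(set.size()); // 2
    }
}
```

A TreeSet decides on duplicates with the comparator alone, so equals() and hashCode() of the arrays are never consulted here.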
As for the mutability of objects used in a Set or as a Map key:
fields used by boolean equals(Object) should not be mutated, because the mutated object could then be equal to another one. The Set would no longer contain distinct values.
fields used by int hashCode() should not be mutated for hash-based collections (HashSet, HashMap) because, as said above, they operate by putting items in buckets. If the hashCode() changes, the bucket the object belongs in will likely change too: the Set may no longer find the object, and could end up containing the same reference twice.
fields used by int compareTo(T) or Comparator::compare(T,T) should not be mutated, for the same reason as equals: the SortedSet would not know there was a change.
If the need arises, you would have to first remove the item from the set, then mutate it, then re-add it.
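The hazard and the remove, mutate, re-add sequence can be sketched as follows. The class and method names are hypothetical, and the "not found" result of the first method reflects how OpenJDK's HashMap-backed HashSet behaves rather than a guarantee of the Set contract.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class illustrating mutation of a set element.
class MutateInSetDemo {
    // Mutates an element while it is stored; returns whether the set still finds it.
    static boolean foundAfterInPlaceMutation() {
        Set<List<Integer>> set = new HashSet<>();
        List<Integer> item = new ArrayList<>(Arrays.asList(1, 1));
        set.add(item);
        item.add(2); // hashCode() changes, but the entry was stored under the old hash
        return set.contains(item);
    }

    // Removes the element first, mutates it, then adds it back.
    static boolean foundAfterRemoveMutateReAdd() {
        Set<List<Integer>> set = new HashSet<>();
        List<Integer> item = new ArrayList<>(Arrays.asList(1, 1));
        set.add(item);
        set.remove(item); // remove while the old hash code is still valid
        item.add(2);
        set.add(item);    // stored again under the new hash code
        return set.contains(item);
    }

    public static void main(String[] args) {
        System.out.println(foundAfterInPlaceMutation());   // false on OpenJDK
        System.out.println(foundAfterRemoveMutateReAdd()); // true
    }
}
```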

You're adding the Object to a Set which
contains no duplicate elements.
You are only ever adding one Object to the Set. You only change the value of its contents. To see what I mean, try adding System.out.println(set.add(t));.
As the add() method:
Returns true if this set did not already contain the specified element
Also, your t1 is completely irrelevant in your first code snippet, as you never use it.
In your second code snippet it outputs 2 because you are adding two different Integer[] objects to the Set.
Try printing out the hashcode of the Objects to see how this works:
Integer[] t = new Integer[2];
t[0] = t[1] = 1;
//Before we change the values
System.out.println(t.hashCode());
Integer[] t1 = new Integer[2];
t1[0] = t1[1] = 1;
t[0] = t[1] = 0;
//After we change the values of t
System.out.println(t.hashCode());
//Hashcode of the second object
System.out.println(t1.hashCode());
Output:
//Hashcode for t is the same before and after modifying data
366712642
366712642
//Hashcode for t1 is different from t; different object
1829164700

How java.util.Set implementations check for duplicate objects depends on the implementation, but per the documentation of Set, the appropriate meaning of "duplicate" is that o1.equals(o2).
Since HashSet in particular is based on a hash table, it will go about looking for a duplicate by computing the hashCode() of the object presented to it, and then going through all the objects, if any, in the corresponding hash bucket.
Arrays do not override hashCode() or equals(), so they implement instance identity, not value identity. Thus, regardless of the values of its elements, a given array always has the same hash code, and always equals() itself and only itself. Your first code adds the same array object to a set two times. Regardless of the values of its elements, it is still the same array. The second code adds two different array objects to a set. Regardless of the values of their elements, they are different objects.
Note, too, that if you have mutable objects that implement value identity, such that their equality and hash codes depend on the values of their members, then modifying such an object while it is a member of a Set very likely breaks the Set. This is documented on a per-implementation basis.

Related

Frequency of a value in a list of entities

I have an entity named Elementfisa, which contains as values (id,Post,Sarcina). Now, Post(Int Id,String Nume,String Tip) and Sarcina(Int Id,String Desc) are also entities. I have a List of all the elements I added as Elementfisa, and I want to get in a separate list the frequency of every Sarcina that every Elementfisa contains. This is my code right now:
int nr=0;
List<Integer> frecv=new ArrayList<Integer>();
List<Sarcina> sarcini = new ArrayList<>();
List<Elementfisa> efuri=findEFAll();
for (Elementfisa i : efuri)
{
nr=0;
for (Sarcina s : sarcini)
if (s.equals(i.getSarcina()))
nr=1;
if (nr==0)
{
int freq = Collections.frequency(efuri, i.getSarcina());
sarcini.add(i.getSarcina());
frecv.add(freq);
}
}
(findEFAll() returns every element contained in a Hashmap from a repository)
But for some reason, while the sarcini list contains all the Sarcina from every Elementfisa, the frequency list will show 0 on every position. What should I change so every position should show the correct number of occurrences?

You're using Collections.frequency() on efuri, a List<Elementfisa>. But you're passing i.getSarcina() to it, a Sarcina object. A List of Elementfisa cannot possibly contain a Sarcina object, so you get zero. You may have passed the wrong list to the method.
Edit:
To look at all Sarcinas in efuri, you can do this using Java 8 streams:
efuri.stream().map(element -> element.getSarcina())
.collect(Collectors.toList()).contains(i.getSarcina())
Breakdown:
efuri.stream() //Turns this into a stream of Elementfisa
.map(element -> element.getSarcina()) //Turns this into a stream of Sarcina
.collect(Collectors.toList()) //Turn this into a list
.contains(i.getSarcina()) //Check if the list contains the Sarcina

Are you sure you do not need to override equals() (and hashCode() too) of Elementfisa? The default Java equals() does not seem to do what you want, because it checks the identity (not the value) of two Elementfisa objects, while in your logic two such objects with the same values may be considered equivalent.
For more information on equals(), see
What issues should be considered when overriding equals and hashCode in Java?
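Putting the two answers together, one way to count occurrences is to map each Elementfisa to its Sarcina and group by it. The sketch below uses simplified stand-ins for the question's entities (the Post field is left out, and the constructors are assumptions); it relies on Sarcina overriding equals() and hashCode().

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical demo class with simplified versions of the question's entities.
class SarcinaFrequencyDemo {
    static final class Sarcina {
        final int id;
        final String desc;
        Sarcina(int id, String desc) { this.id = id; this.desc = desc; }
        // Value-based equality, so equal tasks are counted together.
        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Sarcina)) return false;
            Sarcina other = (Sarcina) o;
            return id == other.id && Objects.equals(desc, other.desc);
        }
        @Override public int hashCode() { return Objects.hash(id, desc); }
    }

    static final class Elementfisa {
        final int id;
        final Sarcina sarcina;
        Elementfisa(int id, Sarcina sarcina) { this.id = id; this.sarcina = sarcina; }
        Sarcina getSarcina() { return sarcina; }
    }

    // Counts how often each Sarcina occurs, keeping first-seen order.
    static Map<Sarcina, Long> frequencies(List<Elementfisa> efuri) {
        return efuri.stream()
                .map(Elementfisa::getSarcina)
                .collect(Collectors.groupingBy(
                        Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Elementfisa> efuri = Arrays.asList(
                new Elementfisa(1, new Sarcina(10, "a")),
                new Elementfisa(2, new Sarcina(10, "a")),
                new Elementfisa(3, new Sarcina(20, "b")));
        frequencies(efuri).forEach((s, n) -> System.out.println(s.desc + " -> " + n));
    }
}
```

The map's keys play the role of the sarcini list and its values the role of frecv, so the two stay aligned by construction.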

Compare and match arrays with different sizes

I have a couple of arrays with different sizes; say, array A and array B.
Array A
[chery, chery, uindy, chery, chery]
Array B
[chery, uindy]
I need to check whether the values present in Array A are available in Array B. In the above example, all the values in Array A are available in Array B. Please help me out with the Java code. Thanks!

You can convert your arrays to a List and then use the containsAll method to see if a particular list contains all elements described in another list.
You would get better performance out of it if they were Sets instead.
Example:
List<String> firstList = Arrays.asList("chery", "chery", "uindy", ...);
List<String> secondList = Arrays.asList("chery", "uindy", ...);
System.out.println(secondList.containsAll(firstList));
If the performance of this method in particular is getting a bit dodgy, then consider converting the lists into Sets instead:
Set<String> firstSet = new HashSet<>(Arrays.asList("chery", "chery", "uindy", ...));

In the example I am using integers, but it can be used for other types with slight modifications.
First put a loop over the elements of array A.
for(int i =0; i<A.length; i++)
{
//this loop will traverse all elements in array A.
}
Now inside this for loop make another for loop which traverses the elements of array B.
for(int i =0; i<A.length; i++)
{
for(int j=0; j<B.length;j++)
{
if(A[i] == B[j])
{ System.out.println("this element is in array A and B"); }
}
}
Now if you want to check whether all elements of A are in B, you can use a boolean. This boolean stays true as long as each element in A is found at least once in B. As soon as you find one element of A that is not present in B, you can exit.
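The boolean-flag idea described in the last paragraph can be sketched like this. The class and method names are made up for the example, and it uses int arrays as in the answer.

```java
// Hypothetical demo class for the nested-loop approach.
class ContainsAllLoopDemo {
    // Returns true if every element of a occurs at least once in b.
    static boolean allPresent(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            boolean found = false;
            for (int j = 0; j < b.length; j++) {
                if (a[i] == b[j]) {
                    found = true;
                    break; // no need to scan the rest of b
                }
            }
            if (!found) {
                return false; // one missing element is enough to stop
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {1, 1, 2, 1, 1};
        int[] b = {1, 2};
        System.out.println(allPresent(a, b));                // true
        System.out.println(allPresent(new int[]{1, 3}, b));  // false
    }
}
```

For String arrays the comparison would need equals() instead of ==, since == compares references.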

Based on your requirement, you are going to find out whether B is a superset of A (I mean the distinct values).
This can be easily done by one line like this:
String[] aArr = {.....};
String[] bArr = {.....};
return new HashSet<String>(Arrays.asList(bArr)).containsAll(Arrays.asList(aArr));
In brief, make B a Set, and check whether that set contains all values of A.
So, if A = {Apple, Apple, Banana, Cherry} and B = {Apple, Banana, Cherry, Pineapple}, it will return true (that's the behavior based on your description).
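Filled in with the arrays from the question, the one-liner might be wrapped as below. The class name SupersetDemo and the method isCovered are hypothetical; HashSet and containsAll are used exactly as in the answer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class for the set-based superset check.
class SupersetDemo {
    // Returns true if every value of a also appears in b.
    static boolean isCovered(String[] a, String[] b) {
        Set<String> bSet = new HashSet<>(Arrays.asList(b)); // duplicates in b collapse here
        return bSet.containsAll(Arrays.asList(a));          // each lookup is a hash lookup
    }

    public static void main(String[] args) {
        String[] a = {"chery", "chery", "uindy", "chery", "chery"};
        String[] b = {"chery", "uindy"};
        System.out.println(isCovered(a, b));                                 // true
        System.out.println(isCovered(new String[]{"chery", "apple"}, b));    // false
    }
}
```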

For arrays of Strings (ArrayUtils comes from Apache Commons Lang):
for (String str : array1)
{
System.out.println(ArrayUtils.contains(array2, str));
}

An array is not a good data structure for doing this. A Set is better. So convert your two arrays to Set objects, then simply use Set.equals(). Either do the conversion by creating new objects just before the comparison, or use a Set everywhere.

Set<String> setA = new HashSet<>(Arrays.asList(new String[]{"chery", "chery", "uindy", "chery", "chery"}));
Set<String> setB = new HashSet<>(Arrays.asList(new String[]{"chery", "uindy"}));
System.out.println("Sets are equal: " +setA.equals(setB));
The equals method of AbstractSet says
Compares the specified object with this set for equality. Returns true
if the given object is also a set, the two sets have the same size,
and every member of the given set is contained in this set. This
ensures that the equals method works properly across different
implementations of the Set interface. This implementation first checks
if the specified object is this set; if so it returns true. Then, it
checks if the specified object is a set whose size is identical to the
size of this set; if not, it returns false. If so, it returns
containsAll((Collection) o).

HashMap with ArrayList key can not find it when Arraylist grows

Well, my problem is that in some part of my code I use an ArrayList as a key in a HashMap, for example:
ArrayList<Integer> array = new ArrayList<Integer>();
And then I put my array as a key in a hash map (I need it this way, I'm sure of that):
HashMap<ArrayList<Integer>, String> map = new HashMap<ArrayList<Integer>, String>();
map.put(array, "value1");
Here comes the problem: when I add some value to my array and then try to retrieve the data using the same array, the hash map can't find it.
array.add(23);
String value = map.get(array);
At this point value is null instead of the string "value1".
I was testing and I discovered that the hashCode changes when the array list grows, and this is the central point of my problem, but I want to know how I can fix this.

Use an IdentityHashMap. Then that same array instance will always map to the same value, no matter how its contents (and therefore hash code) are changed.
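A small sketch of this suggestion, run side by side with a plain HashMap, is shown below. The class and method names are invented; the null result for HashMap reflects OpenJDK's behavior after the key's hash code has changed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class comparing HashMap and IdentityHashMap with a growing key.
class IdentityKeyDemo {
    // Puts a list as key, grows it, then looks it up again with the same instance.
    static String lookupAfterGrowth(Map<List<Integer>, String> map) {
        List<Integer> array = new ArrayList<>();
        map.put(array, "value1");
        array.add(23); // changes array.hashCode(), but not its identity
        return map.get(array);
    }

    public static void main(String[] args) {
        System.out.println(lookupAfterGrowth(new HashMap<>()));         // null on OpenJDK
        System.out.println(lookupAfterGrowth(new IdentityHashMap<>())); // value1
    }
}
```

The trade-off is that IdentityHashMap compares keys with ==, so a different list with the same contents will not find the entry.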

You can't use a mutable object (that is, one whose hashCode changes) as the key of a HashMap. See if you can find something else to use as the key instead. It's somewhat unusual to map a collection to a string; the other way around is much more common.

It's a weird use case, but if you must do it then you can subclass ArrayList and override the hashCode method.

It's a bit of an odd thing to try and do, in my opinion.
I assume what you are trying to model is a variable-length key made up of n integers, and that you assume the hash of the ArrayList will stay consistent, but that is not the case once the list is modified.
I would suggest that you either subclass ArrayList and override the hashCode() & equals() methods, or wrap the ArrayList in a key class.

I'm almost certain you would not want to do that. It's more likely you would want a Map<String, List<Integer>>. However, if you absolutely must do this, use a holder class:
public class ListHolder {
private List<Integer> list = new ArrayList<Integer>();
public List<Integer> getList() {return list;}
}
Map<ListHolder, String> map = new HashMap<ListHolder, String>();

The basic reason: when we call HashMap.put(k, v), it computes k.hashCode() so that it knows where to put the entry.
It also finds the value using this number (k.hashCode()).
You can see the hashCode() function used by ArrayList below; it comes from the abstract class AbstractList. Obviously, after we add an object to the list, the hashCode value changes. So HashMap.get(k) cannot find the value, because no entry is stored under the new hash code.
public int hashCode() {
int hashCode = 1;
for (E e : this)
hashCode = 31*hashCode + (e==null ? 0 : e.hashCode());
return hashCode;
}
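To see the hash code change, and one way around it, here is a minimal sketch. It stores an unmodifiable copy of the list as the key, which is a swapped-in technique and not something the answers above prescribe; the names SnapshotKeyDemo and snapshot are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class: use an unmodifiable copy of the list as the map key.
class SnapshotKeyDemo {
    // Returns an unmodifiable copy, so later changes to source cannot alter the key.
    static List<Integer> snapshot(List<Integer> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<Integer> array = new ArrayList<>();
        Map<List<Integer>, String> map = new HashMap<>();
        int before = array.hashCode();        // 1 for an empty list
        map.put(snapshot(array), "value1");
        array.add(23);
        int after = array.hashCode();         // 31 * 1 + 23 = 54
        System.out.println(before + " -> " + after);
        System.out.println(map.get(array));                   // null: no key equals [23]
        System.out.println(map.get(Collections.emptyList())); // value1: the stored key is still []
    }
}
```

With this design the entry is found by any list whose contents equal the contents at the time of the put, which may or may not be what the original use case needs.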

Java HashSet vs HashMap

I understand that HashSet is based on the HashMap implementation but is used when you need a unique set of elements. So why, in the following code, when putting the same objects into the map and the set, is the size of both collections equal to 1? Shouldn't the map size be 2? If the size of both collections is equal, I don't see any difference between using these two collections.
Set testSet = new HashSet<SimpleObject>();
Map testMap = new HashMap<Integer, SimpleObject>();
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println(testSet.size());
System.out.println(testMap.size());
The output is 1 and 1.
SimpleObject code
public class SimpleObject {
private String dataField1;
private int dataField2;
public SimpleObject(){}
public SimpleObject(String data1, int data2){
this.dataField1 = data1;
this.dataField2 = data2;
}
public String getDataField1() {
return dataField1;
}
public int getDataField2() {
return dataField2;
}
@Override
public int hashCode() {
final int prime = 31;
int result = 1;
result = prime * result
+ ((dataField1 == null) ? 0 : dataField1.hashCode());
result = prime * result + dataField2;
return result;
}
@Override
public boolean equals(Object obj) {
if (this == obj)
return true;
if (obj == null)
return false;
if (getClass() != obj.getClass())
return false;
SimpleObject other = (SimpleObject) obj;
if (dataField1 == null) {
if (other.dataField1 != null)
return false;
} else if (!dataField1.equals(other.dataField1))
return false;
if (dataField2 != other.dataField2)
return false;
return true;
}
}

The map holds unique keys. When you invoke put with a key that exists in the map, the object under that key is replaced with the new object. Hence the size 1.
The difference between the two should be obvious:
in a Map you store key-value pairs
in a Set you store only the keys
In fact, a HashSet has a HashMap field, and whenever add(obj) is invoked, the put method is invoked on the underlying map map.put(obj, DUMMY) - where the dummy object is a private static final Object DUMMY = new Object(). So the map is populated with your object as key, and a value that is of no interest.
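The delegation described above can be imitated in a few lines. TinyHashSet is a made-up class for illustration, not the JDK source; it only mirrors the idea of storing each element as a key with a shared dummy value.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy set that delegates to a HashMap, in the spirit of HashSet.
class TinyHashSet<E> {
    // Shared placeholder value; only the keys matter.
    private static final Object DUMMY = new Object();
    private final Map<E, Object> map = new HashMap<>();

    // Returns true only if the element was not already present.
    boolean add(E e) {
        return map.put(e, DUMMY) == null;
    }

    boolean contains(Object o) {
        return map.containsKey(o);
    }

    int size() {
        return map.size();
    }

    public static void main(String[] args) {
        TinyHashSet<String> set = new TinyHashSet<>();
        System.out.println(set.add("Igor")); // true
        System.out.println(set.add("Igor")); // false
        System.out.println(set.size());      // 1
    }
}
```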

A key in a Map can only map to a single value. So the second time you put in to the map with the same key, it overwrites the first entry.

In case of the HashSet, adding the same object will be more or less a no-op. In case of a HashMap, putting a new key,value pair with an existing key will overwrite the existing value to set a new value for that key. Below I've added equals() checks to your code:
SimpleObject simpleObject1 = new SimpleObject("Igor", 1);
SimpleObject simplObject2 = new SimpleObject("Igor", 1);
//If the below prints true, the 2nd add will not add anything
System.out.println("Are the objects equal? " + simpleObject1.equals(simplObject2));
testSet.add(simpleObject1);
testSet.add(simplObject2);
Integer key = new Integer(10);
//This is a no-brainer as you've the exact same key, but lets keep it consistent
//If this returns true, the 2nd put will overwrite the 1st key-value pair.
testMap.put(key, simpleObject1);
testMap.put(key, simplObject2);
System.out.println("Are the keys equal? " + key.equals(key));
System.out.println(testSet.size());
System.out.println(testMap.size());

I just wanted to add to these great answers, the answer to your last dilemma. You wanted to know what is the difference between these two collections, if they are returning the same size after your insertion. Well, you can't really see the difference here, because you are inserting two values in the map with the same key, and hence changing the first value with the second. You would see the real difference (among the others) should you have inserted the same value in the map, but with the different key. Then, you would see that you can have duplicate values in the map, but you can't have duplicate keys, and in the set you can't have duplicate values. This is the main difference here.
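The scenario described here, the same value under different keys, can be tried with a short sketch. It uses Strings in place of SimpleObject to stay self-contained, and the class and method names are invented.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical demo class: equal values in a set versus under distinct map keys.
class SetVsMapDemo {
    // Adds both values to a set and to a map under keys 10 and 11;
    // returns {set size, map size}.
    static int[] sizes(String first, String second) {
        Set<String> set = new HashSet<>();
        set.add(first);
        set.add(second);           // ignored if equal to first
        Map<Integer, String> map = new HashMap<>();
        map.put(10, first);
        map.put(11, second);       // different key, so a second entry is created
        return new int[]{set.size(), map.size()};
    }

    public static void main(String[] args) {
        int[] result = sizes("Igor", "Igor");
        System.out.println(result[0]); // 1
        System.out.println(result[1]); // 2
    }
}
```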

The answer is simple: it is the nature of HashSet.
HashSet internally uses a HashMap with a dummy object named PRESENT as the value, and the key of this HashMap will be your object.
hash(simpleObject1) and hash(simplObject2) will return the same int. So?
When you add simpleObject1 to the HashSet, it is put into the internal HashMap with simpleObject1 as the key. Then when you add(simplObject2) you will get false, because an equal key is already present in the internal HashMap.
As a little extra info, HashSet relies on the hashing function to provide O(1) performance, using the object's equals() and hashCode() contract. Note that HashSet does permit a single null element: the backing HashMap treats a null key specially instead of calling hashCode() on it.

I think the major difference is this:
HashSet is stable in the sense that it does not replace a duplicate element (once the first unique element is inserted, all later duplicates are discarded), whereas HashMap replaces the old value with the new one when the key is a duplicate. So inserting under a duplicate key may carry a little extra overhead in HashMap.

public class HashSet<E>
extends AbstractSet<E>
implements Set<E>, Cloneable, Serializable
This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set

Compare new Integer Objects in ArrayList Question

I am storing Integer objects representing an index of objects I want to track. Later in my code I want to check to see if a particular object's index corresponds to one of those Integers I stored earlier. I am doing this by creating an ArrayList and creating a new Integer from the index of a for loop:
ArrayList<Integer> courseselectItems = new ArrayList();
//Find the course elements that are within a courseselect element and add their indices to the ArrayList
for(int i=0; i<numberElementsInNodeList; i++) {
if (nodeList.item(i).getParentNode().getNodeName().equals("courseselect")) {
courseselectItems.add(new Integer(i));
}
}
I then want to check later if the ArrayList contains a particular index:
//Cycle through the namedNodeMap array to find each of the course codes
for(int i=0; i<numberElementsInNodeList; i++) {
if(!courseselectItems.contains(new Integer(i))) {
//Do Stuff
}
}
My question is, when I create a new Integer by using new Integer(i) will I be able to compare integers using ArrayList.contains()? That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
I hope I didn't make this too unclear. Thanks for the help!

Yes, you can use List.contains() as that uses equals() and an Integer supports that when comparing to other Integers.
Also, because of auto-boxing you can simply write:
List<Integer> list = new ArrayList<Integer>();
...
if (list.contains(37)) { // auto-boxed to Integer
...
}
It's worth mentioning that:
List list = new ArrayList();
list.add(new Integer(37));
if (list.contains(new Long(37))) {
...
}
will always return false because an Integer is not a Long. This trips up most people at some point.
Lastly, try to declare variables that hold Java Collections with the interface type, not the concrete type, so:
List<Integer> courseselectItems = new ArrayList<Integer>();
not
ArrayList<Integer> courseselectItems = new ArrayList<Integer>();
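The Integer-versus-Long pitfall from this answer can be checked with a short sketch. The class and method names are hypothetical. It uses autoboxing and Integer.valueOf rather than new Integer(...), since the Integer constructors have been deprecated since Java 9.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class for contains() with differently boxed values.
class BoxedContainsDemo {
    // Returns whether a list holding the Integer 37 reports the candidate as contained.
    static boolean listWith37Contains(Object candidate) {
        List<Integer> list = new ArrayList<>();
        list.add(37); // auto-boxed to Integer
        return list.contains(candidate); // contains() relies on equals()
    }

    public static void main(String[] args) {
        System.out.println(listWith37Contains(37));                  // true: boxed to Integer
        System.out.println(listWith37Contains(Integer.valueOf(37))); // true: equal value
        System.out.println(listWith37Contains(37L));                 // false: boxed to Long
    }
}
```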

My question is, when I create a new Integer by using new Integer(i) will I be able to compare integers using ArrayList.contains()? That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
The short answer is yes.
The long answer is ...
That is to say, when I create a new object using new Integer(i), will that be the same as the previously created Integer object if the int value used to create them are the same?
I assume you mean "... will that be the same instance as ..."? The answer to that is no - calling new will always create a distinct instance separate from the previous instance, even if the constructor parameters are identical.
However, despite having separate identity, these two objects will have equivalent value, i.e. calling .equals() between them will return true.
Collection.contains()
It turns out that having separate instances of equivalent value (.equals() returns true) is okay. The .contains() method is in the Collection interface. The Javadoc description for .contains() says:
http://java.sun.com/javase/6/docs/api/java/util/Collection.html#contains(java.lang.Object)
boolean contains(Object o)
Returns true if this collection
contains the specified element. More
formally, returns true if and only if
this collection contains at least one
element e such that (o==null ? e==null
: o.equals(e)).
Thus, it will do what you want.
Data Structure
You should also consider whether you have the right data structure.
Is the list solely about containment? Is the order important? Do you care about duplicates? Since a list is ordered, using a list can imply that your code cares about ordering, or that you need to maintain duplicates in the data structure.
However, if order is not important, if you don't want or won't have duplicates, and if you really only use this data structure to test whether it contains a specific value, then you might want to consider whether you should be using a Set instead.
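If a Set fits, the index bookkeeping could look roughly like this sketch. The boolean array stands in for the question's parent-node check on each NodeList item, and all names are made up for the example.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class: track selected indices in a Set instead of a List.
class IndexSetDemo {
    // Collects the indices whose flag is true; the flag replaces the
    // "parent node is courseselect" test from the question.
    static Set<Integer> selectedIndices(boolean[] isCourseSelectChild) {
        Set<Integer> selected = new HashSet<>();
        for (int i = 0; i < isCourseSelectChild.length; i++) {
            if (isCourseSelectChild[i]) {
                selected.add(i); // auto-boxed to Integer
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        boolean[] flags = {true, false, true};
        Set<Integer> selected = selectedIndices(flags);
        for (int i = 0; i < flags.length; i++) {
            if (!selected.contains(i)) {
                System.out.println("Do stuff for index " + i); // prints only for index 1
            }
        }
    }
}
```

A HashSet lookup is a hash lookup rather than a scan, which matters when contains() is called inside a loop over all elements.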

Short answer is yes, you should be able to do ArrayList.contains(new Integer(14)), for example, to see if 14 is in the list. The reason is that Integer overrides the equals method to compare itself correctly against other instances with the same value.

Yes it will, because List.contains() uses the equals() method of the object to be compared, and Integer.equals() does compare the integer value.

As cletus and DJ mentioned, your approach will work.
I don't know the context of your code, but if you don't care about the particular indices, consider the following style also:
List<Node> courseSelectNodes = new ArrayList<Node>();
//Find the course elements that are within a courseselect element
//and add them to the ArrayList
for(int i=0; i<numberElementsInNodeList; i++) {
Node node = nodeList.item(i);
if (node.getParentNode().getNodeName().equals("courseselect")) {
courseSelectNodes.add(node);
}
}
// Do stuff with courseSelectNodes
for(Node node : courseSelectNodes) {
//Do Stuff
}

I'm putting my answer in the form of a (passing) test, as an example of how you might research this yourself. Not to discourage you from using SO - it's great - just to try to promote characterization tests.
import java.util.ArrayList;
import junit.framework.TestCase;
public class ContainsTest extends TestCase {
public void testContains() throws Exception {
ArrayList<Integer> list = new ArrayList<Integer>();
assertFalse(list.contains(new Integer(17)));
list.add(new Integer(17));
assertTrue(list.contains(new Integer(17)));
}
}

Yes, automatic boxing occurs, but this results in a performance penalty. It's not clear from your example why you would want to solve the problem in this manner.
Also, because of boxing, creating the Integer class by hand is superfluous.
