I have a particular requirement where I need to dedupe a list of objects based on a combination of equality criteria.
For example, two Student objects are equal if either:
1. firstName and id are the same, OR
2. lastName, class, and emailId are the same.
I was planning to use a Set to remove duplicates. However, there's a problem:
I can override the equals method, but then the hashCode method may not return the same hash code for two objects that are equal.
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Student other = (Student) obj;
    // "class" is a reserved word in Java, so the field is called clazz here
    if ((firstName.equals(other.firstName) && id == other.id) ||
            (lastName.equals(other.lastName) && clazz == other.clazz && emailId.equals(other.emailId)))
        return true;
    return false;
}
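Now I cannot override hashCode so that it returns the same hash code for every pair of objects that are equal according to this equals method. Since equality can hold through either criterion, and can chain from one criterion to the other, the only hash code guaranteed to satisfy the contract is a constant.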
Is there a way to dedupe based on multiple equality criteria? I considered using a List and its contains method to check whether the element is already there, but that increases the complexity, as contains runs in O(n) time (O(n²) overall). I don't want to return the exact same hash code for all objects, as that just increases the time and defeats the purpose of using hash codes. I've also considered sorting the items with a custom comparator, but that again takes at least O(n log n), plus one more pass to remove the duplicates.
As of now, the best solution I have is to maintain two different sets, one for each condition, and use them to build a List, but that takes almost three times the memory. I'm looking for a faster and more memory-efficient way, as I'll be dealing with a large number of records.
You can make Student Comparable and use a TreeSet. A simple implementation of compareTo may be:
@Override
public int compareTo(Student other) {
    if (this.equals(other)) {
        return 0;
    } else {
        // compare all fields of this against all fields of other
        return (this.firstName + this.lastName + this.emailId + this.clazz + this.id)
                .compareTo(other.firstName + other.lastName + other.emailId + other.clazz + other.id);
    }
}
Or write your own Set implementation, for instance one backed by a List of distinct Student objects that checks for equality every time you add a student. This has O(n) add complexity, so it can't be considered a good implementation, but it is simple to write.
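import java.util.AbstractSet;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
class ListSet<T> extends AbstractSet<T> {
    private final List<T> list = new ArrayList<T>();
    // Linear scan using equals(): O(n) per add, but works for any equals definition
    @Override
    public boolean add(T t) {
        if (list.contains(t)) {
            return false;
        } else {
            return list.add(t);
        }
    }
    @Override
    public Iterator<T> iterator() {
        return list.iterator();
    }
    @Override
    public int size() {
        return list.size();
    }
}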
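The question's two-set idea can be kept lean by storing only the composite keys rather than the students themselves. A minimal sketch, assuming the field names from the question (with `clazz` in place of the reserved word `class`) and an `int id`; `StudentDeduper` is a hypothetical name. It keeps the first occurrence and drops any later student that matches an already-kept one on either criterion. Keys are built with `Arrays.asList`, whose `equals`/`hashCode` are element-wise, so each check is an expected O(1) hash lookup.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for the question's Student; field types are assumptions.
class Student {
    final String firstName;
    final int id;
    final String lastName;
    final String clazz; // "class" is a reserved word in Java
    final String emailId;

    Student(String firstName, int id, String lastName, String clazz, String emailId) {
        this.firstName = firstName;
        this.id = id;
        this.lastName = lastName;
        this.clazz = clazz;
        this.emailId = emailId;
    }
}

class StudentDeduper {
    // Keeps the first occurrence; drops any later student matching an already-kept
    // student on (firstName, id) OR on (lastName, clazz, emailId).
    static List<Student> dedupe(List<Student> students) {
        Set<List<Object>> seenByNameAndId = new HashSet<>();
        Set<List<Object>> seenByContact = new HashSet<>();
        List<Student> kept = new ArrayList<>();
        for (Student s : students) {
            List<Object> key1 = Arrays.asList(s.firstName, s.id);
            List<Object> key2 = Arrays.asList(s.lastName, s.clazz, s.emailId);
            if (seenByNameAndId.contains(key1) || seenByContact.contains(key2)) {
                continue; // duplicate of a student we already kept
            }
            seenByNameAndId.add(key1);
            seenByContact.add(key2);
            kept.add(s);
        }
        return kept;
    }
}
```

Because this notion of equality is not transitive, which students survive can depend on input order: if A matches B on the first criterion, B matches C on the second, and A and C do not match, the order A, B, C keeps A and C, whereas B, A, C keeps only B.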
Related
I have a class Product, which has three variables:
class Product implements Comparable<Product> {
    private Type type;          // Type is an enum
    Set<Attribute> attributes;  // Attribute is a regular class
    ProductName name;           // ProductName is another enum
}
I used Eclipse to automatically generate the equals() and hashCode() methods:
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((attributes == null) ? 0 : attributes.hashCode());
    result = prime * result + ((type == null) ? 0 : type.hashCode());
    return result;
}
@Override
public boolean equals(Object obj) {
    if (this == obj)
        return true;
    if (obj == null)
        return false;
    if (getClass() != obj.getClass())
        return false;
    Product other = (Product) obj;
    if (attributes == null) {
        if (other.attributes != null)
            return false;
    } else if (!attributes.equals(other.attributes))
        return false;
    if (type != other.type)
        return false;
    return true;
}
Now in my application I need to sort a Set of Products, so I need to implement the Comparable interface and its compareTo method:
@Override
public int compareTo(Product other) {
    int diff = type.hashCode() - other.getType().hashCode();
    if (diff > 0) {
        return 1;
    } else if (diff < 0) {
        return -1;
    }
    diff = attributes.hashCode() - other.getAttributes().hashCode();
    if (diff > 0) {
        return 1;
    } else if (diff < 0) {
        return -1;
    }
    return 0;
}
Does this implementation make sense? And what if I just want to sort the products based on the String values of "type" and "attributes"? How would I implement that?
Edit:
The reason I want to sort a Set of Products is that I have a JUnit test which asserts on the String value of a HashSet. My goal is to keep the output in the same order by sorting the set; otherwise, even if the set's values are the same, the assertion will fail because a HashSet's iteration order is arbitrary.
Edit2:
Through the discussion, it has become clear that asserting on the String values of a HashSet isn't a good idea in unit tests. For my situation, I have written a sort() function that sorts the HashSet's String values in natural order, so that it consistently outputs the same String for my unit tests, and that suffices for now. Thanks, all.
It looks like, from all the comments here, you don't need a Comparator at all, because:
1) You are using a HashSet, which does not work with a Comparator. It is not ordered.
2) You just need to make sure that two HashSets containing Products are equal, meaning they have the same size and contain the same Products.
Since you already added hashCode and equals methods to Product, all you need to do is call the equals method on those HashSets:
HashSet<Product> set1 = ...
HashSet<Product> set2 = ...
assertTrue( set1.equals(set2) );
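A small sketch of both points, using plain strings for brevity: `HashSet.equals` ignores iteration order, and when a deterministic String is still wanted (as in the question's Edit2), sorting each element's `String.valueOf` rendering gives a stable result. `SetAssertions.sortedString` is a hypothetical helper name, not an existing API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

class SetAssertions {
    // Renders any collection deterministically: each element's String.valueOf,
    // sorted in natural (lexicographic) order, printed like a List.
    static String sortedString(Collection<?> elements) {
        List<String> parts = new ArrayList<>();
        for (Object e : elements) {
            parts.add(String.valueOf(e));
        }
        Collections.sort(parts);
        return parts.toString();
    }
}
```

For example, two HashSets built from "b", "a", "c" and from "c", "b", "a" are equal under equals(), and both render as [a, b, c].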
This implementation does not seem to be consistent. You have no control over what the hash codes look like: if obj1 < obj2 according to compareTo in one run, the next time you start your JVM it could be the other way around, obj1 > obj2. (Subtracting two hash codes can also overflow an int, so diff can even have the wrong sign.)
The only thing that you really know is that if diff == 0 then the objects are considered to be equal. However you can also just use the equals method for that check.
It is now up to you how you define when obj1 < obj2 or obj1 > obj2. Just make sure that it is consistent.
By the way, did you know that the current implementation does not include ProductName name in the equals check? I don't know if that is intended, hence the remark.
The question is, what do you know about those attributes? Maybe they implement Comparable (for example, if they are Integers or Strings); then you can order them by their compareTo method. If you know nothing at all about the objects, it will be hard to build a consistent ordering.
If you just want them to be ordered consistently and the ordering itself does not matter, you could give them ids at creation time and sort by those. At that point you could indeed use the hash codes, provided it does not matter that the order can change between JVM runs, but only then.
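For the question's follow-up (ordering by the String values of "type" and "attributes"), a minimal sketch could look like the following. It assumes hypothetical stand-ins for `Type` and `Attribute`, compares `type.name()` first and then a canonical string of the attributes (each `toString()` sorted, so HashSet iteration order does not leak into the result), and, like the question's equals, ignores `name`. equals/hashCode are omitted for brevity. The order is stable across JVM runs, but it is only consistent with equals if attributes that render the same are also equal.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;

enum Type { FOOD, TOOL } // hypothetical stand-in for the question's enum

class Attribute { // hypothetical stand-in; only toString() matters here
    private final String value;
    Attribute(String value) { this.value = value; }
    @Override
    public String toString() { return value; }
}

class Product implements Comparable<Product> {
    private final Type type;
    private final Set<Attribute> attributes;

    Product(Type type, Set<Attribute> attributes) {
        this.type = type;
        this.attributes = attributes;
    }

    // Attributes rendered as sorted strings, independent of HashSet iteration order.
    private String attributesKey() {
        List<String> parts = new ArrayList<>();
        for (Attribute a : attributes) {
            parts.add(String.valueOf(a));
        }
        Collections.sort(parts);
        return parts.toString();
    }

    // Enum constant name first, then the canonical attribute string.
    private static final Comparator<Product> BY_STRINGS =
            Comparator.comparing((Product p) -> p.type.name())
                      .thenComparing(Product::attributesKey);

    @Override
    public int compareTo(Product other) {
        return BY_STRINGS.compare(this, other);
    }
}
```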
I am trying to test a class for a poker game (a test assignment) in which it is only important to determine the validity or value of particular hands.
My PokerHand object contains a TreeSet<Card>. I thought this would be an ideal data structure, since duplicates are not allowed and it automagically sorts the cards using a red-black tree.
The problem, however, is that it appears to have some side effects that I am not yet aware of. I understand that duplicates will not be added to a TreeSet, but in my tests I make sure not to add any. Instead, I noticed that it will not add a new Card to the TreeSet whenever its number field equals that of a card already in the set, even if the type differs.
This is the equals method for a Card
@Override
public boolean equals(Object obj) {
    if (obj == null) {
        return false;
    }
    if (getClass() != obj.getClass()) {
        return false;
    }
    final Card other = (Card) obj;
    return this.type == other.type && this.number == other.number;
}
This is the test, adding various cards...
@Test
public void testOnePair() {
    hand.addCard(new Card(3, Card.CARD_TYPE.SPADES));
    hand.addCard(new Card(8, Card.CARD_TYPE.CLUBS));
    hand.addCard(new Card(10, Card.CARD_TYPE.HEARTS));
    hand.addCard(new Card(14, Card.CARD_TYPE.SPADES));
    hand.addCard(new Card(14, Card.CARD_TYPE.CLUBS));
    assertEquals("One Pair", this.hand.getValue());
}
What appears to be happening is that the last Card is not added, so the size of the TreeSet remains 4, even though the cards are clearly distinct. It does not even consult the equals method.
It does, however, reach the compareTo method.
@Override
public int compareTo(Object t) {
    if (t.getClass().equals(this.getClass())) {
        Card otherCard = (Card) t;
        if (otherCard.equals(this)) {
            return 0;
        }
        return this.number - otherCard.number;
    }
    else {
        throw new ClassCastException("Cannot convert " + t.getClass().toString() + " to Card");
    }
}
It has been a while since I last worked with Java and I've only just gotten back into it with Java 8, so maybe I'm simply overlooking something obvious. I hope somebody can help me move forward with this.
I've always been reluctant to ask questions here, and I solved this as soon as I submitted it... I wanted to share the solution with you. TreeSet only cares about the compareTo method, so I changed it to the following:
@Override
public int compareTo(Object t) {
    if (t.getClass().equals(this.getClass())) {
        Card otherCard = (Card) t;
        if (this.number == otherCard.number) return this.type.compareTo(otherCard.type);
        return this.number - otherCard.number;
    }
    else {
        throw new ClassCastException("Cannot convert " + t.getClass().toString() + " to Card");
    }
}
This solved it, because the compareTo ordering is now "aware" of the type property as well: two cards compare as 0 only when both number and type match.
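The same fix can also be written with `Comparable<Card>` and the Java 8 `Comparator` combinators, which avoids the raw `compareTo(Object)` and the manual cast. A sketch, assuming `number` is an `int` and the suit enum is nested in `Card` as `CARD_TYPE` (as the test suggests); the order of the `CARD_TYPE` constants below is an assumption. `hashCode` is added so the class keeps the equals/hashCode contract.

```java
import java.util.Comparator;
import java.util.TreeSet;

class Card implements Comparable<Card> {
    enum CARD_TYPE { CLUBS, DIAMONDS, HEARTS, SPADES } // constant order is assumed

    // Number first, then suit: compareTo returns 0 only when equals() is true.
    private static final Comparator<Card> ORDER =
            Comparator.comparingInt((Card c) -> c.number).thenComparing(c -> c.type);

    final int number;
    final CARD_TYPE type;

    Card(int number, CARD_TYPE type) {
        this.number = number;
        this.type = type;
    }

    @Override
    public int compareTo(Card other) {
        return ORDER.compare(this, other);
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Card other = (Card) obj;
        return number == other.number && type == other.type;
    }

    @Override
    public int hashCode() {
        return 31 * number + type.hashCode();
    }
}
```

With this ordering, the five cards from the test all end up in a TreeSet<Card>, and only a genuine duplicate (same number and suit) is rejected.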
I have a problem with the contains() method of TreeSet. As I understand it, contains() should call equals() on the contained objects, as the Javadoc says:
boolean java.util.TreeSet.contains(Object o): Returns true if this set
contains the specified element. More formally, returns true if and
only if this set contains an element e such that (o==null ? e==null :
o.equals(e)).
What I'm trying to do:
I have a list of TreeSets of Result objects, which have a member String wordbase. I want to compare each TreeSet with all the others and, for each pair, build a list of the wordbases they share. To do this, I iterate over the list once for treeSet1 and a second time for treeSet2; then I iterate over all Result objects in treeSet1 and call treeSet2.contains(result) for each one, to see whether treeSet2 contains a Result object with this wordbase. I adjusted the compareTo and equals methods of Result, but it seems that my equals is never called.
Can anyone explain to me why this doesn't work?
Greetings,
Daniel
public static List<HashSet<String>> getIntersection(ArrayList<TreeSet<Result>> list, int value) {
    List<HashSet<String>> intersections = new ArrayList<HashSet<String>>();
    for (int i = 0; i < list.size(); i++) {
        TreeSet<Result> treeSet = list.get(i);
        //for each treeSet, we iterate again through the list of TreeSets, starting at the TreeSet that is next
        //to the one we got in the outer loop (by index, because indexOf would compare TreeSets with equals)
        for (TreeSet<Result> treeSet2 : list.subList(i + 1, list.size())) {
            //so at this point, we got 2 different TreeSets
            HashSet<String> intersection = new HashSet<String>();
            for (Result result : treeSet) {
                //we iterate over each result in the first treeSet and see if the wordbase exists also in the second one
                //!!!
                if (treeSet2.contains(result)) {
                    intersection.add(result.wordbase);
                }
            }
            if (!intersection.isEmpty()) {
                intersections.add(intersection);
            }
        }
    }
    return intersections;
}
public class Result implements Comparable<Result> {
    public Result(String wordbase, double[] result) {
        this.result = result;
        this.wordbase = wordbase;
    }
    public String wordbase;
    public double[] result;
    @Override
    public int compareTo(Result o) {
        if (o == null) return 0;
        return this.wordbase.compareTo(o.wordbase);
    }
    @Override
    public int hashCode() {
        final int prime = 31;
        int result = 1;
        result = prime * result
                + ((wordbase == null) ? 0 : wordbase.hashCode());
        return result;
    }
    //never called
    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (obj == null)
            return false;
        if (getClass() != obj.getClass())
            return false;
        Result other = (Result) obj;
        if (wordbase == null) {
            if (other.wordbase != null)
                return false;
        } else if (!wordbase.equals(other.wordbase))
            return false;
        return true;
    }
}
As I understand it, contains() should call equals() of the contained Objects
Not for TreeSet, no. It uses compareTo (or the Comparator passed to the constructor). From the TreeSet Javadoc:
A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
...
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface.
Your compareTo method isn't currently consistent with equals - x.compareTo(null) returns 0, whereas x.equals(null) returns false. Maybe you're okay with that, but you shouldn't expect equals to be called.
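A minimal demonstration that TreeSet.contains goes through the ordering rather than equals: the hypothetical `Word` class below (a stand-in for Result) does not override equals, so two instances with the same wordbase are never equal, yet contains still finds the probe because the set's Comparator returns 0 for them.

```java
import java.util.Comparator;
import java.util.TreeSet;

class TreeSetLookupDemo {
    // Hypothetical stand-in for Result: equals() is NOT overridden, so it is identity-based.
    static final class Word {
        final String wordbase;
        Word(String wordbase) { this.wordbase = wordbase; }
    }

    // Returns {probe.equals(stored), set.contains(probe)}.
    static boolean[] lookup() {
        TreeSet<Word> set = new TreeSet<>(Comparator.comparing((Word w) -> w.wordbase));
        Word stored = new Word("tree");
        set.add(stored);
        Word probe = new Word("tree");
        return new boolean[] { probe.equals(stored), set.contains(probe) };
    }
}
```

lookup() returns {false, true}: equals says the objects differ, while the tree search, which only calls the Comparator, reports a match.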
I was asked this in an interview. Using Google Guava or a Multimap is not an option.
I have a class
public class Alpha
{
    String company;
    int local;
    String title;
}
I have many instances of this class (on the order of millions). I need to process them and, at the end, find the unique ones and their duplicates.
e.g.
instance1 --> instance1, instance5, instance7 (instance1 has instance5 and instance7 as duplicates)
instance2 --> instance2 (no duplicates for instance 2)
My code works fine
Declare the data structure:
HashMap<Alpha,ArrayList<Alpha>> hashmap = new HashMap<Alpha,ArrayList<Alpha>>();
Add instances
for (Alpha x : arr)
{
    ArrayList<Alpha> list = hashmap.get(x); ///<<<<---- doubt about this. comment#1
    if (list == null)
    {
        list = new ArrayList<Alpha>();
        hashmap.put(x, list);
    }
    list.add(x);
}
Print instances and their duplicates.
for (Alpha x : hashmap.keySet())
{
    ArrayList<Alpha> list = hashmap.get(x); //<<< doubt about this. comment#2
    System.out.println(x + "<---->");
    for (Alpha y : list)
    {
        System.out.print(y);
    }
    System.out.println();
}
Question: My code works, but why? When I call hashmap.get(x) (comment#1 in the code), two different instances might have the same hash code. In that case, I would add 2 different objects to the same List.
When I retrieve it, I should get a List that contains 2 different instances (comment#2), and when I iterate over the list, I should see at least one instance that is not a duplicate of the key but still exists in the list. I don't. Why? I even tried returning a constant value from my hashCode function, and it still works fine.
If you want to see my implementation of equals and hashCode, let me know.
Bonus question: Any way to optimize it?
Edit:
@Override
public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != this.getClass())
        return false;
    if (obj == this)
        return true;
    Alpha guest = (Alpha) obj;
    return guest.getLocal() == this.getLocal()
            && guest.getCompany() == this.getCompany()
            && guest.getTitle() == this.getTitle();
}
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + (title == null ? 0 : title.hashCode());
    result = prime * result + local;
    result = prime * result + (company == null ? 0 : company.hashCode());
    return result;
}
it is possible that two different instances might have same hashcode
Yes, but the hash code is only used to choose the bucket in which an entry is stored. Two or more keys can have the same hash code, which is why keys that land in the same bucket are also compared using equals.
From Map#containsKey javadoc:
Returns true if this map contains a mapping for the specified key. More formally, returns true if and only if this map contains a mapping for a key k such that (key==null ? k==null : key.equals(k)). (There can be at most one such mapping.)
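To see that colliding hash codes do not mix up groups, here is a sketch with a deliberately degenerate hashCode that puts every key in the same bucket; HashMap then tells the keys apart with equals. It also shows one answer to the bonus question: Map.computeIfAbsent (Java 8+) replaces the get/null-check/put sequence. This Alpha is a simplified stand-in, `DuplicateGrouper` is a hypothetical name, and the constant hash is for illustration only, since it slows lookups down.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Simplified stand-in for the question's Alpha.
class Alpha {
    final String company;
    final int local;
    final String title;

    Alpha(String company, int local, String title) {
        this.company = company;
        this.local = local;
        this.title = title;
    }

    // Deliberately degenerate: every Alpha lands in the same bucket (illustration only).
    @Override
    public int hashCode() {
        return 42;
    }

    // equals is what actually separates colliding keys inside a bucket.
    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Alpha other = (Alpha) obj;
        return local == other.local
                && Objects.equals(company, other.company)
                && Objects.equals(title, other.title);
    }
}

class DuplicateGrouper {
    // Groups instances by equals(); the first instance seen becomes the map key.
    static Map<Alpha, List<Alpha>> group(List<Alpha> items) {
        Map<Alpha, List<Alpha>> groups = new HashMap<>();
        for (Alpha x : items) {
            groups.computeIfAbsent(x, k -> new ArrayList<>()).add(x);
        }
        return groups;
    }
}
```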
Some enhancements to your current code:
Program to interfaces: declare the variable as a Map and instantiate it as a HashMap; the same goes for List and ArrayList.
Compare Strings, and objects in general, using the equals method. == compares references, while equals compares the data stored in the objects, depending on how the method is implemented. So change the code in Alpha#equals to:
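@Override
public boolean equals(Object obj) {
    if (obj == null || obj.getClass() != this.getClass())
        return false;
    if (obj == this)
        return true;
    Alpha guest = (Alpha) obj;
    // local is an int, so == is correct for it; the Strings need equals.
    // Objects is java.util.Objects; Objects.equals is null-safe.
    return guest.getLocal() == this.getLocal()
            && Objects.equals(guest.getCompany(), this.getCompany())
            && Objects.equals(guest.getTitle(), this.getTitle());
}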
When iterating over all the key/value pairs of a map, use Map#entrySet instead; it saves the time spent in Map#get (since that is supposed to be O(1), you won't save much, but it is better). This assumes hashmap is declared as Map<Alpha, List<Alpha>>, as suggested above:
for (Map.Entry<Alpha, List<Alpha>> entry : hashmap.entrySet()) {
    List<Alpha> list = entry.getValue();
    System.out.println(entry.getKey() + "<---->");
    for (Alpha y : list) {
        System.out.print(y);
    }
    System.out.println();
}
Use equals along with hashCode to resolve collisions.
Steps:
First, base hashCode() on the title.
If the titles are the same (a collision), equals() compares the company name to tell the objects apart.
Sample code
class Alpha {
    String company;
    int local;
    String title;
    public Alpha(String company, int local, String title) {
        this.company = company;
        this.local = local;
        this.title = title;
    }
    @Override
    public int hashCode() {
        return title.hashCode();
    }
    @Override
    public boolean equals(Object obj) {
        if (obj instanceof Alpha) {
            return this.company.equals(((Alpha) obj).company);
        }
        return false;
    }
}
...
Map<Alpha, ArrayList<Alpha>> hashmap = new HashMap<Alpha, ArrayList<Alpha>>();
hashmap.put(new Alpha("a", 1, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("b", 2, "t1"), new ArrayList<Alpha>());
hashmap.put(new Alpha("a", 3, "t1"), new ArrayList<Alpha>());
System.out.println("Size : "+hashmap.size());
Output
Size : 2
If I use an object as a map key, are the default hashCode and equals methods enough?
class EventInfo {
    private String name;
    private Map<String, Integer> info;
}
Then I want to create a map:
Map<EventInfo, String> map = new HashMap<EventInfo, String>();
Do I have to explicitly implement hashCode() and equals()? Thanks.
Yes, you do. HashMaps work by computing the hash code of the key and using it to find the bucket. If you don't override hashCode, it falls back to Object's identity hash code (commonly described as being based on the memory address), and equals behaves the same as ==.
If you're using Eclipse, it'll generate them for you: Source menu → Generate hashCode() and equals().
If you don't have Eclipse, here's some that should work. (I generated these in Eclipse, as described above.)
@Override
public int hashCode() {
    final int prime = 31;
    int result = 1;
    result = prime * result + ((info == null) ? 0 : info.hashCode());
    result = prime * result + ((name == null) ? 0 : name.hashCode());
    return result;
}
@Override
public boolean equals(Object obj) {
    if (this == obj) {
        return true;
    }
    if (obj == null) {
        return false;
    }
    if (!(obj instanceof EventInfo)) {
        return false;
    }
    EventInfo other = (EventInfo) obj;
    if (info == null) {
        if (other.info != null) {
            return false;
        }
    } else if (!info.equals(other.info)) {
        return false;
    }
    if (name == null) {
        if (other.name != null) {
            return false;
        }
    } else if (!name.equals(other.name)) {
        return false;
    }
    return true;
}
Yes, you need them; otherwise you won't be able to compare two EventInfo objects by value (and your map won't work as expected).
Strictly speaking, no. The default implementations of hashCode() and equals() will produce results that ought to work. See http://docs.oracle.com/javase/6/docs/api/java/lang/Object.html#hashCode()
My understanding is that the default implementation of hashCode() works by taking the object's address in memory and converting it to an integer, and the default implementation of equals() returns true only if the two objects are actually the same object.
In practice, you could (and should) probably improve on both of those implementations. For example, both methods should ignore object members that aren't important. In addition, equals() might want to recursively compare references in the object.
In your particular case, you might define equals() as true if the two objects refer to the same string or the two strings are equal and the two maps are the same or they are equal. I think WChargin gave you pretty good implementations.
Depends on what you want to happen. If two different EventInfo instances with the same name and info should result in two different keys, then you don't need to implement equals and hashCode.
So
EventInfo info1 = new EventInfo();
info1.setName("myname");
info1.setInfo(null);
EventInfo info2 = new EventInfo();
info2.setName("myname");
info2.setInfo(null);
info1.equals(info2) would return false, and info1.hashCode() would (almost certainly) return a different value from info2.hashCode().
Therefore, when you are adding them to your map:
map.put(info1, "test1");
map.put(info2, "test2");
you would have two different entries.
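A runnable version of the example above, assuming the setters setName and setInfo that the answer uses (the question's class does not show them); `IdentityKeyDemo` is a hypothetical name. Because equals and hashCode are not overridden, the two keys never compare equal and the map keeps both entries.

```java
import java.util.HashMap;
import java.util.Map;

class EventInfo {
    private String name;
    private Map<String, Integer> info;

    // Setters assumed by the answer's example.
    void setName(String name) { this.name = name; }
    void setInfo(Map<String, Integer> info) { this.info = info; }
    // equals/hashCode are NOT overridden: Object's identity-based versions apply.
}

class IdentityKeyDemo {
    // Puts two EventInfo objects with identical field values into a HashMap
    // and returns the resulting number of entries.
    static int entriesAfterTwoPuts() {
        EventInfo info1 = new EventInfo();
        info1.setName("myname");
        info1.setInfo(null);
        EventInfo info2 = new EventInfo();
        info2.setName("myname");
        info2.setInfo(null);

        Map<EventInfo, String> map = new HashMap<>();
        map.put(info1, "test1");
        map.put(info2, "test2");
        return map.size(); // 2: different objects, so identity equals() is false
    }
}
```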
Now, that may be the desired behaviour. For example, if your EventInfo is collecting different events, two distinct events with the same data may well need to be two different entries.
The equals and hashCode contracts also apply to a Set.
So, for example, if your event info contains mouse clicks, you may well want to end up with:
Set<EventInfo> collectedEvents = new HashSet<EventInfo>();
collectedEvents.add(info1);
collectedEvents.add(info2);
2 collected events instead of just 1...
Hope I'm making sense here...
EDIT:
If, however, the above set and map should contain only a single entry, you could use Apache Commons Lang's EqualsBuilder and HashCodeBuilder to simplify the implementation of equals and hashCode:
@Override
public boolean equals(Object obj) {
    if (obj instanceof EventInfo) {
        EventInfo other = (EventInfo) obj;
        EqualsBuilder builder = new EqualsBuilder();
        builder.append(name, other.name);
        builder.append(info, other.info);
        return builder.isEquals();
    }
    return false;
}
@Override
public int hashCode() {
    HashCodeBuilder builder = new HashCodeBuilder();
    builder.append(name);
    builder.append(info);
    return builder.toHashCode();
}
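If adding Apache Commons is not an option, java.util.Objects (Java 7+) gives a similarly compact version with the JDK alone. A sketch, assuming an immutable variant of EventInfo with a constructor (the question's class has neither); immutability is a deliberate choice here, because mutating a key after it has been put into a HashMap or HashSet breaks lookups.

```java
import java.util.Map;
import java.util.Objects;

// Immutable variant of the question's EventInfo (the constructor is an assumption).
final class EventInfo {
    private final String name;
    private final Map<String, Integer> info;

    EventInfo(String name, Map<String, Integer> info) {
        this.name = name;
        this.info = info;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof EventInfo)) return false;
        EventInfo other = (EventInfo) obj;
        // Objects.equals is null-safe, like the null checks in the generated version
        return Objects.equals(name, other.name) && Objects.equals(info, other.info);
    }

    @Override
    public int hashCode() {
        // Objects.hash accepts nulls and combines the fields' hash codes
        return Objects.hash(name, info);
    }
}
```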
EDIT2:
It could also be appropriate to consider two EventInfo instances the same if they have the same name, for example if the name is some unique identifier (I know it's a bit far-fetched for your specific object, but I'm generalising here...)