Using own implementation of HashSet.add() - java

I am using a HashSet to add unique Users to the hashset. These users are updated frequently from an online source and when it reads it in from there it gets a new object each time so all the Users get added again.
I have a UserID variable in my user class and would like to use this to add users to the HashSet and ensure uniqueness.
private HashSet<User> userHashSet = new HashSet<User>();
userHashSet.add(user);
//based on the users id it will either add or not
I want to do this without losing efficiency and preferably only one HashSet

You should implement the equals and hashCode method in your User class.
As an example:
@Override
public int hashCode() {
    return getUserId().hashCode();
}

@Override
public boolean equals(Object obj) {
    return obj instanceof User &&
            getUserId().equals(((User) obj).getUserId());
}

You need to implement the hashCode and equals methods in your User class. So long as you implement those two methods correctly then you will get unique entries in your HashSet.
There are many articles about how to implement these correctly (e.g. What issues should be considered when overriding equals and hashCode in Java?), but the easiest way to do it is to get your IDE to auto-generate those methods, or use the Apache Commons helpers.
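As a minimal sketch of what both answers describe: the question does not show the User class, so the String type of the ID, the extra name field and the constructor are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical User class; only userId takes part in equality.
class User {
    private final String userId; // assumed non-null and never changed
    private final String name;   // illustrative extra field, ignored by equals/hashCode

    User(String userId, String name) {
        this.userId = Objects.requireNonNull(userId);
        this.name = name;
    }

    String getUserId() { return userId; }
    String getName() { return name; }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (!(obj instanceof User)) return false;
        return userId.equals(((User) obj).userId);
    }

    @Override
    public int hashCode() {
        return userId.hashCode();
    }
}

class UserSetDemo {
    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        System.out.println(users.add(new User("u1", "Alice")));           // true
        System.out.println(users.add(new User("u1", "Alice (updated)"))); // false
        System.out.println(users.size());                                 // 1
    }
}
```

Note that HashSet.add() leaves the set unchanged when an equal element is already present, so the set keeps the first object. If the freshly downloaded User should win, remove the old one before adding the new one.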

Related

Should I compare all fields in my class's "equals" method?

I'm working on an application that allows the user to manage accounts. So, suppose I have an Account class, representing one of the user's accounts:
class Account
{
public int id;
public String accountName;
public String accountIdentifier;
public String server;
public String notes;
}
My equals method looks like this:
public boolean equals(Object o)
{
if (this == o)
return true;
if (o == null || !(o instanceof Account))
return false;
Account other = (Account) o;
if (!accountIdentifier.equals(other.accountIdentifier))
return false;
if (!server.equals(other.server))
return false;
return true;
}
As you can see, I'm only comparing the accountIdentifier and the server, but not the other fields. There are several reasons why I chose this approach.
I keep the accounts in a List. When the user updates an account, by changing the account name (which is just a name specified by the user to identify the account) or the notes, I can do accountList.set(accountList.indexOf(account), account); to update the account in the list. If equals compared all properties, this approach wouldn't work, and I'd have to work around it (for example by iterating over the list and checking for these properties manually).
This might actually be more important, but it only came to my mind after thinking about it for a while. An Account is uniquely identified by the accountIdentifier and the server it belongs to. The user might decide to rename the account, or change the notes, but it's still the same account. But if the server is changed, I think I would consider it a different account. The id is just an internal ID since the accounts are stored in a database. Even if that changed, the account is still considered the same account if the accountIdentifier and the server stayed the same.
What I'm trying to say is that I basically implemented equals this way to allow for shorter, more concise code in the rest of the application. But I'm not sure if I'm breaking some rules here, or if I'm doing something that might cause other developers headaches if it ever happens that someone is working with my application's API.
Is it okay to only compare some fields in the equals method, or should I compare all fields?
Yes, it's definitely okay to do this. You get to decide what equality means for your class, and you should use it in a way that makes the most sense for your application's logic — in particular, for collections and other such classes that make use of equality. It sounds like you have thought about that and decided that the (server, identifier) pair is what uniquely distinguishes instances.
This would mean, for instance, that two instances with the same (server, identifier) pair but a different accountName are different versions of the same Account, and that the difference might need to be resolved somehow; that's a perfectly reasonable semantic.
It may make sense to define a separate boolean allFieldsEqual(Account other) method to cover the "extended" definition, depending on whether you need it (or would find it useful for testing).
And, of course, you should override hashCode to make it consistent with whatever definition of equals you go with.
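A rough sketch of how that could look for the Account class from the question. The constructor is added for the demo only, and allFieldsEqual is the optional helper suggested above, not something the question's code contains.

```java
import java.util.Objects;

class Account {
    int id;
    String accountName;
    String accountIdentifier;
    String server;
    String notes;

    // Constructor added for the demo only.
    Account(int id, String accountName, String accountIdentifier, String server, String notes) {
        this.id = id;
        this.accountName = accountName;
        this.accountIdentifier = accountIdentifier;
        this.server = server;
        this.notes = notes;
    }

    // Identity: the (server, identifier) pair only.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return Objects.equals(accountIdentifier, other.accountIdentifier)
                && Objects.equals(server, other.server);
    }

    // Must use exactly the fields that equals uses.
    @Override
    public int hashCode() {
        return Objects.hash(accountIdentifier, server);
    }

    // "Extended" comparison for tests or change detection.
    boolean allFieldsEqual(Account other) {
        return equals(other)
                && id == other.id
                && Objects.equals(accountName, other.accountName)
                && Objects.equals(notes, other.notes);
    }
}

class AccountEqualityDemo {
    public static void main(String[] args) {
        Account saved = new Account(1, "Work", "alice", "mail.example.com", "");
        Account edited = new Account(1, "Work mail", "alice", "mail.example.com", "renamed");
        System.out.println(saved.equals(edited));         // true
        System.out.println(saved.allFieldsEqual(edited)); // false
    }
}
```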
You should compare all of the fields that are necessary to determine equality. If the accountIdentifier and server fields are enough to determine if two objects are equal, then that is perfectly fine. No need to include any of the other fields that don't matter in terms of equality.
For the key, you should normally use the business key. It can be a simple or a composite key, and it does not need to include every field of the entity. Which fields identify an entity depends on the case; if possible, use the smallest set of fields that fully and uniquely identifies it.
Some people prefer to create a surrogate key to identify the object, which is good practice. This is very useful when you persist your objects with an ORM, because you don't need to export the keys to the child entities in 1:M or M:N relations. For example, the id in your sample can be considered a surrogate key if you create it as an internal unique identifier.
You may also want to take the following into consideration:
Whenever you override equals you must override hashCode too; this is required for collections and maps that rely on hashing, such as HashSet and HashMap, to work properly.
Apache Commons Lang provides a really nice API to help with implementing equals and hashCode: the EqualsBuilder and HashCodeBuilder classes. Both let you chain the fields you want to use in the comparison, and both also offer a reflection-based variant.
The answer is "it depends on the semantics of your data".
For example, you might internally store a field that can be derived (calculated) from the other fields. In which case, you don't need to compare the calculated value.
As a gross generalisation, anything that cannot be derived from other fields should be included.
This is fine - and probably a good thing to do. If you've identified equality as the accountIdentifier and the server being distinct and unique, then that's perfectly valid for your use case.
You don't want to use more fields than you need to, since that would make equals report two objects as different when they represent the same account. This approach is perfectly suitable to your needs.

RMI - references and using remote object as a "key" in a hashmap

I have an RMI server which exports objects which implement an interface, let's call it "TestInterface".
The client can then do various tasks, calling methods on this interface. I then have some other interface for a different type of object, for which one of the methods takes in a object of type "TestInterface". The method tries to use a hashmap where the "TestInterface" is the key (it stores a hashmap of all "TestInterface" objects on the server, the value being what I would like to retrieve), but fails to find it. I know this key exists within the hashmap, but due to the interface being passed to the client, and then back to the server later it seems to not find it.
I know it exists as I am able to print out the IDs of the objects which are keys in the hashmap, and the ID of the "TestInterface" passed in (they are the same). I then printed them both out (using the default toString), and I get this:
Proxy[TestInterface,RemoteObjectInvocationHandler[UnicastRef [liveRef: [endpoint:[127.0.0.1:1099](remote),objID:[-38c943f9:14be569cb95:-7ff9, 8183237023004091504]]]]]
Test[UnicastServerRef [liveRef: [endpoint:[127.0.0.1:1099](local),objID:[-38c943f9:14be569cb95:-7ff9, 8183237023004091504]]]]
I can see these are different, which is probably why I cannot look them up. However, they look to be pointing to the same object (I believe this is the case as the objectId is the same).
My question is, why can I not look up this "Key" in the hashmap, where the Key is a reference to the same object.
My second question is, is there an easy way to fix this?
This is occurring because the actual object which you are using as the key does not evaluate the same with hashCode() and equals(). Those are the methods the HashMap is using for the lookup. A common way to handle this is to create a wrapper class which will use the internal objId inside the hashCode() and equals(). This wrapper class is then used as the key to the map.
Something like:
public class Wrapper {
    private Keyclass key;
    ...
    public Keyclass getWrapped() { return key; }

    @Override
    public int hashCode() {
        return key.objID.hashCode();
    }

    @Override
    public boolean equals(Object o) {
        // Guard against null and foreign types before casting.
        return o instanceof Wrapper
                && key.objID.equals(((Wrapper) o).key.objID);
    }
}
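A related option, sketched here under assumptions, is to key the map by the identifier itself rather than by the object. The question mentions that each object has an ID that can be printed; the accessor name getId(), the String type and the two stand-in classes below are hypothetical. In a real RMI setup the interface would extend Remote and getId() would be a remote call that can throw RemoteException, which is left out to keep the sketch self-contained.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the remote interface; getId() is an assumed accessor.
interface TestInterface {
    String getId();
}

// Stand-in for the exported server-side object.
class ServerSideTest implements TestInterface {
    private final String id;
    ServerSideTest(String id) { this.id = id; }
    public String getId() { return id; }
}

// Stand-in for the proxy that comes back from the client.
class ClientSideProxy implements TestInterface {
    private final String id;
    ClientSideProxy(String id) { this.id = id; }
    public String getId() { return id; }
}

class IdKeyedLookupDemo {
    public static void main(String[] args) {
        TestInterface original = new ServerSideTest("obj-7ff9");
        TestInterface roundTripped = new ClientSideProxy("obj-7ff9");

        // Keyed by object: the two instances are not equal, so the lookup misses.
        Map<TestInterface, String> byObject = new HashMap<>();
        byObject.put(original, "some value");
        System.out.println(byObject.get(roundTripped)); // null

        // Keyed by ID: String has value-based equals/hashCode, so the lookup hits.
        Map<String, String> byId = new HashMap<>();
        byId.put(original.getId(), "some value");
        System.out.println(byId.get(roundTripped.getId())); // some value
    }
}
```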

Method "contains".Java List

I have a Spring MVC application with an entity class Examination. I overrode the equals method so that I could use the contains method of the List interface. When I try to add a new exam, I check whether I have already added it. But when I pass the examination object to contains, the two exams being compared always have different students. For example:
I need to add an exam for the student Jone. When I try to add it, the output is Kate : Jone instead of Jone : Jone. I do not know why this happens, because I pass an examination object whose student is set to Jone.
@Override
public boolean equals(Object arg) {
Examination exam = (Examination) arg;
System.out.println(exam.getStudent().getStudentFullName() + ":" + this.getStudent().getStudentFullName());
if (!this.subject.getSubjectTitle().equals(exam.getSubject().getSubjectTitle()))
return false;
else
return true;
}
The piece of code where I try to add the exam:
examination.setStudent(currentStudent); // set student
examination.setSubject(subjectExam); // set subject
if(es.selectAllExams().contains(examination)) {
return "error";
} else {
es.insertExam(examination); // add to database
return "success";
}
In the equals method you are comparing only titles, not the student name. So if you have two examinations with same title, but different student name they are equal (based on your equals method). Compare also students in the equals method and you should be good. In general it is good practice to override both equals and hashcode methods.
Your implementation of equals method is in general not following best practices for overriding equals method. Google a bit for "java equals method best practices" - you'll find something like this : http://javarevisited.blogspot.sk/2011/02/how-to-write-equals-method-in-java.html
If you are too lazy to write your own equals or hashCode methods (or you have other reasons), you can use:
http://commons.apache.org/proper/commons-lang/javadocs/api-2.6/org/apache/commons/lang/builder/EqualsBuilder.html
or
http://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/builder/HashCodeBuilder.html
You say you override equals()
so I could use method contains of List interface
However, you don't have to override that method to use contains(). There's a default implementation that's suitable for most purposes; it boils down to "are these objects the same instance?". As a previous responder pointed out, you're breaking this logic with your implementation; all examinations with the same title will be considered the same object, so as long as your list has one examination with the same title as the one you're trying to add, the contains() check will always return true, and you'll never be able to add another one.
If you do want equality to be based on the title and student in question, then the previous answer is correct - you'll want to override both hashCode() and equals(), making sure you consider all fields important to an examination's identity in both methods.
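A simplified sketch of an equals/hashCode pair that considers both the subject and the student. The real entity holds Student and Subject objects; plain strings are used here as an assumption to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Simplified stand-in for the Examination entity.
class Examination {
    private final String studentFullName;
    private final String subjectTitle;

    Examination(String studentFullName, String subjectTitle) {
        this.studentFullName = studentFullName;
        this.subjectTitle = subjectTitle;
    }

    @Override
    public boolean equals(Object arg) {
        if (this == arg) return true;
        if (!(arg instanceof Examination)) return false; // also covers null
        Examination exam = (Examination) arg;
        return Objects.equals(subjectTitle, exam.subjectTitle)
                && Objects.equals(studentFullName, exam.studentFullName);
    }

    @Override
    public int hashCode() {
        return Objects.hash(subjectTitle, studentFullName);
    }
}

class ExaminationContainsDemo {
    public static void main(String[] args) {
        List<Examination> exams = new ArrayList<>();
        exams.add(new Examination("Kate", "Math"));

        System.out.println(exams.contains(new Examination("Jone", "Math"))); // false
        System.out.println(exams.contains(new Examination("Kate", "Math"))); // true
    }
}
```

This also explains the "Kate : Jone" output in the question: for an ArrayList, contains(x) calls x.equals(element) for each stored element in turn, so the debug line prints a stored exam's student next to the student of the exam being added.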

Should @Transient property be used in equals/hashCode/toString?

I have JPA entities where some properties are annotated with @Transient.
Should I use these properties in equals/hashCode/toString methods?
My first thought is NO but I don't know why.
Tips?
Ideas?
Explanations?
The case of toString() is different, you can do whatever you want with toString() so I will only cover equals() (and hashCode()).
First, the rule: if you want to store an object in a List, Map or a Set then it is a requirement that equals and hashCode are implemented so they obey the standard contract as specified in the documentation.
Now, how to implement equals() and hashCode()? A "natural" idea would be to use the properties mapped as Id as part of the equals():
public class User {
...
public boolean equals(Object other) {
if (this==other) return true;
if (id==null) return false;
if ( !(other instanceof User) ) return false;
final User that = (User) other;
return this.id.equals( that.getId() );
}
public int hashCode() {
return id==null ? System.identityHashCode(this) : id.hashCode();
}
}
Unfortunately, this solution has a major problem: when using generated identifiers, the values are not assigned until an entity becomes persistent so if a transient entity is added to a Set before being saved, its hash code will change while it's in the Set and this breaks the contract of the Set.
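The problem can be reproduced without any persistence framework. In the sketch below, setId() merely simulates what a JPA provider does when it assigns a generated identifier; the id value is chosen so that its hash is guaranteed to differ from the identity hash, which a real generated id would almost always do as well.

```java
import java.util.HashSet;
import java.util.Set;

// Same equals/hashCode as in the id-based User above.
class User {
    private Long id; // null until the entity is "persisted"

    Long getId() { return id; }
    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (id == null) return false;
        if (!(other instanceof User)) return false;
        return id.equals(((User) other).getId());
    }

    @Override
    public int hashCode() {
        return id == null ? System.identityHashCode(this) : id.hashCode();
    }
}

class HashChangeDemo {
    public static void main(String[] args) {
        Set<User> users = new HashSet<>();
        User user = new User();
        users.add(user);                          // stored under the identity hash
        System.out.println(users.contains(user)); // true

        // Simulate the id being generated on save.
        user.setId(System.identityHashCode(user) + 1L);

        System.out.println(users.contains(user)); // false: looked up under the new hash
        System.out.println(users.size());         // 1: the element is still in there
    }
}
```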
The recommended approach is thus to use the attributes that are part of the business key i.e. a combination of attributes that is unique for each instance with the same database identity. For example, for the User class, this could be the username:
public class User {
...
public boolean equals(Object other) {
if (this==other) return true;
if ( !(other instanceof User) ) return false;
final User that = (User) other;
return this.username.equals( that.getUsername() );
}
public int hashCode() {
return username.hashCode();
}
}
The Hibernate Reference Documentation summarizes this as follows:
"Never use the database identifier to implement equality; use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set. Attributes for business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set." - 12.1.3. Considering object identity
"It is recommended that you implement equals() and hashCode() using Business key equality. Business key equality means that the equals() method compares only the properties that form the business key. It is a key that would identify our instance in the real world (a natural candidate key)" - 4.3. Implementing equals() and hashCode()
So, back to the initial question:
Use a business key if possible. @Transient attributes are very likely not part of such a key.
If not possible, use identifier properties, but make sure the values are assigned before adding an entity to a List, Map or Set.
See also
Equals and HashCode
Don't Let Hibernate Steal Your Identity
equals and hashcode in Hibernate
Understanding equals() and hashCode() (p. 396) in Java Persistence with Hibernate
The two typical usages of @Transient and transient that I'm aware of are for stuff that can't be serialized/persisted (e.g. a remote resource handle) and for computed properties that can be reconstructed from others.
For computed data, it makes no sense to use them in the equality relationship (equals/hashCode), because it would be redundant: the value is computed from other values that are already used in the equality. It can, however, still make sense to print them in toString (e.g. a base price and a ratio are used to compute the actual price).
For non-serializable/non-persistable data, it depends. I can imagine a handle to a resource that is not serializable, but you can still compare the resource name that the handle represents. Same for toString: maybe printing the handle's resource name is useful.
These were my 2 cents, but if you explain your particular usage of @Transient, someone can maybe give better advice.
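The computed-property case can be sketched as follows. The class name, the integer cent and percent representation, and the use of the plain Java transient keyword instead of the JPA annotation are assumptions chosen to keep the example self-contained.

```java
import java.util.Objects;

// Hypothetical class: the actual price is derived from base price and ratio.
class PricedItem {
    private final long basePriceCents;
    private final int ratioPercent;
    private final transient long actualPriceCents; // derived, so not part of equality

    PricedItem(long basePriceCents, int ratioPercent) {
        this.basePriceCents = basePriceCents;
        this.ratioPercent = ratioPercent;
        this.actualPriceCents = basePriceCents * ratioPercent / 100;
    }

    // Only the source fields take part; the derived one would be redundant.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PricedItem)) return false;
        PricedItem other = (PricedItem) o;
        return basePriceCents == other.basePriceCents
                && ratioPercent == other.ratioPercent;
    }

    @Override
    public int hashCode() {
        return Objects.hash(basePriceCents, ratioPercent);
    }

    // Printing the derived value is still useful for debugging.
    @Override
    public String toString() {
        return "PricedItem{base=" + basePriceCents
                + ", ratio=" + ratioPercent + "%"
                + ", actual=" + actualPriceCents + "}";
    }
}

class PricedItemDemo {
    public static void main(String[] args) {
        PricedItem a = new PricedItem(1000, 80);
        PricedItem b = new PricedItem(1000, 80);
        System.out.println(a.equals(b)); // true
        System.out.println(a);           // PricedItem{base=1000, ratio=80%, actual=800}
    }
}
```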
One exception may be a field that is marked transient but that you still process yourself in custom writeObject() and readObject() methods.

Object equality in context of hibernate / webapp

How do you handle object equality for java objects managed by hibernate? In the 'hibernate in action' book they say that one should favor business keys over surrogate keys.
Most of the time, I do not have a business key. Think of addresses mapped to a person. The addresses are kept in a Set and displayed in a Wicket RefreshingView (with a ReuseIfEquals strategy).
I could either use the surrogate id or use all fields in the equals() and hashCode() functions.
The problem is that those fields change during the lifetime of the object, either because the user entered some data or because the id changes due to JPA merge() being called inside the OSIV (Open Session in View) filter.
My understanding of the equals() and hashCode() contract is that those should not change during the lifetime of an object.
What I have tried so far:
equals() based on hashCode(), which uses the database id (or super.hashCode() if the id is null). Problem: new addresses start with a null id but get an id when they are attached to a person and this person gets merged (re-attached) in the OSIV filter.
Lazily compute the hash code when hashCode() is first called and make that field @Transient. Does not work, as merge() returns a new object and the hash code does not get copied over.
What I would need, I think, is an ID that gets assigned during object creation. What would be my options here? I don't want to introduce an additional persistent property. Is there a way to explicitly tell JPA to assign an ID to an object?
Regards
Using the id of an entity is not a good idea because transient entities don't have an id yet (and you still want a transient entity to be potentially equal to a persistent one).
Using all properties (apart from the database identifier) is also not a good idea because not all properties are part of the identity.
So, the preferred (and correct) way to implement equality is to use a business key, as explained in Java Persistence with Hibernate:
Implementing equality with a business key
To get to the solution that we recommend, you need to understand the notion of a business key. A business key is a property, or some combination of properties, that is unique for each instance with the same database identity. Essentially, it’s the natural key that you would use if you weren’t using a surrogate primary key instead.
Unlike a natural primary key, it isn’t an absolute requirement that the business
key never changes—as long as it changes rarely, that’s enough.
We argue that essentially every entity class should have some business key, even if it includes all properties of the class (this would be appropriate for some immutable classes). The business key is what the user thinks of as uniquely identifying a particular record, whereas the surrogate key is what the application and database use.
Business key equality means that the equals() method compares only the properties that form the business key. This is a perfect solution that avoids all the problems described earlier. The only downside is that it requires extra thought to identify the correct business key in the first place. This effort is required anyway; it’s important to identify any unique keys if your database must ensure data integrity via constraint checking.
For the User class, username is a great candidate business key. It’s never null, it’s unique with a database constraint, and it changes rarely, if ever:
public class User {
...
public boolean equals(Object other) {
if (this==other) return true;
if ( !(other instanceof User) ) return false;
final User that = (User) other;
return this.username.equals( that.getUsername() );
}
public int hashCode() {
return username.hashCode();
}
}
Maybe I missed something but for an Address, the business key would typically be made of the street number, the street, the city, the postal code, the country. I don't see any problem with that.
Just in case, Equals And HashCode is another interesting read.
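For the Address case, a business-key implementation along those lines might look like the sketch below. The field names and String types are assumptions; the point is only that the surrogate id stays out of equals() and hashCode(), so assigning it later does not disturb a Set.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical Address entity; equality uses the business key only.
class Address {
    private Long id; // surrogate database id, deliberately ignored below
    private final String streetNumber;
    private final String street;
    private final String city;
    private final String postalCode;
    private final String country;

    Address(String streetNumber, String street, String city,
            String postalCode, String country) {
        this.streetNumber = streetNumber;
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
        this.country = country;
    }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Address)) return false;
        Address a = (Address) o;
        return Objects.equals(streetNumber, a.streetNumber)
                && Objects.equals(street, a.street)
                && Objects.equals(city, a.city)
                && Objects.equals(postalCode, a.postalCode)
                && Objects.equals(country, a.country);
    }

    @Override
    public int hashCode() {
        return Objects.hash(streetNumber, street, city, postalCode, country);
    }
}

class AddressBusinessKeyDemo {
    public static void main(String[] args) {
        Set<Address> addresses = new HashSet<>();
        Address unsaved = new Address("12", "Main Street", "Springfield", "12345", "US");
        addresses.add(unsaved);

        unsaved.setId(17L); // simulate the id being assigned on save
        System.out.println(addresses.contains(unsaved)); // true: the hash did not change
    }
}
```

The trade-off raised in the question still applies: if the user edits the street or city while the object sits in a Set, the hash changes, so the object would have to be removed before the edit and re-added afterwards.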
Maybe a transient property would do it?
That way you don't have to worry about the persistence. Like this:
@Transient
private Integer otherId;
I usually do it this way: equals and hashCode use the key when it has been set; otherwise equals falls back to the base implementation (i.e. ==). It should also work if hashCode() returns super.hashCode() instead of 0.
@Override
public int hashCode() {
    if (code == null) {
        return 0;
    } else {
        return code.hashCode();
    }
}

@Override
public boolean equals(Object obj) {
    if (obj instanceof PersistentObject && Hibernate.getClass(obj).equals(Hibernate.getClass(this))) {
        PersistentObject po = (PersistentObject) obj;
        if (code == null) {
            return po.code == null && this == po;
        } else {
            return code.equals(po.getCode());
        }
    } else {
        return super.equals(obj);
    }
}
The question is how often are you likely to have multiple unsaved objects that might be duplicates that need to go into a set or map? For me, the answer is virtually never so I use surrogate keys and super.equals/hashcode for unsaved objects.
Business keys make sense in some cases, but they can cause problems. For example, what if two people live at the same address? If you want that to be one record in the database, then you have to manage it as a many-to-many relation and lose the ability to cascade-delete it, so when the last person living there is deleted, you have to do extra work to get rid of the address. But if you store the same address for each person, then your business key has to include the person entity, which may mean a database hit inside your equals/hashCode methods.
Thanks for all your input. I decided to use surrogate keys and provide them right at object creation time. This way I stay clear of all that "rarely changing" stuff and have something solid to base identity on. First tests look rather good.
Thank you all for your time. Unfortunately, I can only accept one answer as the solution; I will take Pascal's, as he provided me with good reading ;)
Enjoy
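One way to provide such a key at creation time is a random UUID. This is only a sketch of the idea, not necessarily what was done here: the class and field names are hypothetical, and the UUID would presumably have to be mapped as a persistent column so that copies loaded or merged later carry the same value.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity whose identity is fixed when the object is created.
class AddressEntity {
    private final UUID uuid = UUID.randomUUID(); // assigned once, never changes
    private Long id;                             // database id, may arrive later
    private String street;

    AddressEntity(String street) { this.street = street; }

    void setId(Long id) { this.id = id; }
    void setStreet(String street) { this.street = street; }

    @Override
    public boolean equals(Object o) {
        return o instanceof AddressEntity
                && uuid.equals(((AddressEntity) o).uuid);
    }

    @Override
    public int hashCode() {
        return uuid.hashCode();
    }
}

class CreationTimeKeyDemo {
    public static void main(String[] args) {
        Set<AddressEntity> addresses = new HashSet<>();
        AddressEntity address = new AddressEntity("Main Street 12");
        addresses.add(address);

        // Neither user edits nor a late database id affect equality.
        address.setStreet("Side Street 3");
        address.setId(5L);
        System.out.println(addresses.contains(address)); // true

        // A separately created object is a different entity, even with equal data.
        System.out.println(address.equals(new AddressEntity("Side Street 3"))); // false
    }
}
```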
