equals and hashCode with many fields in Java? [closed]


In Java apps, I prefer to use unique fields in equals and hashCode methods instead of only the id field or all of the fields. However, I am confused about the following points:
1. Considering object states in Hibernate, I think it is good practice not to use the id field in equals and hashCode methods, right?
2. When there is a unique field in a class, is it enough to use only one of the unique fields (other than the id field) in equals and hashCode methods?
3. Should I add all the fields except the id field when a class has no unique field other than the id field? Or should I only add some numeric fields instead of adding text fields?

JPA and Hibernate don't specify or rely on any particular semantics for entities' equals() and hashCode() methods, so you can do what you want.
Good alternatives
With that said, there is a handful of alternatives for equality that make much more sense to me than any others:
Equality corresponds to object identity. This is of course the default provided by Object.equals(), and it can serve perfectly well for entities. OR
Equality corresponds to persistent identity. That is, entities are equal if and only if they have the same entity type and primary key. OR
Equality corresponds to (only) value equality. That is, equality of all corresponding persistent fields except the ID. There are additional variations around how that applies to mapped relationships. OR
Equality corresponds to persistent identity AND value equality. Again, there are variations around how the value equality part applies to mapped relationships.
General advice
In general, you would do well to follow a few rules of thumb:
As with most other classes, especially mutable ones, default to just inheriting Object.equals() and Object.hashCode(). Have a specific purpose and plan before you do otherwise, and remember that you get only one choice for this, and that the choice is impactful.
If you do override equals() (and therefore hashCode() as well) then do it in a consistent way across all your entities.
Think carefully before you go with an option involving value equality. This is usually a poor choice for mutable classes in general, and entities are no exception.
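To make the warning about value equality on mutable classes concrete, here is a small sketch with a hypothetical Tag class (not from the question): once an object whose hashCode() depends on a mutable field is modified while it sits in a HashSet, the set can no longer find it.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical mutable class whose equality is based on its value.
class Tag {
    private String label;

    Tag(String label) { this.label = label; }

    void setLabel(String label) { this.label = label; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Tag)) return false;
        return Objects.equals(label, ((Tag) o).label);
    }

    @Override public int hashCode() { return Objects.hashCode(label); }
}

class TagDemo {
    public static void main(String[] args) {
        Set<Tag> tags = new HashSet<>();
        Tag tag = new Tag("draft");
        tags.add(tag);
        tag.setLabel("published");              // hash code changes while inside the set
        System.out.println(tags.contains(tag)); // false: lookup uses the new hash
        System.out.println(tags.size());        // 1: the element is stranded, not removed
    }
}
```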
Specific Questions
1. Considering object states in Hibernate, I think it is good practice not to use the id field in equals and hashCode methods, right?
I think using the ID is fine. It's simply a question of what you want equality to represent for your entities. You absolutely can have distinct entity objects with the same type and ID, and you might want to be able to detect that with equals(). The other persistent fields might or might not factor into that.
In particular, an equals() method based solely on entity ID might make sense for entities that appear on the "many" side of a one-to-many relationship when that is mapped to a Set.
2. When there is a unique field in a class, is it enough to use only one of the unique fields (other than the id field) in equals and hashCode methods?
I see no good reason to consider only a proper subset of the unique fields, except for the subset consisting of the entity ID alone or, if all the fields are unique, the subset consisting of all the fields except the ID. The reasoning that suggests you might be able to consider other proper subsets revolves around the persistent identity of the entity, and that is completely and best represented by its ID.
3. Should I add all the fields except the id field when a class has no unique field other than the id field? Or should I only add some numeric fields instead of adding text fields?
If your sense of equality is to be based on entity value then I don't see how it makes much sense to omit any persistent fields except, possibly, the ID. Do not arbitrarily omit the ID -- it may very well be something you want to include. Again, it depends on what equals() is intended to mean for your entities.

That's a tricky question that Hibernate itself doesn't have a clear answer on.
John Bollinger's answer covers your specific questions, but there is some additional context about how to think about equality and Hibernate that should help you figure out what to do. After all, given that Hibernate doesn't require you to do anything in particular, you can do whatever you want, which leads to the obvious question: ... okay, so what should I do, then?
That question boils down to the following (using Person as an arbitrary example of a model class plus its associated table; furthermore, let's say the person table has a single unique ID that is generated, either a random UUID or an auto-sequenced integer value):
What does an instance of Person represent?
There are, in broad strokes, 3 answers:
It represents a person. A row in the person table also represents a person; these 2 things aren't related.
It represents a row in the person table.
It represents a state in my application, nothing more.
Even though these things sound quite similar, they lead to very different meanings for equality.
Which choice is correct? That's up to you.
When reading on, remember:
Any Person instance that isn't "saved" yet has a null value for id, because upon insertion Hibernate either asks the DB to generate a value or generates one itself, and only then fills it in.
An instance represents a row
Equality under the second model (an instance of Person represents a row in the table) should look only at the id column, because that defines row uniqueness; any 2 representations of a row in the person table are guaranteed to be referring to the same row (hence, equal) if and only if the id is equal. That is a necessary and sufficient condition: If they are equal the 2 objects are necessarily referring to the same row, and if they aren't equal, then they are necessarily referring to different rows.
Notably, if id is still null, then two distinct instances cannot be equal. More generally, the question "is this object-representing-a-row equal to this other object-representing-a-row?" is meaningless if these objects represent rows-to-be (unsaved rows): if you invoke save() on each object, you end up with 2 rows. Ideally, asking such an object about equality would be a failure, but equals is expected to return an answer rather than throw, so false is the best answer. This means you want:
import java.util.UUID;

class Person {
    private UUID id; // null until Hibernate assigns it on save
    // other fields

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        // an unsaved instance (null id) is equal only to itself
        return id != null && id.equals(otherId);
    }
}
This defines your equals method as 'ends up representing the same row'. This holds even if you change meaningful state:
Change the name and save the object? It's... still the same row, and this equality implementation reflects this.
Call save() on each in the comparison when they were unsaved? Then you get 2 rows - and this equality implementation reflects this before and after attempting to save it.
If invoking on self (a.equals(a)) this returns true as the equality spec demands; it also works out in the 'modelling a row' view: If you invoke save() on the same object twice, it's still just one row.
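The equals() above has no matching hashCode(), which the equals contract requires. One option, sketched here with a hypothetical assignId method standing in for Hibernate filling in the id on save, is a constant per-class hash code. It is trivially consistent with equals() and, unlike Objects.hashCode(id), does not change when the id goes from null to a generated value, so an unsaved Person put in a HashSet can still be found after saving. The cost is that all Person instances share one hash bucket, so this only suits small collections.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

class Person {
    private UUID id; // null until the row is saved

    // Hypothetical stand-in for Hibernate assigning the generated id on save.
    void assignId(UUID id) { this.id = id; }

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        return id != null && id.equals(otherId);
    }

    // Constant per class: equal objects trivially share it, and it does not
    // move when the id changes from null to a generated value.
    @Override public int hashCode() {
        return Person.class.hashCode();
    }
}

class PersonRowDemo {
    public static void main(String[] args) {
        Set<Person> people = new HashSet<>();
        Person p = new Person();
        people.add(p);
        p.assignId(UUID.randomUUID());                // "save" after adding to the set
        System.out.println(people.contains(p));       // true: same bucket, same reference
        System.out.println(people.add(new Person())); // true: another unsaved Person is another row
    }
}
```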
An instance represents a person
The nature of what a person is has nothing to do with the autosequence/autogen ID it gets; the fact that we're using Hibernate is an implementation detail that should play no part at all in considering equality. After all, this object represents the notion of a person, and that notion exists entirely independently of the database. The database is one thing that models persons; instances of this class are another.
In this model you should do the exact opposite: find something that uniquely identifies a person itself, and compare against that. After all, if you have 2 rows in a database that both contain the same social security number, then you have only 1 person, and you just happen to have 2 rows that both refer to the same person. Given that we chose our instance to represent a person, an instance loaded from row A and an instance loaded from row B ought to be considered equal; after all, they represent the same individual.
In this case, you write an equals method that considers all relevant fields except the autoseq/autogen ID field! If there is a separate unique id such as social security number, use that. If there isn't, essentially it boils down to an equals method that compares all fields, except ID. Because that's the one field that definitely has zero bearing on what defines a person.
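Under this model, a minimal sketch might look like this. The ssn field is an assumption standing in for "something that uniquely identifies a person"; the constructor rejects a missing SSN, because without one you would fall back to comparing all fields except the ID.

```java
import java.util.Objects;

class Person {
    private Long id;          // generated by the database; plays no part in equality
    private final String ssn; // assumed natural identifier of the person
    private String name;

    Person(Long id, String ssn, String name) {
        this.id = id;
        this.ssn = Objects.requireNonNull(ssn, "ssn");
        this.name = name;
    }

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        return ssn.equals(((Person) other).ssn);
    }

    @Override public int hashCode() {
        return ssn.hashCode();
    }
}

class PersonValueDemo {
    public static void main(String[] args) {
        // Two rows (ids 1 and 2) that both describe the same individual.
        Person fromRowA = new Person(1L, "123-45-6789", "Jane Doe");
        Person fromRowB = new Person(2L, "123-45-6789", "Jane Doe");
        System.out.println(fromRowA.equals(fromRowB)); // true
    }
}
```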
An instance defines a state in your application
This is almost a cop-out, and in general means equality is irrelevant or not applicable. It's like asking how to implement an equals method for an InputStream implementation: mostly, you... don't.
Here, the default behaviour (Object's own implementations) is what you want, and therefore you implement neither hashCode nor equals. Any instance of Person is equal to itself (as in a.equals(a), same reference) and not equal to any other, even if the other has identical values for each and every field, and even if the id field is non-null and the same (that is, both represent the same row).
Such an object cannot meaningfully be used as a value object. For example, there is little point in using such things as HashMap keys: they behave exactly as they would in an IdentityHashMap, so the only way to do a lookup is to hold a reference that was .put() into the map before and call .get() with that.
Which one is right? That's up to you. But document it clearly, because in my experience lots of Hibernate users are absolutely convinced that either the first or the second model is the one and only right answer, and consider the other answer utterly bonkers. This is problematic: they write their code assuming all Hibernate model classes work precisely as they want, and therefore don't even think of checking the docs or the implementation to know how it actually works.
For what it's worth, objects are objects, and database rows do not neatly map to the notion of an object. SQL's and Java's notions of null are utterly incompatible, and the notion of 'a query' does not neatly map to tables (between selecting expressions, selecting on views, and JOINs, that should be obvious). Hibernate is tilting at windmills. It is a leaky abstraction, and this is one of its many, many leaks. Leaky abstractions can be useful; just be aware that at the 'edges', the principle Hibernate tries to sell you (that objects can represent query results and rows) has limits you will run into. A lot.

Related

What to do for objects that are equal but not identical?

This is either a Java coding design question, or a domain modelling question, I'm not sure yet!
Background, simplified as I can't share all the detail:
The system I'm working on is deployed in multiple instances (e.g. at different customers, and at one customer there may be development, test, preprod, prod instances)
The system configuration is a list of ConfigArtefact<T>, where T indicates that it might be a database connection configuration, or a predefined-query, or....
ConfigArtefacts are named. The names are semantically meaningful and well-known (e.g. there could be objects for "Console.translations.en" and "Console.translations.fr", or for "Application.Database.connection.credentials" and "Reporting.Database.connection.credentials") and are unique within each deployment: no two different ConfigArtefacts will have the same name in a given deployment.
ConfigArtefacts have other attributes (e.g. for the database, username and password) depending on the type used for <T>. The value of the attributes could be different in different deployments of this system.
There's no natural ordering of artefacts, even ones of the same type <T>. Where some arbitrary ordering is needed, I use the name.
Goal:
I need to write something that compares the configuration of two deployments of this system and identifies artefacts that have been added, removed, or changed. In order to find the same artefact on each deployment, I need to compare by name only (I always know what type of artefact I'm working with). In order to say if they've changed, I need to compare by all other attributes.
So, two kinds of comparison. One can be modelled with equals/hashcode, but not the other. Which should use equals()? (I think the one by name, as then added and deleted are just set subtraction, using one of the many collection libraries).
Would that be the normal choice? And if so, is there a conventional name for the other ("full compare") one? I'm considering identicalTo() (so two objects are changed if one.identicalTo(two) is false)

Your design is flawed: you have misused equals() by comparing only the name.
If all attributes must be compared to know if the object has changed, then it is not true that objects with the same name are “equal”, because the use of the word “changed” implies there’s a difference, and if there’s a difference they’re not equal.
Finding something by using an identifier is different to two objects being equal if they have the same identifier.
Implement equals() and hashCode() using all attributes that matter for determining if an object is “different”.
To facilitate convenience and performance, populate a Map<String, ConfigArtefact<?>> for each environment using the name as the key.
Finding differences between 2 such maps is a fairly trivial O(n) task.
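A minimal sketch of that approach. It assumes, as recommended above, that the artefact type's equals() compares every attribute that counts as a change; to stay self-contained it uses a hypothetical ConfigDiff helper and plain String values in place of ConfigArtefact<?>. Each map is walked once, so with HashMap inputs the lookups are linear in the number of artefacts; the TreeSets only keep the report sorted by name.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper: values are compared with equals(), so the value type's
// equals() must cover every attribute that should count as a change.
final class ConfigDiff {
    final Set<String> added = new TreeSet<>();   // names only in 'after'
    final Set<String> removed = new TreeSet<>(); // names only in 'before'
    final Set<String> changed = new TreeSet<>(); // names in both, values not equal

    static <V> ConfigDiff between(Map<String, V> before, Map<String, V> after) {
        ConfigDiff diff = new ConfigDiff();
        for (Map.Entry<String, V> e : before.entrySet()) {
            if (!after.containsKey(e.getKey())) {
                diff.removed.add(e.getKey());
            } else if (!Objects.equals(e.getValue(), after.get(e.getKey()))) {
                diff.changed.add(e.getKey());
            }
        }
        for (String name : after.keySet()) {
            if (!before.containsKey(name)) diff.added.add(name);
        }
        return diff;
    }
}

class ConfigDiffDemo {
    public static void main(String[] args) {
        Map<String, String> test = new TreeMap<>();
        test.put("Console.translations.en", "v1");
        test.put("Reporting.Database.connection.credentials", "report_user");
        Map<String, String> prod = new TreeMap<>();
        prod.put("Console.translations.en", "v2");
        prod.put("Console.translations.fr", "v1");
        ConfigDiff d = ConfigDiff.between(test, prod);
        // added=[Console.translations.fr] removed=[Reporting.Database.connection.credentials] changed=[Console.translations.en]
        System.out.println("added=" + d.added + " removed=" + d.removed + " changed=" + d.changed);
    }
}
```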

Uni/Bi direction X One/Many X Many/One Association relationships

This question also refers to the discussion in the following questions:
https://stackoverflow.com/search?page=2&tab=Relevance&q=one%20to%20many%20unidirectional%20java
Best practise for adding a bidirectional relation in OO model
I tried implementing the 8 association combinations formed by [Unidirectional/Bidirectional] X [(One/Many) to (One/Many)] in Java. I found two cases that cannot be implemented, namely Unidirectional One to One and Unidirectional One to Many (e.g. Person->*Vehicle). The other 6 combinations, and Composition, are possible programmatically.
I feel it's not only in Java that these 2 cases cannot exist. For example, the use case "allocate one Aadhar/SSN number to only one person" is possible only if we know that the number is not allocated to anybody else (reverse navigation is must). Does this mean we need to take care while making our design model not to arrive at these specific associations (though they might be present in the analysis model)? I am confused about this.

Basic (No Aggregation)
If you are looking at basic unidirectional association, then that's the simplest of them all.
Unidirectional One to One
class Person {
String name;
}
Unidirectional One to Many
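import java.util.List;
class Vehicle { }
class Person {
    List<Vehicle> vehicles; // navigable from Person only; Vehicle has no back-reference
}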
Composite Aggregation
If I assume that you are asking about composite relationships (where one SSN can be assigned to at most one person), then you can still implement it.
How exactly you decide to implement it, however, depends on your specific domain or, for example, on how you store your data, because
reverse navigation is must
is not actually true: you can just check all Person instances, or you can store all the SSNs in a smart data structure that lets you quickly check whether a new one is unique, and then assign it to the Person without additional checks, because you already know that it is unique.
You can also implement the opposite lookup, which is not prohibited even if the association is "uni-directional".
To quote the UML Specs (11.5.3.1 Associations) [emphasis mine]:
Navigability means that instances participating in links at runtime (instances of an Association) can be accessed efficiently from instances at the other ends of the Association. The precise mechanism by which such efficient access is achieved is implementation specific. If an end is not navigable, access from the other ends may or may not be possible, and if it is, it might not be efficient.
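The "smart data structure" option mentioned in this answer could be sketched like this. The names SsnRegistry and allocate are hypothetical, and SSNs are plain Strings for brevity. Person holds its SSN, nothing points back from the SSN to its owner, and the registry alone guarantees that each number goes to at most one person.

```java
import java.util.HashSet;
import java.util.Set;

class Person {
    private String ssn; // unidirectional: Person -> SSN only

    String getSsn() { return ssn; }
    void setSsn(String ssn) { this.ssn = ssn; }
}

// Hypothetical registry enforcing "one SSN, at most one person" without the
// SSN ever referencing its owner. Releasing or reassigning numbers is out of scope.
class SsnRegistry {
    private final Set<String> allocated = new HashSet<>();

    void allocate(Person person, String ssn) {
        if (!allocated.add(ssn)) { // Set.add returns false if the SSN is already present
            throw new IllegalStateException("SSN already allocated: " + ssn);
        }
        person.setSsn(ssn);
    }
}

class SsnDemo {
    public static void main(String[] args) {
        SsnRegistry registry = new SsnRegistry();
        Person alice = new Person();
        registry.allocate(alice, "123-45-6789");
        System.out.println(alice.getSsn()); // 123-45-6789
    }
}
```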
Update from comments
No one claims that upholding the relationship constraints has to be done in the accessors. In fact, you will pretty much always have temporarily invalid relationships; imagine:
Person person = new Person();
// right now person is in an invalid state because it doesn't have an SSN
SSN ssn = ssnGenerator.createNew();
// now ssn is also in an invalid state because it has no person
person.setSSN(ssn);
// only now are person and ssn valid
(Creating a constructor wouldn't help either, because the constructor runs after the object has already been created, so another part of the constructor could already need the SSN to be set.)
So it is the responsibility of the programmer to ensure that the system upholds all constraints in whatever way makes the most sense. Using constructors/accessors is the easiest way in some circumstances, but you could, for example, wrap the code above in an atomic transaction. After all, if you kept your validation in setSSN(), what would happen if the programmer forgot to call the method at all?
(person 1->* vehicle)
p1.add(v1) and p2.add(v1) are possible violations
You asked about "person ->* vehicle", and now you've changed it to "person 1 -> * vehicle", so obviously the answer differs. But the same principle as above applies: it is the responsibility of the system to uphold all constraints, and whether that's done in accessors, in validation methods, or through the way the system is constructed is an implementation detail. There's no single best way, and there will always be trade-offs.
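As one illustration (not the only way, as noted above), the p1.add(v1) / p2.add(v1) violation can be prevented without giving Vehicle a back-reference by routing additions through a hypothetical Fleet object that records the current owner of each vehicle. All names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

class Vehicle { }

class Person {
    private final List<Vehicle> vehicles = new ArrayList<>(); // Person 1 -> * Vehicle

    List<Vehicle> getVehicles() { return Collections.unmodifiableList(vehicles); }

    // Package-private; intended to be called only by Fleet.
    void addVehicleInternal(Vehicle v) { vehicles.add(v); }
}

// Hypothetical coordinator upholding "each vehicle has exactly one owner".
class Fleet {
    // Identity-based map: each Vehicle object is tracked as itself.
    private final Map<Vehicle, Person> owners = new IdentityHashMap<>();

    void assign(Person owner, Vehicle vehicle) {
        Person current = owners.putIfAbsent(vehicle, owner);
        if (current != null && current != owner) {
            throw new IllegalStateException("vehicle already owned by another person");
        }
        if (current == null) owner.addVehicleInternal(vehicle); // first assignment only
    }
}

class FleetDemo {
    public static void main(String[] args) {
        Fleet fleet = new Fleet();
        Person p1 = new Person();
        Vehicle v1 = new Vehicle();
        fleet.assign(p1, v1);
        System.out.println(p1.getVehicles().size()); // 1
    }
}
```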

Should I compare all fields in my class's "equals" method?

I'm working on an application that allows the user to manage accounts. So, suppose I have an Account class, representing one of the user's accounts:
class Account
{
public int id;
public String accountName;
public String accountIdentifier;
public String server;
public String notes;
}
My equals method looks like this:
@Override
public boolean equals(Object o)
{
    if (this == o)
        return true;
    if (!(o instanceof Account)) // instanceof is already false for null
        return false;
    Account other = (Account) o;
    // java.util.Objects.equals avoids a NullPointerException when a field is null
    if (!Objects.equals(accountIdentifier, other.accountIdentifier))
        return false;
    if (!Objects.equals(server, other.server))
        return false;
    return true;
}
As you can see, I'm only comparing the accountIdentifier and the server, but not the other fields. There are several reasons why I chose this approach.
I keep the accounts in a List. When the user updates an account, by changing the account name (which is just a name specified by the user to identify the account) or the notes, I can do accountList.set(accountList.indexOf(account), account); to update the account in the list. If equals compared all properties, this approach wouldn't work, and I'd have to work around it (for example by iterating over the list and checking for these properties manually).
This might actually be more important, but it only came to my mind after thinking about it for a while. An Account is uniquely identified by the accountIdentifier and the server it belongs to. The user might decide to rename the account, or change the notes, but it's still the same account. But if the server is changed, I think I would consider it a different account. The id is just an internal ID since the accounts are stored in a database. Even if that changed, the account is still considered the same account if the accountIdentifier and the server stayed the same.
What I'm trying to say is that I basically implemented equals this way to allow for shorter, more concise code in the rest of the application. But I'm not sure if I'm breaking some rules here, or if I'm doing something that might cause other developers headaches if it ever happens that someone is working with my application's API.
Is it okay to only compare some fields in the equals method, or should I compare all fields?

Yes, it's definitely okay to do this. You get to decide what equality means for your class, and you should use it in a way that makes the most sense for your application's logic — in particular, for collections and other such classes that make use of equality. It sounds like you have thought about that and decided that the (server, identifier) pair is what uniquely distinguishes instances.
This would mean, for instance, that two instances with the same (server, identifier) pair but a different accountName are different versions of the same Account, and that the difference might need to be resolved somehow; that's a perfectly reasonable semantic.
It may make sense to define a separate boolean allFieldsEqual(Account other) method to cover the "extended" definition, depending on whether you need it (or would find it useful for testing).
And, of course, you should override hashCode to make it consistent with whatever definition of equals you go with.
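Putting these points together, a minimal sketch of the Account class from the question with equality on the (server, accountIdentifier) pair, a matching hashCode(), and the optional allFieldsEqual() method. The constructor is added only for the example, and java.util.Objects is used so that null fields don't throw.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

class Account {
    public int id;
    public String accountName;
    public String accountIdentifier;
    public String server;
    public String notes;

    Account(int id, String accountName, String accountIdentifier, String server, String notes) {
        this.id = id;
        this.accountName = accountName;
        this.accountIdentifier = accountIdentifier;
        this.server = server;
        this.notes = notes;
    }

    // Identity of an account: the (server, accountIdentifier) pair.
    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return Objects.equals(accountIdentifier, other.accountIdentifier)
                && Objects.equals(server, other.server);
    }

    // Uses exactly the same fields as equals().
    @Override public int hashCode() {
        return Objects.hash(accountIdentifier, server);
    }

    // "Extended" comparison: same account AND identical remaining fields.
    boolean allFieldsEqual(Account other) {
        return equals(other)
                && id == other.id
                && Objects.equals(accountName, other.accountName)
                && Objects.equals(notes, other.notes);
    }
}

class AccountDemo {
    public static void main(String[] args) {
        List<Account> accountList = new ArrayList<>();
        accountList.add(new Account(1, "Work mail", "alice", "mail.example.com", ""));
        Account renamed = new Account(1, "Old work mail", "alice", "mail.example.com", "");
        accountList.set(accountList.indexOf(renamed), renamed); // found via (server, identifier)
        System.out.println(accountList.get(0).accountName);     // Old work mail
    }
}
```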

You should compare all of the fields that are necessary to determine equality. If the accountIdentifier and server fields are enough to determine if two objects are equal, then that is perfectly fine. No need to include any of the other fields that don't matter in terms of equality.

For the key you should normally use the business key. This key can be a simple or a composite key and does not necessarily need to include all the fields of the entity. So which fields identify an entity depends on each case. If possible, it should be the minimum number of fields that fully and uniquely identify the entity.
Some people prefer (and it is good practice) to create a surrogate key that identifies the object. This is very useful when you want to persist your objects using an ORM, because you don't need to export the keys to the child entities in 1:M or M:N relations. For example, the id in your sample can be considered a surrogate key if you create it as an internal unique identifier.
You may also want to take the following into consideration:
Whenever you override equals you must also override hashCode; this is needed for hash-based collections such as HashSet and HashMap to work properly.
Apache Commons Lang provides a really nice API to help implement equals and hashCode: the EqualsBuilder and HashCodeBuilder classes. Both let you chain the fields you want to use in your comparison, and both also offer a reflection-based variant.

The answer is "it depends on the semantics of your data".
For example, you might internally store a field that can be derived (calculated) from the other fields. In which case, you don't need to compare the calculated value.
As a gross generalisation, anything that cannot be derived from other fields should be included.
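For instance, with a hypothetical Rectangle that caches its area: since area is always computed from width and height, leaving it out of equals() and hashCode() loses nothing.

```java
import java.util.Objects;

final class Rectangle {
    private final int width;
    private final int height;
    private final long area; // derived: always width * height

    Rectangle(int width, int height) {
        this.width = width;
        this.height = height;
        this.area = (long) width * height;
    }

    long area() { return area; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Rectangle)) return false;
        Rectangle r = (Rectangle) o;
        // area is omitted: equal width and height already imply equal area
        return width == r.width && height == r.height;
    }

    @Override public int hashCode() {
        return Objects.hash(width, height);
    }
}
```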

This is fine - and probably a good thing to do. If you've identified equality as the accountIdentifier and the server being distinct and unique, then that's perfectly valid for your use case.
You don't want to use more fields than you need, since extra fields would make two instances of the same account (for example, before and after a rename) compare as unequal. This approach is perfectly suitable to your needs.

Is it okay to implement equals() and hashCode() on @Entity Model classes in Play Framework?

I have an @Entity-annotated Model class in my Play application, which I am stuffing into a Set to ensure uniqueness prior to saving to the database. Since the objects haven't been saved yet, they don't have a key value. So the comparison must be based on their properties.
I took a peek at the equals() implementation in JPABase, and it seems to explicitly make all entity objects unequal if they don't have a key yet.
So, my question(s):
Is it okay to implement equals() and hashCode() in my Model subclasses?
If so, what is the best/correct way to do so? Should I call super.equals(), and how should I deal with the result?

I don't recommend implementing equals or hashCode in application model classes. Uniqueness should ultimately be ensured by the database. From the application's perspective, uniqueness is almost always imposed on a particular field, e.g. username or email. I've never encountered a case where data entity uniqueness is computed on the entire object via hashCode or equals.

Should the id field of a JPA entity be considered in equals and hashCode?

I hit a problem when writing tests for a database application using JPA2 and EclipseLink:
I add some entity to a database, retrieve it later and want to compare it to an instance which has the values I expect to confirm that the addition worked as I intended.
First I wrote something like
assertEquals(expResult, dbResult);
which failed, because I can't know the value of the id field in advance: it is generated by the database, so dbResult differs from expResult, which I created with new and populated manually.
I see two options:
Either I remove the id field from equals and hashCode so that the comparison is only based on the "real values". I don't know if this causes problems in the database or elsewhere, though.
Or I write my tests to explicitly check every field except id manually.
What should I do?

You might find a lot of controversy about this one. My stance is that you absolutely don't use a database primary key for anything in your application. It should be completely invisible. Identify your objects in your application by some other property or combination of properties.
On the "testing persistence operations" front, what you really want is probably to check that the fields were saved and loaded correctly and maybe that the primary key got assigned some value when you saved it. This probably isn't a job for the equals method at all.
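A plain-Java sketch of such a check. Customer and savedCorrectly are hypothetical, and the database round-trip is replaced by a hand-built object; in a real JPA test the reloaded entity would come from the EntityManager and the checks would be JUnit assertions. The id is only checked for presence, never for a specific value.

```java
import java.util.Objects;

// Hypothetical entity; id is generated by the database on persist.
class Customer {
    Long id;
    String name;
    String email;

    Customer(Long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

class PersistenceChecks {
    // True if the reloaded entity received an id and kept the expected values.
    static boolean savedCorrectly(Customer expected, Customer reloaded) {
        return reloaded.id != null
                && Objects.equals(expected.name, reloaded.name)
                && Objects.equals(expected.email, reloaded.email);
    }

    public static void main(String[] args) {
        Customer expected = new Customer(null, "Ada", "ada@example.com");
        // Stand-in for the entity read back from the database after persist.
        Customer dbResult = new Customer(42L, "Ada", "ada@example.com");
        System.out.println(savedCorrectly(expected, dbResult)); // true
    }
}
```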

Relying on database generated Ids in your equals and hashCode implementation is not advisable. You ought to rely on the truly unique/semi-unique attributes of your classes in checking for equality, and in generating the hashcode values. The Hibernate documentation has an extensive page that discusses this, and the facts therein are applicable to more or less every JPA provider.
The underlying reason for using business keys instead of database-generated values in your equals and hashCode implementation is that the JPA provider only knows the generated Id after it has gone to the database while persisting the entity (for example, through the INSERT itself or a sequence query). If you compare objects using the database-generated Ids, you will end up with an equality test that fails in the following scenarios:
If E1 and E2 are entities of class E (which verifies equality using database-generated Ids), then E1 and E2 will be equal if they haven't been stored in the database yet, because both Ids are still null. This is not what you want, especially if you want to store E1 and E2 in some Set before persistence. It is worse if the attributes of E1 and E2 have different values: the equals implementation would prevent two significantly different entities from being added to a Set, and the hashCode implementation gives every unsaved entity the same hash code, degrading HashMap lookups toward O(n).
If E1 is a managed entity that has been persisted, and E2 is an entity that has not been persisted, then the equality test would deem E1 != E2 even when all the attribute values of E1 and E2 (except for the Ids) are identical. Again, this is probably not what you want, especially if you want to avoid duplicate entities in the database that differ only in their database-generated Ids.
The equals and hashCode implementations therefore ought to use business keys, in order to exhibit consistent behavior for both persisted and unpersisted entities.
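The first scenario can be reproduced with two hypothetical classes: one whose equals()/hashCode() use only the generated id (still null before persisting), and one keyed on an email address that is assumed to be a unique business key.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Equality on the database-generated id only (null until persisted).
class IdKeyedUser {
    Long id;
    final String email;

    IdKeyedUser(String email) { this.email = email; }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof IdKeyedUser)) return false;
        return Objects.equals(id, ((IdKeyedUser) o).id); // null vs null counts as equal
    }

    @Override public int hashCode() { return Objects.hashCode(id); }
}

// Equality on a business key, available before and after persisting.
class EmailKeyedUser {
    Long id;
    final String email;

    EmailKeyedUser(String email) { this.email = Objects.requireNonNull(email, "email"); }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EmailKeyedUser)) return false;
        return email.equals(((EmailKeyedUser) o).email);
    }

    @Override public int hashCode() { return email.hashCode(); }
}

class UnsavedSetDemo {
    public static void main(String[] args) {
        Set<IdKeyedUser> byId = new HashSet<>();
        byId.add(new IdKeyedUser("a@example.com"));
        byId.add(new IdKeyedUser("b@example.com")); // rejected: both ids are null
        Set<EmailKeyedUser> byEmail = new HashSet<>();
        byEmail.add(new EmailKeyedUser("a@example.com"));
        byEmail.add(new EmailKeyedUser("b@example.com"));
        System.out.println(byId.size() + " " + byEmail.size()); // 1 2
    }
}
```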

The book Hibernate in Action recommends defining a business key and testing equality on that. A business key is "a property, or some combination of properties, that is unique for each instance with the same database identity." Elsewhere it says not to use the id as one of those properties, and not to use values in collections.

I would write my test to explicitly check the fields. To make this easy, before performing the assertEquals check, I would set the id of both the expected and the actual result to the same predefined value and then use the normal equals method.
Removing the ID from equals just because testing is slightly harder is not justifiable. You would be forgoing serious performance benefits and also code integrity.
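A sketch of the approach described in this answer, using a hypothetical Book entity whose equals() covers every field including id. It assumes the objects are detached or plain copies, since changing the identifier of a managed entity is generally rejected by JPA providers.

```java
import java.util.Objects;

// Hypothetical entity whose equals() includes the generated id.
class Book {
    Long id;
    String title;
    String isbn;

    Book(Long id, String title, String isbn) {
        this.id = id;
        this.title = title;
        this.isbn = isbn;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book b = (Book) o;
        return Objects.equals(id, b.id)
                && Objects.equals(title, b.title)
                && Objects.equals(isbn, b.isbn);
    }

    @Override public int hashCode() { return Objects.hash(id, title, isbn); }
}

class BookTestSketch {
    public static void main(String[] args) {
        Book expected = new Book(null, "Example Title", "isbn-0001");
        Book dbResult = new Book(7L, "Example Title", "isbn-0001"); // stand-in for the reloaded row
        if (dbResult.id == null) throw new AssertionError("id was not generated");
        final Long predefinedId = -1L; // same predefined value on both sides
        expected.id = predefinedId;
        dbResult.id = predefinedId;
        System.out.println(expected.equals(dbResult)); // true
    }
}
```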
