Object equality in context of hibernate / webapp - java

How do you handle object equality for java objects managed by hibernate? In the 'hibernate in action' book they say that one should favor business keys over surrogate keys.
Most of the time, I do not have a business key. Think of addresses mapped to a person. The addresses are kept in a Set and displayed in a Wicket RefreshingView (with a ReuseIfEquals strategy).
I could either use the surrogate id or use all fields in the equals() and hashCode() functions.
The problem is that those fields change during the lifetime of the object, either because the user entered some data or because the id changes when JPA merge() is called inside the OSIV (Open Session in View) filter.
My understanding of the equals() and hashCode() contract is that those should not change during the lifetime of an object.
What I have tried so far:
equals() based on hashCode(), which uses the database id (or super.hashCode() if the id is null). Problem: new addresses start with a null id but get an id when they are attached to a person and that person gets merged (re-attached) in the OSIV filter.
Lazily computing the hash code when hashCode() is first called and storing it in a @Transient field. This does not work, because merge() returns a new object and the hash code does not get copied over.
I think what I need is an ID that gets assigned during object creation. What would be my options here? I don't want to introduce an additional persistent property. Is there a way to explicitly tell JPA to assign an ID to an object?
Regards

Using the id of an entity is not a good idea because transient entities don't have an id yet (and you still want a transient entity to be potentially equal to a persistent one).
Using all properties (apart from the database identifier) is also not a good idea because not all properties are part of the identity.
So, the preferred (and correct) way to implement equality is to use a business key, as explained in Java Persistence with Hibernate:
Implementing equality with a business key
To get to the solution that we recommend, you need to understand the notion of
a business key. A business key is a property, or some combination of properties, that
is unique for each instance with the same database identity. Essentially, it’s the natural key that you would use if you weren’t using a surrogate primary key instead.
Unlike a natural primary key, it isn’t an absolute requirement that the business
key never changes—as long as it changes rarely, that’s enough.
We argue that essentially every entity class should have some business key, even
if it includes all properties of the class (this would be appropriate for some
immutable classes). The business key is what the user thinks of as uniquely identifying a particular record, whereas the surrogate key is what the application and
database use.
Business key equality means that the equals() method compares only the properties that form the business key. This is a perfect solution that avoids all the problems described earlier. The only downside is that it requires extra thought to
identify the correct business key in the first place. This effort is required anyway;
it’s important to identify any unique keys if your database must ensure data integrity via constraint checking.
For the User class, username is a great candidate business key. It’s never null,
it’s unique with a database constraint, and it changes rarely, if ever:
public class User {
    ...
    public boolean equals(Object other) {
        if (this==other) return true;
        if ( !(other instanceof User) ) return false;
        final User that = (User) other;
        return this.username.equals( that.getUsername() );
    }
    public int hashCode() {
        return username.hashCode();
    }
}
Maybe I missed something but for an Address, the business key would typically be made of the street number, the street, the city, the postal code, the country. I don't see any problem with that.
Just in case, Equals And HashCode is another interesting reading.
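To illustrate the Address case mentioned above, here is a minimal sketch of business-key equality. The class layout and field names are assumptions made for illustration (they do not come from the question), and the surrogate id is deliberately left out of both methods:

```java
import java.util.Objects;

// Hypothetical Address entity; field names are assumptions for illustration.
class Address {
    private Long id; // surrogate key, deliberately ignored by equals/hashCode
    private final String streetNumber;
    private final String street;
    private final String city;
    private final String postalCode;
    private final String country;

    Address(String streetNumber, String street, String city,
            String postalCode, String country) {
        this.streetNumber = streetNumber;
        this.street = street;
        this.city = city;
        this.postalCode = postalCode;
        this.country = country;
    }

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Address)) return false;
        Address that = (Address) other;
        // Compare only the properties that form the business key.
        return Objects.equals(streetNumber, that.streetNumber)
            && Objects.equals(street, that.street)
            && Objects.equals(city, that.city)
            && Objects.equals(postalCode, that.postalCode)
            && Objects.equals(country, that.country);
    }

    @Override
    public int hashCode() {
        // Must use exactly the fields that equals() compares.
        return Objects.hash(streetNumber, street, city, postalCode, country);
    }
}
```

With this, a transient Address (id still null) and a persistent one with the same street data compare as equal, which is the behaviour argued for above.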

Maybe a transient property would do it?
That way you don't have to worry about the persistence. Like this:
@Transient
private Integer otherId;

I usually do it this way: equals() and hashCode() use the key once it has been set; otherwise equals() falls back to the base implementation (i.e. ==). It should also work if hashCode() returns super.hashCode() instead of 0.
@Override
public int hashCode() {
    if (code == null) {
        return 0;
    } else {
        return code.hashCode();
    }
}
@Override
public boolean equals(Object obj) {
    if (obj instanceof PersistentObject && Hibernate.getClass(obj).equals(Hibernate.getClass(this))) {
        PersistentObject po = (PersistentObject) obj;
        if (code == null) {
            return po.code == null && this == po;
        } else {
            return code.equals(po.getCode());
        }
    } else {
        return super.equals(obj);
    }
}

The question is how often are you likely to have multiple unsaved objects that might be duplicates that need to go into a set or map? For me, the answer is virtually never so I use surrogate keys and super.equals/hashcode for unsaved objects.
Business keys make sense in some cases, but they can cause problems. For example, what if two people live at the same address? If you want that to be one record in the database, then you have to manage it as a many-to-many and lose the ability to cascade delete it, so when the last person living there is deleted, you have to do extra work to get rid of the address. But if you store the same address for each person, then your business key has to include the person entity, which may mean a database hit inside your equals/hashCode methods.

Thanks for all your input. I decided to use surrogate keys and provide those right at object creation time. This way I stay clear of all that 'rarely' changing stuff and have something solid to base identity on. First tests look rather good.
Thank you all for your time. Unfortunately, I can only accept one answer as the solution. I will take Pascal's, as he provided me with good reading ;)
enjoy
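For readers wondering what "surrogate keys provided at object creation time" could look like, here is a minimal sketch. It assumes a UUID generated in the field initializer; the class and its fields are hypothetical, and the JPA mapping (an @Id field without a generation strategy) is only hinted at in comments rather than shown:

```java
import java.util.UUID;

// Hypothetical entity: the id is assigned when the object is created, not by the database.
// In a JPA mapping this field would presumably be the @Id without a generator (assumption);
// a persistence provider would overwrite it when loading an existing row.
class Address {
    private UUID id = UUID.randomUUID();
    private String street;
    private String city;

    Address(String street, String city) {
        this.street = street;
        this.city = city;
    }

    UUID getId() { return id; }
    void setStreet(String street) { this.street = street; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Address)) return false;
        // Identity is the id only; it never changes after construction.
        return id.equals(((Address) other).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode();
    }
}
```

Because the id exists before the object ever reaches a Set, neither user edits nor a later save change the hash code. The trade-off is that two freshly created addresses with identical street data are never equal.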

Related

equals and hashCode with many fields in Java? [closed]

In Java apps, I prefer to use unique fields in equals and hashCode methods instead of adding only id field or all the fields. However, I am confused about the following points:
By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
JPA and Hibernate don't specify or rely on any particular semantics for entities' equals() and hashCode() methods, so you can do what you want.
Good alternatives
With that said, there is a handful of alternatives for equality that make much more sense to me than any others:
Equality corresponds to object identity. This is of course the default provided by Object.equals(), and it can serve perfectly well for entities. OR
Equality corresponds to persistent identity. That is, entities are equal if and only if they have the same entity type and primary key. OR
Equality corresponds to (only) value equality. That is, equality of all corresponding persistent fields except the ID. There are additional variations around how that applies to mapped relationships. OR
Equality corresponds to persistent identity AND value equality. Again, there are variations around how the value equality part applies to mapped relationships.
General advice
In general, you would do well to follow a few rules of thumb:
As with most other classes, especially mutable ones, default to just inheriting Object.equals() and Object.hashCode(). Have a specific purpose and plan before you do otherwise, and remember that you get only one choice for this. And that it is impactful.
If you do override equals() (and therefore hashCode() as well) then do it in a consistent way across all your entities.
Think carefully before you go with an option involving value equality. This is usually a poor choice for mutable classes in general, and entities are no exception.
Specific Questions
1. By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
I think using the ID is fine. It's simply a question of what you want equality to represent for your entities. You absolutely can have distinct entity objects with the same type and ID, and you might want to be able to detect that with equals(). The other persistent fields might or might not factor into that.
In particular, an equals() method based solely on entity ID might make sense for entities that appear on the "many" side of a one-to-many relationship when that is mapped to a Set.
2. When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
I see no good reason to consider only a proper subset of unique fields, except that subset consisting only of the entity ID. Or if all the fields are unique then the one consisting of all the fields except the ID. The logic that suggests that you might be able to consider other proper subsets revolves around the persistent identity of the entity, which is completely and best represented by its ID.
3. Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
If your sense of equality is to be based on entity value then I don't see how it makes much sense to omit any persistent fields except, possibly, the ID. Do not arbitrarily omit the ID -- it may very well be something you want to include. Again, it depends on what equals() is intended to mean for your entities.
That's a tricky question that hibernate itself doesn't have a clear answer on.
John Bollinger's answer covers your specific question, but there is some additional context about how to think about equality and hibernate that should help figure out what to do. After all, given that hibernate doesn't require you to do anything particular, you can do whatever you want, which leads to the obvious question: ... okay, so what should I do, then?
That question boils down to the following (using Person as an arbitrary example of a model class plus associated table; furthermore, let's say the person table has a single unique ID that is generated, such as a random UUID or an auto-sequenced integer value):
What does an instance of Person represent?
There are in broad strokes 3 answers:
It represents a person. A row in the person table also represents a person; these 2 things aren't related.
It represents a row in the person table.
It represents a state in my application, nothing more.
Even though these things sound quite similar, they result in opposite meanings as to equality.
Which choice is correct? That's up to you.
When reading on, remember:
Any Person instance which isn't "saved" yet, would have a null value for id, because upon insertion, hibernate will ask the DB to generate a value for it or generates one itself and only then fills it in.
An instance represents a row
Equality under the second model (an instance of Person represents a row in the table) should look only at the id column, because that defines row uniqueness; any 2 representations of a row in the person table are guaranteed to be referring to the same row (hence, equal) if and only if the id is equal. That is a necessary and sufficient condition: If they are equal the 2 objects are necessarily referring to the same row, and if they aren't equal, then they are necessarily referring to different rows.
Notably, if id is still null, then two distinct objects cannot be equal. More generally, the question "Is this object-representing-a-row equal to this other object-representing-a-row?" is meaningless if these objects represent rows-to-be (unsaved rows): if you invoke save() on each object, you end up with 2 rows. Optimally, such an object should be in a state where attempting to invoke equals on it is a failure, but equals is not supposed to throw, so false is the best answer. This would mean you want:
class Person {
    // fields
    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        return id == null ? false : id.equals(otherId);
    }
}
This defines your equals method as 'ends up representing the same row'. This holds even if you change meaningful state:
Change the name and save the object? It's... still the same row, and this equality implementation reflects this.
Call save() on each in the comparison when they were unsaved? Then you get 2 rows - and this equality implementation reflects this before and after attempting to save it.
If invoking on self (a.equals(a)) this returns true as the equality spec demands; it also works out in the 'modelling a row' view: If you invoke save() on the same object twice, it's still just one row.
An instance represents a person
The nature of what a person is is entirely unrelated to the autosequence/autogen ID it gets; the fact that we're using hibernate is an implementation detail that should play no part at all in considering equality; after all, this object represents the notion of a person, and that notion exists entirely independent of the database. The database is one thing that is modelling persons; instances of this class are another.
In this model you should do the exact opposite: Find something that uniquely identifies a person itself, and compare against that. After all, if you have 2 rows in a database that both contain the same social security number, then you have only 1 person.. and you just happen to have 2 rows that are both referring to the same person. Given that we chose our instance to imply that it represents a person, then an instance loaded from row A, and an instance loaded from row B, ought to be considered as equal - after all, they are representing the same individual.
In this case, you write an equals method that considers all relevant fields except the autoseq/autogen ID field! If there is a separate unique id such as social security number, use that. If there isn't, essentially it boils down to an equals method that compares all fields, except ID. Because that's the one field that definitely has zero bearing on what defines a person.
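As a rough sketch of this "instance represents a person" model, assuming a social security number is available as the unique real-world identifier (the field names here are made up for illustration), the generated id plays no part in equality:

```java
import java.util.Objects;
import java.util.UUID;

// Hypothetical Person modelled as "a person", not "a row".
class Person {
    private UUID id;                           // generated row id: irrelevant to equality
    private final String socialSecurityNumber; // assumed unique per real-world person
    private String name;

    Person(UUID id, String socialSecurityNumber, String name) {
        this.id = id;
        this.socialSecurityNumber = socialSecurityNumber;
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Person)) return false;
        // Two rows carrying the same SSN describe the same individual.
        return Objects.equals(socialSecurityNumber, ((Person) other).socialSecurityNumber);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(socialSecurityNumber);
    }
}
```

Two instances loaded from different rows (different ids) but carrying the same social security number compare as equal here, which is exactly the opposite of the row-based implementation shown earlier.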
An instance defines a state in your application
This is almost a cop-out, and in general means equality is irrelevant / not applicable. It's like asking how to implement an equals method to an InputStream implementation - mostly, you.. don't.
Here, the default behaviour (Object's own implementations) is what you want, and therefore you don't implement either hashCode or equals. Any instance of Person is equal to itself (as in a.equals(a), same reference) and not equal to any other, even if the other has identical values for each and every field, and even if the id field isn't null (so both represent the same row).
Such an object cannot meaningfully be used as a value object. For example, it would be pointless to stuff such things in a hashmap (at best, you can stuff them in an IdentityHashMap, as those semantics would apply. Only way to do any lookups is to have a ref that was .put() into it before and call .get() with that).
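A small sketch of what those identity semantics mean in practice. The Person class here is a hypothetical stand-in that inherits equals and hashCode from Object, and the demo class name is made up:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical model class that deliberately does NOT override equals/hashCode.
class Person {
    final String name;
    Person(String name) { this.name = name; }
}

class IdentityLookupDemo {
    // Returns { found via the original reference, found via a look-alike instance }.
    static boolean[] lookups() {
        Map<Person, String> notes = new IdentityHashMap<>();
        Person original = new Person("Ann");
        Person lookalike = new Person("Ann"); // same field values, different reference
        notes.put(original, "first");
        return new boolean[] {
            notes.containsKey(original),
            notes.containsKey(lookalike)
        };
    }
}
```

Only the reference that was put into the map finds the entry; an instance with identical field values does not.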
Which one is right? Up to you. But document it clearly, because in my experience, lots of hibernate users are absolutely convinced either the first or second model is the one, and only, right answer, and consider the other answer utterly bonkers. This is problematic - they'd be writing their code assuming all hibernate model classes work precisely as they want, and would therefore not even be thinking of checking docs/impl to know how it actually works.
For what it's worth, objects are objects and database rows do not neatly map to the notion of an object. SQL's and Java's notions of null are utterly incompatible, and the notion of 'a query' does not neatly map to tables (between selecting expressions, selecting on views, and JOINs, that should be obvious) - hibernate is tilting at windmills. It is a leaky abstraction and this is one of its many, many leaks. Leaky abstractions can be useful; just be aware that at the 'edges' the principle hibernate tries to peddle you (that objects can represent query results and rows) has limits you will run into. A lot.

What is a significant field for the equals contract (effective java item 8)

In Effective Java in Item 8 the recommendation is that
For each significant field in the class to check the corresponding field of the this object.
I understand that we can have secondary fields that are calculated from primary fields, but what exactly is the meaning of "for each significant field"? Is the equals contract implemented properly only when all fields of an object are compared?
If I have e.g. a class Employee which has a multitude of fields like id, first and last name, dob, position, location etc all these seem significant but to me it seems that just using the id would suffice for a proper and performant equals implementation.
Am I wrong on this? Or the id I mention is exactly what Bloch means by "significant" fields?
import java.util.Objects;
import java.util.UUID;

class Employee {
    private UUID id;
    private String firstName;
    private String lastName;

    @Override
    public boolean equals(Object obj) {
        if (this == obj)
            return true;
        if (!(obj instanceof Employee))
            return false;
        return id.equals(((Employee)obj).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}
If Employee is stored in a database, i.e. has a unique id, then there is no need to check other fields such as firstName and lastName in equals; for comparing data objects, only the id field is significant.
A significant field is merely one that, if omitted, would result in an incorrect implementation of equals (according to the notion of equality you have defined for instances of your class).
I appreciate that is a bit of a self-referential definition, but that's what it means.
The canonical example of a non-significant field is String.hashCode: as you observe, this is calculated from other fields (and lazily), so it would not be appropriate to include in the equality because there is no guarantee that it has been calculated for either of the strings being compared; and, if it has been calculated for both, it tells you nothing more than you already know.
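To make the cached-hash case concrete, here is a minimal sketch of a class with one derived, lazily computed field. The Name class is hypothetical and only loosely modelled on how String caches its hash; it is not String's actual implementation:

```java
import java.util.Objects;

// Hypothetical value class with a lazily cached hash code.
final class Name {
    private final String first;  // significant
    private final String last;   // significant
    private int cachedHash;      // derived from the fields above: NOT significant

    Name(String first, String last) {
        this.first = first;
        this.last = last;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof Name)) return false;
        Name that = (Name) other;
        // cachedHash is intentionally not compared: it may be computed on one
        // instance and still be 0 on the other.
        return Objects.equals(first, that.first) && Objects.equals(last, that.last);
    }

    @Override
    public int hashCode() {
        int h = cachedHash;
        if (h == 0) {            // not computed yet (or the hash happens to be 0)
            h = Objects.hash(first, last);
            cachedHash = h;
        }
        return h;
    }
}
```

If equals compared cachedHash, an instance whose hash had already been computed would wrongly differ from an otherwise identical one whose hash had not.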
In your case, yes, it sounds like comparing instances using only the id would suffice: this is a significant field, the name (etc) is not significant: there should only be one person (little p, as in an actual real human) with a particular id.
It does raise a question of how you would deal with "same id, different name" instances, but this is getting into the realm of Falsehoods Programmers Believe About Names:
People have exactly one canonical full name.
People have exactly one full name which they go by.
People have, at this point in time, exactly one canonical full name.
People have, at this point in time, one full name which they go by.
People have exactly N names, for any value of N.
(People’s names fit within a certain defined amount of space.)
People’s names do not change.
...
Taking these into consideration, if you want to say "this Person is the same as that Person" (and you want to do that using equals), id seems like the only reasonable thing to use.

Should I compare all fields in my class's "equals" method?

I'm working on an application that allows the user to manage accounts. So, suppose I have an Account class, representing one of the user's accounts:
class Account
{
    public int id;
    public String accountName;
    public String accountIdentifier;
    public String server;
    public String notes;
}
My equals method looks like this:
public boolean equals(Object o)
{
    if (this == o)
        return true;
    if (o == null || !(o instanceof Account))
        return false;
    Account other = (Account) o;
    if (!accountIdentifier.equals(other.accountIdentifier))
        return false;
    if (!server.equals(other.server))
        return false;
    return true;
}
As you can see, I'm only comparing the accountIdentifier and the server, but not the other fields. There are several reasons why I chose this approach.
I keep the accounts in a List. When the user updates an account, by changing the account name (which is just a name specified by the user to identify the account) or the notes, I can do accountList.set(accountList.indexOf(account), account); to update the account in the list. If equals compared all properties, this approach wouldn't work, and I'd have to work around it (for example by iterating over the list and checking for these properties manually).
This might actually be more important, but it only came to my mind after thinking about it for a while. An Account is uniquely identified by the accountIdentifier and the server it belongs to. The user might decide to rename the account, or change the notes, but it's still the same account. But if the server is changed, I think I would consider it a different account. The id is just an internal ID since the accounts are stored in a database. Even if that changed, the account is still considered the same account if the accountIdentifier and the server stayed the same.
What I'm trying to say is that I basically implemented equals this way to allow for shorter, more concise code in the rest of the application. But I'm not sure if I'm breaking some rules here, or if I'm doing something that might cause other developers headaches if it ever happens that someone is working with my application's API.
Is it okay to only compare some fields in the equals method, or should I compare all fields?
Yes, it's definitely okay to do this. You get to decide what equality means for your class, and you should use it in a way that makes the most sense for your application's logic — in particular, for collections and other such classes that make use of equality. It sounds like you have thought about that and decided that the (server, identifier) pair is what uniquely distinguishes instances.
This would mean, for instance, that two instances with the same (server, identifier) pair but a different accountName are different versions of the same Account, and that the difference might need to be resolved somehow; that's a perfectly reasonable semantic.
It may make sense to define a separate boolean allFieldsEqual(Account other) method to cover the "extended" definition, depending on whether you need it (or would find it useful for testing).
And, of course, you should override hashCode to make it consistent with whatever definition of equals you go with.
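Putting these points together, a sketch of the Account class with equals and hashCode both based on the (server, identifier) pair, plus the separate allFieldsEqual method suggested above, might look like this. The constructor is an addition for illustration only:

```java
import java.util.Objects;

class Account {
    int id;
    String accountName;
    String accountIdentifier;
    String server;
    String notes;

    // Convenience constructor, added only for this sketch.
    Account(String accountIdentifier, String server, String accountName) {
        this.accountIdentifier = accountIdentifier;
        this.server = server;
        this.accountName = accountName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;
        Account other = (Account) o;
        return Objects.equals(accountIdentifier, other.accountIdentifier)
            && Objects.equals(server, other.server);
    }

    @Override
    public int hashCode() {
        // Same two fields as equals(), so equal accounts share a hash code.
        return Objects.hash(accountIdentifier, server);
    }

    // "Extended" comparison for tests or change detection; not used by collections.
    boolean allFieldsEqual(Account other) {
        return equals(other)
            && id == other.id
            && Objects.equals(accountName, other.accountName)
            && Objects.equals(notes, other.notes);
    }
}
```

With this in place, accountList.set(accountList.indexOf(account), account) from the question keeps working after a rename, and the class is also safe to use in a HashSet or as a HashMap key.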
You should compare all of the fields that are necessary to determine equality. If the accountIdentifier and server fields are enough to determine if two objects are equal, then that is perfectly fine. No need to include any of the other fields that don't matter in terms of equality.
For equality you should normally use the business key. This key can be simple or composite and does not necessarily need to include all the fields of the entity, so what identifies an entity depends on each case. If possible, it should be the minimum set of fields that fully and uniquely identifies the entity.
Some people prefer (and it is a good practice) to create a surrogate key that identifies the object. This is very useful when you want to persist your objects using an ORM, because you don't need to export the keys to the child entities in 1:M or M:N relations. For example, the id in your sample can be considered a surrogate key if you create it as an internal unique identifier.
Also may want to take into consideration:
Whenever you override equals you must override hashCode too; this is important for classes such as collections and maps to work properly.
Apache Commons Lang provides a really nice API to help with the implementation of equals and hashCode. Those classes are EqualsBuilder and HashCodeBuilder. Both let you chain the fields you want to use in your comparison, and they also offer a reflection-based mode.
The answer is "it depends on the semantics of your data".
For example, you might internally store a field that can be derived (calculated) from the other fields. In which case, you don't need to compare the calculated value.
As a gross generalisation, anything that cannot be derived from other fields should be included.
This is fine - and probably a good thing to do. If you've identified equality as the accountIdentifier and the server being distinct and unique, then that's perfectly valid for your use case.
You don't want to use more fields than you need to, since accounts that should be considered equal would then compare as different. This approach is perfectly suitable for your needs.

Should @Transient property be used in equals/hashCode/toString?

I have JPA entities where some properties are annotated with @Transient.
Should I use these properties in equals/hashCode/toString methods?
My first thought is NO but I don't know why.
Tips?
Ideas?
Explanations?
The case of toString() is different, you can do whatever you want with toString() so I will only cover equals() (and hashCode()).
First, the rule: if you want to store an object in a List, Map or a Set then it is a requirement that equals and hashCode are implemented so they obey the standard contract as specified in the documentation.
Now, how to implement equals() and hashCode()? A "natural" idea would be to use the properties mapped as Id as part of the equals():
public class User {
    ...
    public boolean equals(Object other) {
        if (this==other) return true;
        if (id==null) return false;
        if ( !(other instanceof User) ) return false;
        final User that = (User) other;
        return this.id.equals( that.getId() );
    }
    public int hashCode() {
        return id==null ? System.identityHashCode(this) : id.hashCode();
    }
}
Unfortunately, this solution has a major problem: when using generated identifiers, the values are not assigned until an entity becomes persistent so if a transient entity is added to a Set before being saved, its hash code will change while it's in the Set and this breaks the contract of the Set.
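The failure mode can be reproduced without any persistence provider. The sketch below uses a hypothetical Item class whose hash code depends on a generated id; unlike the User class above, a null id hashes to 0 rather than to the identity hash code, purely so that the demo is deterministic, and setId() stands in for the id being assigned on save:

```java
import java.util.HashSet;
import java.util.Set;

// Simplified, hypothetical entity: hashCode depends on a generated id.
class Item {
    private Long id; // stays null until the entity is "saved"

    void setId(Long id) { this.id = id; }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (id == null || !(other instanceof Item)) return false;
        return id.equals(((Item) other).id);
    }

    @Override
    public int hashCode() {
        return id == null ? 0 : id.hashCode();
    }
}

class SetContractDemo {
    // Returns whether the set still finds the element after its id was assigned.
    static boolean foundAfterSave() {
        Set<Item> items = new HashSet<>();
        Item item = new Item();
        items.add(item);   // stored under hash code 0
        item.setId(42L);   // simulates id assignment on save; hash code is now 42
        return items.contains(item);
    }
}
```

SetContractDemo.foundAfterSave() returns false: the very object that was added can no longer be found, because the HashSet looks for it under the new hash code.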
The recommended approach is thus to use the attributes that are part of the business key i.e. a combination of attributes that is unique for each instance with the same database identity. For example, for the User class, this could be the username:
public class User {
...
public boolean equals(Object other) {
if (this==other) return true;
if ( !(other instanceof User) ) return false;
final User that = (User) other;
return this.username.equals( that.getUsername() );
}
public int hashCode() {
return username.hashCode();
}
}
The Hibernate Reference Documentation summarizes this as follows:
"Never use the database identifier to implement equality; use a business key, a combination of unique, usually immutable, attributes. The database identifier will change if a transient object is made persistent. If the transient instance (usually together with detached instances) is held in a Set, changing the hashcode breaks the contract of the Set. Attributes for business keys don't have to be as stable as database primary keys, you only have to guarantee stability as long as the objects are in the same Set." - 12.1.3. Considering object identity
"It is recommended that you implement equals() and hashCode() using Business key equality. Business key equality means that the equals() method compares only the properties that form the business key. It is a key that would identify our instance in the real world (a natural candidate key)" - 4.3. Implementing equals() and hashCode()
So, back to the initial question:
Use a business key if possible. @Transient attributes are very likely not part of such a key.
If not possible, use identifier properties, but make sure the values are assigned before adding an entity to a List, Map or Set.
See also
Equals and HashCode
Don't Let Hibernate Steal Your Identity
equals and hashcode in Hibernate
Understanding equals() and hashCode() (p. 396) in Java Persistence with Hibernate
The two typical usages of @Transient and transient that I'm aware of are for stuff that can't be serialized/persisted (e.g. a remote resource handle) or for computed properties that can be reconstructed from others.
For computed data, it makes no sense to use them in the equality relationship (equals/hashCode), because it would be redundant: the value is computed from other values that are already used in the equality. It can, however, still make sense to print them in toString (e.g. a base price and a ratio are used to compute the actual price).
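A small sketch of the base price and ratio example, with hypothetical names and prices in cents to avoid floating-point noise. The computed value is the kind of property that would carry @Transient in a JPA entity (the annotation is omitted here to keep the snippet free of dependencies):

```java
import java.util.Objects;

// Hypothetical class: the actual price is derived from the two stored fields.
class PricedItem {
    private final long basePriceCents;
    private final int ratioPercent;

    PricedItem(long basePriceCents, int ratioPercent) {
        this.basePriceCents = basePriceCents;
        this.ratioPercent = ratioPercent;
    }

    // Computed property: redundant for equality, but useful to display.
    long actualPriceCents() {
        return basePriceCents * ratioPercent / 100;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (!(other instanceof PricedItem)) return false;
        PricedItem that = (PricedItem) other;
        return basePriceCents == that.basePriceCents && ratioPercent == that.ratioPercent;
    }

    @Override
    public int hashCode() {
        return Objects.hash(basePriceCents, ratioPercent);
    }

    @Override
    public String toString() {
        return "PricedItem{base=" + basePriceCents + ", ratio=" + ratioPercent
            + "%, actual=" + actualPriceCents() + "}";
    }
}
```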
For non-serializable/non-persistable data, it depends. I can imagine a handle to a resource that is not serializable, where you can still compare the name of the resource that the handle represents. The same goes for toString: printing the handle's resource name may be useful.
These were my 2 cents, but if you explain your particular usage of @Transient, someone may be able to give better advice.
An exception might be a field that is declared transient while you also provide writeObject() and readObject() methods that handle it.

Value object with OID

Can you describe the pros and cons of including an OID (typically a database row identifier) in a POJO representing an entity in your model?
In fact I'm not talking about issues related to equals/hashcode and so on, I should have described better my problem (my bad :) )...
We've got some of those entity classes which represent business objects (like Product, Catalog and so on...). Sometime they have a 'business id', for example Product can be found by its unique ProductId (which has 3 fields : id, type, repository).
In our database, the Product table has a surrogate primary key column (OID) in addition to the 3 business columns ( id, type, repository) to facilitate foreign keys references and to have less joins clauses.
The Product/ProductId classes are part of the API that we expose to other applications. So for example they can call :
productManager.findProductById(ProductId productId);
The question is, should or should not the OID be included in the Product or in the ProductId class knowing that our clients are expected to use the ProductId identifier.
Pros :
I can use the OID to do another lookup like
Product p = productManager.findProductById(ProductId productId);
Catalog c = productManager.findAllCatalogsContainingProduct(p.getOid());
We look up by ProductId a lot in the application, so having the OID at hand saves a round trip to the database each time to find the OID matching a ProductId.
Cons :
I've just exposed the OID to a client (let's hope he doesn't use it instead of the business key!!)
Can you list other pros and cons?
Database row identifier = Primary key? If so, there is no pro or con, you have to have it otherwise you can't relate the POJO back to its corresponding database row.
To retrieve Products and Catalogs, the standard SQL way is to do a Join. For example, with my DAL I can do:
SearchCriteria sc = new SearchCriteria();
sc.AddBinding("ProductId", productId);
List<Entity> Products = SQL.Read(sc, new Product(new Catalog()));
or
List<Entity> Products = SQL.Read(sc, new Catalog(new Product()));
This way there is no need to reveal anything to the caller, nor for a roundtrip.
You can run into problems if your implementation of equals() or hashCode() is based off the identifier since it will likely be null initially and then change later once the object is persisted. See below:
http://java.sun.com/javase/6/docs/api/java/util/Set.html
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Let's assume that your implementation of hashCode() is based off the identifier and equals() uses hashCode() in its comparison. If you add the object to a Set while its identifier is null, the equals comparisons will perform one way. If you then persist the object in the set, its identifier value will likely change, thus changing the behavior of equals() and hashCode(). This breaks the "contract" of Set as described above.
It's a bit of an edge case but one worth noting.
