Comparing objects where almost all fields are same - java

In unit tests I usually use something like
assertThat(someObject).isEqualTo(someOtherObject);
to make sure these two objects have identical content. But what to do in case objects have identical content except certain fields?
Normally it would be like that:
assertThat(someObject.getFirstField()).isNotEqualTo(someOtherObject.getFirstField());
assertThat(someObject.getLastField()).isNotEqualTo(someOtherObject.getLastField());
// and then a gazillion fields that should be equal
assertThat(someObject.getNthField()).isEqualTo(someOtherObject.getNthField());
// ...
For obvious reasons, this is very unwieldy for bigger classes with a lot of fields. Is there a way to specify that you expect certain fields to be not equal, but the rest equal?
Both someObject and someOtherObject are instances of the same class, which is Lomboked, so equals() is generated automatically.

If you are open to using AssertJ then there is a very simple solution to your problem. Use
assertThat(someObject).isEqualToIgnoringGivenFields(someOtherObject, "firstField", "lastField");
Bear in mind that this uses reflection internally, so the assertion can start failing if you rename the ignored fields in the class without updating the names passed here.
More on this in the AssertJ documentation on field by field comparisons.
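To make the reflection caveat concrete, here is a minimal plain-Java sketch of a comparison that skips named fields. It is not AssertJ's implementation; FieldComparison, equalIgnoring and SampleRecord are hypothetical names invented for this illustration, and the helper only looks at fields declared directly on the class.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical helper, NOT part of AssertJ: compares declared fields by name.
final class FieldComparison {
    private FieldComparison() {}

    static boolean equalIgnoring(Object a, Object b, String... ignoredFields) {
        if (a == b) return true;
        if (a == null || b == null || a.getClass() != b.getClass()) return false;
        Set<String> ignored = new HashSet<>(Arrays.asList(ignoredFields));
        try {
            // Only fields declared on the class itself; superclass fields are not visited.
            for (Field field : a.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                // A misspelled or renamed field name never matches, so the field is compared anyway.
                if (ignored.contains(field.getName())) continue;
                field.setAccessible(true);
                if (!Objects.equals(field.get(a), field.get(b))) return false;
            }
            return true;
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read field reflectively", e);
        }
    }
}

// Hypothetical sample class standing in for someObject / someOtherObject.
final class SampleRecord {
    private final String firstField;
    private final int middleField;
    private final String lastField;

    SampleRecord(String firstField, int middleField, String lastField) {
        this.firstField = firstField;
        this.middleField = middleField;
        this.lastField = lastField;
    }
}
```

With two SampleRecord instances that differ only in firstField and lastField, equalIgnoring(a, b, "firstField", "lastField") returns true, while a misspelled name such as "firstFeild" silently stops ignoring that field. Newer AssertJ versions also offer assertThat(actual).usingRecursiveComparison().ignoringFields("firstField", "lastField").isEqualTo(expected), which is worth checking against the version you use.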


equals and hashCode with many fields in Java? [closed]

In Java apps, I prefer to use unique fields in equals and hashCode methods instead of adding only id field or all the fields. However, I am confused about the following points:
By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
JPA and Hibernate don't specify or rely on any particular semantics for entities' equals() and hashCode() methods, so you can do what you want.
Good alternatives
With that said, there is a handful of alternatives for equality that make much more sense to me than any others:
Equality corresponds to object identity. This is of course the default provided by Object.equals(), and it can serve perfectly well for entities. OR
Equality corresponds to persistent identity. That is, entities are equal if and only if they have the same entity type and primary key. OR
Equality corresponds to (only) value equality. That is, equality of all corresponding persistent fields except the ID. There are additional variations around how that applies to mapped relationships. OR
Equality corresponds to persistent identity AND value equality. Again, there are variations around how the value equality part applies to mapped relationships.
General advice
In general, you would do well to follow a few rules of thumb:
As with most other classes, especially mutable ones, default to just inheriting Object.equals() and Object.hashCode(). Have a specific purpose and plan before you do otherwise, and remember that you get only one choice for this, and that it is impactful.
If you do override equals() (and therefore hashCode() as well) then do it in a consistent way across all your entities.
Think carefully before you go with an option involving value equality. This is usually a poor choice for mutable classes in general, and entities are no exception.
Specific Questions
1. By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
I think using the ID is fine. It's simply a question of what you want equality to represent for your entities. You absolutely can have distinct entity objects with the same type and ID, and you might want to be able to detect that with equals(). The other persistent fields might or might not factor into that.
In particular, an equals() method based solely on entity ID might make sense for entities that appear on the "many" side of a one-to-many relationship when that is mapped to a Set.
2. When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
I see no good reason to consider only a proper subset of unique fields, except that subset consisting only of the entity ID. Or if all the fields are unique then the one consisting of all the fields except the ID. The logic that suggests that you might be able to consider other proper subsets revolves around the persistent identity of the entity, which is completely and best represented by its ID.
3. Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
If your sense of equality is to be based on entity value then I don't see how it makes much sense to omit any persistent fields except, possibly, the ID. Do not arbitrarily omit the ID -- it may very well be something you want to include. Again, it depends on what equals() is intended to mean for your entities.
That's a tricky question that hibernate itself doesn't have a clear answer on.
John Bollinger's answer covers your specific question, but there is some additional context about how to think about equality and hibernate that should help figure out what to do. After all, given that hibernate doesn't require you to do anything particular, you can do whatever you want, which leads to the obvious question: ... okay, so what should I do, then?
That question boils down to the following, using Person as an arbitrary example of a model class plus its associated table. Furthermore, let's say the person table has a single unique ID that is generated (a random UUID or auto-sequenced integer value).
What does an instance of Person represent?
There are, in broad strokes, 3 answers:
It represents a person. A row in the person table also represents a person; these 2 things aren't related.
It represents a row in the person table.
It represents a state in my application, nothing more.
Even though these things sound quite similar, they result in opposite meanings as to equality.
Which choice is correct? That's up to you.
When reading on, remember:
Any Person instance which isn't "saved" yet, would have a null value for id, because upon insertion, hibernate will ask the DB to generate a value for it or generates one itself and only then fills it in.
An instance represents a row
Equality under the second model (an instance of Person represents a row in the table) should look only at the id column, because that defines row uniqueness; any 2 representations of a row in the person table are guaranteed to be referring to the same row (hence, equal) if and only if the id is equal. That is a necessary and sufficient condition: If they are equal the 2 objects are necessarily referring to the same row, and if they aren't equal, then they are necessarily referring to different rows.
Notably, if id is still null, then two distinct instances cannot be equal. More generally, the question "Is this object-representing-a-row equal to this other object-representing-a-row" is meaningless if these objects represent rows-to-be (unsaved rows). If you invoke save() on each object, you end up with 2 rows. Optimally such an object should be considered in a state such that attempting to invoke equals on it is a failure, but equals implementations are not expected to throw, therefore false is the best answer. This would mean you want:
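import java.util.UUID;

class Person {
    private UUID id; // null until the row has been saved
    // other fields

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        // An unsaved instance (id == null) is equal only to itself.
        return id != null && id.equals(otherId);
    }
}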
This defines your equals method as 'ends up representing the same row'. This holds even if you change meaningful state:
Change the name and save the object? It's... still the same row, and this equality implementation reflects this.
Call save() on each in the comparison when they were unsaved? Then you get 2 rows - and this equality implementation reflects this before and after attempting to save it.
If invoking on self (a.equals(a)) this returns true as the equality spec demands; it also works out in the 'modelling a row' view: If you invoke save() on the same object twice, it's still just one row.
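The three cases above can be checked with a small, self-contained sketch. PersonRow, its constructor and setName are hypothetical names for this illustration. The hashCode shown is an assumption on my part, since the answer does not give one: a per-class constant is one common choice because it stays stable when the id is assigned on save, at the cost of poor hash distribution.

```java
import java.util.UUID;

// Hypothetical entity for illustration; mirrors the id-only equals shown above.
class PersonRow {
    private UUID id;      // null until the row has been saved
    private String name;

    PersonRow(UUID id, String name) {
        this.id = id;
        this.name = name;
    }

    void setName(String name) { this.name = name; }

    String getName() { return name; }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != PersonRow.class) return false;
        UUID otherId = ((PersonRow) other).id;
        // Two distinct unsaved instances are never equal.
        return id != null && id.equals(otherId);
    }

    @Override
    public int hashCode() {
        // Assumption: a constant keeps the hash stable before and after save.
        return PersonRow.class.hashCode();
    }
}

class PersonRowDemo {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        PersonRow loadedOnce = new PersonRow(id, "Ann");
        PersonRow loadedAgain = new PersonRow(id, "Ann");
        loadedAgain.setName("Anne");                         // changed state, same row
        System.out.println(loadedOnce.equals(loadedAgain));  // true

        PersonRow unsavedA = new PersonRow(null, "Bob");
        PersonRow unsavedB = new PersonRow(null, "Bob");
        System.out.println(unsavedA.equals(unsavedB));       // false: would become 2 rows
        System.out.println(unsavedA.equals(unsavedA));       // true: same object, one row
    }
}
```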
An instance represents a person
The nature of what a person is is entirely unrelated to the autosequence/autogen ID it gets; the fact that we're using hibernate is an implementation detail that should play no part at all in considering equality; after all, this object represents the notion of a person, and that notion exists entirely independent of the database. The database is one thing that is modelling persons; instances of this class are another.
In this model you should do the exact opposite: Find something that uniquely identifies a person itself, and compare against that. After all, if you have 2 rows in a database that both contain the same social security number, then you have only 1 person.. and you just happen to have 2 rows that are both referring to the same person. Given that we chose our instance to imply that it represents a person, then an instance loaded from row A, and an instance loaded from row B, ought to be considered as equal - after all, they are representing the same individual.
In this case, you write an equals method that considers all relevant fields except the autoseq/autogen ID field! If there is a separate unique id such as social security number, use that. If there isn't, essentially it boils down to an equals method that compares all fields, except ID. Because that's the one field that definitely has zero bearing on what defines a person.
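As a rough sketch of this second reading, the class below compares only a natural key and ignores the generated ID entirely. PersonValue and its field names are hypothetical, and the sketch assumes the social security number is always present and never changes.

```java
import java.util.Objects;

// Hypothetical class for illustration: equality follows the person, not the row.
class PersonValue {
    private final Long id;                      // generated by the database, ignored below
    private final String socialSecurityNumber;  // assumed to uniquely identify a person
    private final String name;

    PersonValue(Long id, String socialSecurityNumber, String name) {
        this.id = id;
        this.socialSecurityNumber = Objects.requireNonNull(socialSecurityNumber);
        this.name = name;
    }

    Long getId() { return id; }

    String getName() { return name; }

    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof PersonValue)) return false;
        // The id plays no part: two rows with the same number describe one person.
        return socialSecurityNumber.equals(((PersonValue) other).socialSecurityNumber);
    }

    @Override
    public int hashCode() {
        return socialSecurityNumber.hashCode();
    }
}
```

Two instances loaded from different rows, for example new PersonValue(1L, "123-45-6789", "Ann") and new PersonValue(2L, "123-45-6789", "Ann"), compare as equal here, and an unsaved instance with a null id equals both.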
An instance defines a state in your application
This is almost a cop-out, and in general means equality is irrelevant / not applicable. It's like asking how to implement an equals method to an InputStream implementation - mostly, you.. don't.
Here, the default behaviour (Object's own implementations) is what you want, and therefore you don't implement either hashCode or equals. Any instance of Person is equal to itself (as in, a.equals(a), same reference), and not equal to any other, even if the other has identical values for each and every field, and even if the id field isn't null (so that both represent the same row).
Such an object cannot meaningfully be used as a value object. For example, it would be pointless to stuff such things in a hashmap (at best, you can stuff them in an IdentityHashMap, as those semantics would apply. Only way to do any lookups is to have a ref that was .put() into it before and call .get() with that).
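A short sketch of those identity semantics, using a hypothetical PersonState class that deliberately overrides neither equals nor hashCode:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical class for illustration: inherits Object.equals() and Object.hashCode().
class PersonState {
    final Long id;
    final String name;

    PersonState(Long id, String name) {
        this.id = id;
        this.name = name;
    }
}

class PersonStateDemo {
    public static void main(String[] args) {
        PersonState a = new PersonState(1L, "Ann");
        PersonState b = new PersonState(1L, "Ann");   // same values, same row, different object

        Map<PersonState, String> labels = new IdentityHashMap<>();
        labels.put(a, "first");

        System.out.println(a.equals(a));       // true
        System.out.println(a.equals(b));       // false
        System.out.println(labels.get(a));     // first
        System.out.println(labels.get(b));     // null: only the original reference finds the entry
    }
}
```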
Which one is right? Up to you. But document it clearly, because in my experience, lots of hibernate users are absolutely convinced either the first or second model is the one, and only, right answer, and consider the other answer utterly bonkers. This is problematic - they'd be writing their code assuming all hibernate model classes work precisely as they want, and would therefore not even be thinking of checking docs/impl to know how it actually works.
For what it's worth, objects are objects and database rows do not neatly map to the notion of an object. SQL's and Java's notions of null are utterly incompatible, and the notion of 'a query' does not neatly map to tables (between selecting expressions, selecting on views, and JOINs, that should be obvious) - Hibernate is tilting at windmills. It is a leaky abstraction and this is one of its many, many leaks. Leaky abstractions can be useful; just be aware that at the 'edges' the principle Hibernate tries to peddle you (that objects can represent query results and rows) has limits you will run into. A lot.

What to do for objects that are equal but not identical?

This is either a Java coding design question, or a domain modelling question, I'm not sure yet!
Background, simplified as I can't share all the detail:
The system I'm working on is deployed in multiple instances (e.g. at different customers, and at one customer there may be development, test, preprod, prod instances)
The system configuration is a list of ConfigArtefact<T>, where T indicates that it might be a database connection configuration, or a predefined-query, or....
ConfigArtefacts are named. The names are semantically meaningful/well-known (e.g. there could be an object for "Console.translations.en", "Console.translations.fr", "Application.Database.connection.credentials" or "Reporting.Database.connection.credentials") and are distinct for each deployment: no two different ConfigArtefacts will have the same name for a given deployment.
ConfigArtefacts have other attributes (e.g. for the database, username and password) depending on the type used for <T>. The value of the attributes could be different in different deployments of this system.
There's no natural ordering of artefacts, even ones of the same type <T>. Where some arbitrary ordering is needed, I use the name.
Goal:
I need to write something that compares the configuration of two deployments of this system and identifies artefacts that have been added, removed, or changed. In order to find the same artefact on each deployment, I need to compare by name only (I always know what type of artefact I'm working with). In order to say if they've changed, I need to compare by all other attributes.
So, two kinds of comparison. One can be modelled with equals/hashcode, but not the other. Which should use equals()? (I think the one by name, as then added and deleted are just set subtraction, using one of the many collection libraries).
Would that be the normal choice? And if so, is there a conventional name for the other ("full compare") one? I'm considering identicalTo() (so two objects are changed if one.identicalTo(two) is false)
Your design is flawed - you have misused equals() by only comparing name.
If all attributes must be compared to know if the object has changed, then it is not true that objects with the same name are “equal”, because the use of the word “changed” implies there’s a difference, and if there’s a difference they’re not equal.
Finding something by using an identifier is different to two objects being equal if they have the same identifier.
Implement equals() and hashCode() using all attributes that matter for determining if an object is “different”.
To facilitate convenience and performance, populate a Map<String, ConfigArtefact<?>> for each environment using the name as the key.
Finding differences between 2 such maps is a fairly trivial O(n) task.
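A minimal sketch of that map-based diff follows. Artefact is a simplified, hypothetical stand-in for ConfigArtefact<T> (attributes are reduced to a string map), and ConfigDiff and between are names made up for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical, simplified stand-in for ConfigArtefact<T>.
final class Artefact {
    final String name;
    final Map<String, String> attributes;

    Artefact(String name, Map<String, String> attributes) {
        this.name = name;
        this.attributes = new HashMap<>(attributes);
    }

    // equals/hashCode use every attribute that matters for "has it changed?".
    @Override
    public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof Artefact)) return false;
        Artefact that = (Artefact) other;
        return name.equals(that.name) && attributes.equals(that.attributes);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, attributes);
    }
}

final class ConfigDiff {
    final Set<String> added = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();
    final Set<String> changed = new TreeSet<>();

    // One pass over each map keyed by artefact name: O(n) lookups overall.
    static ConfigDiff between(Map<String, Artefact> before, Map<String, Artefact> after) {
        ConfigDiff diff = new ConfigDiff();
        for (Map.Entry<String, Artefact> entry : before.entrySet()) {
            Artefact counterpart = after.get(entry.getKey());
            if (counterpart == null) {
                diff.removed.add(entry.getKey());
            } else if (!entry.getValue().equals(counterpart)) {
                diff.changed.add(entry.getKey());
            }
        }
        for (String name : after.keySet()) {
            if (!before.containsKey(name)) {
                diff.added.add(name);
            }
        }
        return diff;
    }
}
```

The name is used only as the lookup key, while equals() answers the "changed" question, which keeps the two kinds of comparison separate.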

Should I compare all fields in my class's "equals" method?

I'm working on an application that allows the user to manage accounts. So, suppose I have an Account class, representing one of the user's accounts:
class Account
{
public int id;
public String accountName;
public String accountIdentifier;
public String server;
public String notes;
}
My equals method looks like this:
public boolean equals(Object o)
{
if (this == o)
return true;
if (o == null || !(o instanceof Account))
return false;
Account other = (Account) o;
if (!accountIdentifier.equals(other.accountIdentifier))
return false;
if (!server.equals(other.server))
return false;
return true;
}
As you can see, I'm only comparing the accountIdentifier and the server, but not the other fields. There are several reasons why I chose this approach.
I keep the accounts in a List. When the user updates an account, by changing the account name (which is just a name specified by the user to identify the account) or the notes, I can do accountList.set(accountList.indexOf(account), account); to update the account in the list. If equals compared all properties, this approach wouldn't work, and I'd have to work around it (for example by iterating over the list and checking for these properties manually).
This might actually be more important, but it only came to my mind after thinking about it for a while. An Account is uniquely identified by the accountIdentifier and the server it belongs to. The user might decide to rename the account, or change the notes, but it's still the same account. But if the server is changed, I think I would consider it a different account. The id is just an internal ID since the accounts are stored in a database. Even if that changed, the account is still considered the same account if the accountIdentifier and the server stayed the same.
What I'm trying to say is that I basically implemented equals this way to allow for shorter, more concise code in the rest of the application. But I'm not sure if I'm breaking some rules here, or if I'm doing something that might cause other developers headaches if it ever happens that someone is working with my application's API.
Is it okay to only compare some fields in the equals method, or should I compare all fields?
Yes, it's definitely okay to do this. You get to decide what equality means for your class, and you should use it in a way that makes the most sense for your application's logic — in particular, for collections and other such classes that make use of equality. It sounds like you have thought about that and decided that the (server, identifier) pair is what uniquely distinguishes instances.
This would mean, for instance, that two instances with the same (server, identifier) pair but a different accountName are different versions of the same Account, and that the difference might need to be resolved somehow; that's a perfectly reasonable semantic.
It may make sense to define a separate boolean allFieldsEqual(Account other) method to cover the "extended" definition, depending on whether you need it (or would find it useful for testing).
And, of course, you should override hashCode to make it consistent with whatever definition of equals you go with.
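Putting these suggestions together, a sketch of the Account class might look like the following. The constructor is an assumption added for brevity, and allFieldsEqual is the separate "extended" comparison mentioned above.

```java
import java.util.Objects;

class Account {
    public int id;
    public String accountName;
    public String accountIdentifier;
    public String server;
    public String notes;

    // Constructor added for this sketch only.
    Account(int id, String accountName, String accountIdentifier, String server, String notes) {
        this.id = id;
        this.accountName = accountName;
        this.accountIdentifier = accountIdentifier;
        this.server = server;
        this.notes = notes;
    }

    // Identity of an account: the (server, accountIdentifier) pair.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Account)) return false;   // also covers o == null
        Account other = (Account) o;
        return Objects.equals(accountIdentifier, other.accountIdentifier)
                && Objects.equals(server, other.server);
    }

    // Built from exactly the fields equals() uses, so the two stay consistent.
    @Override
    public int hashCode() {
        return Objects.hash(accountIdentifier, server);
    }

    // "Extended" comparison: same account AND same remaining state.
    public boolean allFieldsEqual(Account other) {
        return equals(other)
                && id == other.id
                && Objects.equals(accountName, other.accountName)
                && Objects.equals(notes, other.notes);
    }
}
```

With this in place, accountList.set(accountList.indexOf(account), account) still finds the old version of a renamed account, and allFieldsEqual can tell the two versions apart in tests.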
You should compare all of the fields that are necessary to determine equality. If the accountIdentifier and server fields are enough to determine if two objects are equal, then that is perfectly fine. No need to include any of the other fields that don't matter in terms of equality.
For the key, you should normally use the business key. This key can be simple or composite and does not necessarily need to include all the fields in the entity. So it depends on each case which fields identify an entity. If possible, it should be the minimum set of fields that fully and uniquely identifies the entity.
Some people prefer (and it is a good practice) to create a surrogate key that identifies the object. This is very useful when you want to persist your objects using an ORM, because you don't need to export the keys to the child entities in 1:M or M:N relations. For example, the id in your sample can be considered a surrogate key if you create it as an internal unique identifier.
Also may want to take into consideration:
Whenever you override equals you must override hashCode too; this is important for working properly with hash-based collections such as HashMap and HashSet.
Apache Commons Lang provides a really nice API to help with the implementation of equals and hashCode. The classes are EqualsBuilder and HashCodeBuilder. Both let you append the fields you want to use in your comparison, and they also offer a reflection-based mode.
The answer is "it depends on the semantics of your data".
For example, you might internally store a field that can be derived (calculated) from the other fields. In which case, you don't need to compare the calculated value.
As a gross generalisation, anything that cannot be derived from other fields should be included.
This is fine - and probably a good thing to do. If you've identified equality as the accountIdentifier and the server being distinct and unique, then that's perfectly valid for your use case.
You don't want to use more fields than you need to, since that would make accounts that you consider the same compare as different. This approach is perfectly suitable to your needs.

What is the best way to compare two complex Java objects and generate events depending on the comparison

I have a requirement to compare two complex objects, e.g.
class Policy {
    private VehicleInformation info1;
    private DriverInformation info2;
    // ...
}
I have two populated instances of this class. I want to compare those instances and, depending on the differences, show them in the UI marked in colors using some flag.
What is the best way to compare these objects? Can we achieve it using XML, because the Java code will be complex?
Override the equals() and hashCode() method in your Policy class. Then you can check for equality like:
if(object1.equals(object2)) {
// do something
}
Implement Comparable and override the compareTo() method if you need to order the objects.
One solution: use Jackson to serialize your objects as JSON, then use a Java implementation of JSON Patch that can also generate the differences between two JSON documents as JSON Patches.
Which means you can know what has changed and where. And since this is JSON, you can send the result to your browser and have it handled by some JavaScript code easily. Unlike XML!
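If pulling in Jackson and a JSON Patch library is not an option, the same "what changed and where" idea can be approximated with plain reflection. This is a swapped-in technique, not the JSON Patch approach described above. FieldDiff, changedFields and PolicyExample are hypothetical names, the sketch only inspects fields declared directly on the class, and nested objects are compared with their own equals().

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Hypothetical helper: lists the names of declared fields whose values differ.
final class FieldDiff {
    private FieldDiff() {}

    static List<String> changedFields(Object left, Object right) {
        if (left.getClass() != right.getClass()) {
            throw new IllegalArgumentException("Both objects must be of the same class");
        }
        List<String> changed = new ArrayList<>();
        try {
            for (Field field : left.getClass().getDeclaredFields()) {
                if (Modifier.isStatic(field.getModifiers()) || field.isSynthetic()) continue;
                field.setAccessible(true);
                // Nested objects are compared via their own equals() implementation.
                if (!Objects.equals(field.get(left), field.get(right))) {
                    changed.add(field.getName());
                }
            }
        } catch (IllegalAccessException e) {
            throw new IllegalStateException("Cannot read field reflectively", e);
        }
        Collections.sort(changed);   // deterministic order for the caller
        return changed;
    }
}

// Hypothetical, flattened stand-in for Policy; Strings replace the nested info objects.
final class PolicyExample {
    private final String vehicleInformation;
    private final String driverInformation;

    PolicyExample(String vehicleInformation, String driverInformation) {
        this.vehicleInformation = vehicleInformation;
        this.driverInformation = driverInformation;
    }
}
```

The returned field names could then drive the highlighting flags in the UI; for a deep diff, the same method would have to be applied recursively to each nested object.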

Java classes with dynamic fields

I'm looking for clever ways to build dynamic Java classes, that is classes where you can add/remove fields at runtime. Usage scenario: I have an editor where users should be able to add fields to the model at runtime or maybe even create the whole model at runtime.
Some design goals:
Type safe without casts if possible for custom code that works on the dynamic fields (that code would come from plugins which extend the model in unforeseen ways).
Good performance (can you beat HashMap? Maybe use an array and assign indexes to the fields during setup?)
Field "reuse" (i.e. if you use the same type of field in several places, it should be possible to define it once and then reuse it).
Calculated fields which depend on the value of other fields
Signals should be sent when fields change value (not necessarily via the Beans API)
"Automatic" parent child relations (when you add a child to a parent, then the parent pointer in the child should be set for "free").
Easy to understand
Easy to use
Note that this is a "think outside the circle" question. I'll post an example below to get you in the mood :-)
Type safe without casts if possible for custom code that works on the dynamic fields (that code would come from plugins which extend the model in unforeseen ways)
AFAIK, this is not possible. You can only get type-safety without type casts if you use static typing. Static typing means method signatures (in classes or interfaces) that are known at compile time.
The best you can do is have an interface with a bunch of methods like String getStringValue(String field), int getIntValue(String field) and so on. And of course you can only do that for a predetermined set of types. Any field whose type is not in that set will require a typecast.
The obvious answer is to use a HashMap (or a LinkedHashMap if you care for the order of fields). Then, you can add dynamic fields via a get(String name) and a set(String name, Object value) method.
This code can be implemented in a common base class. Since there are only a few methods, it's also simple to use delegation if you need to extend something else.
To avoid the casting issue, you can use a type-safe object map:
TypedMap map = new TypedMap();
String expected = "Hallo";
map.set( KEY1, expected );
String value = map.get( KEY1 ); // Look Ma, no cast!
assertEquals( expected, value );
List<String> list = new ArrayList<String> ();
map.set( KEY2, list );
List<String> valueList = map.get( KEY2 ); // Even with generics
assertEquals( list, valueList );
The trick here is the key which contains the type information:
TypedMapKey<String> KEY1 = new TypedMapKey<String>( "key1" );
TypedMapKey<List<String>> KEY2 = new TypedMapKey<List<String>>( "key2" );
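The answer does not show TypedMap or TypedMapKey themselves, so the following is only one possible minimal sketch of how they could be written. The single unchecked cast is hidden inside get(), and it is safe only as long as values are written exclusively through set().

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a key that carries the value type as a type parameter.
final class TypedMapKey<T> {
    private final String name;

    TypedMapKey(String name) {
        this.name = name;
    }

    @Override
    public String toString() {
        return name;
    }
    // equals/hashCode are inherited from Object: each key instance is its own identity.
}

// Sketch of the map: the compiler ties each value's type to its key.
final class TypedMap {
    private final Map<TypedMapKey<?>, Object> values = new HashMap<>();

    <T> void set(TypedMapKey<T> key, T value) {
        values.put(key, value);
    }

    @SuppressWarnings("unchecked")
    <T> T get(TypedMapKey<T> key) {
        // Unchecked, but set() guarantees the stored value matches the key's type.
        return (T) values.get(key);
    }
}
```

Because keys compare by identity here, they are best declared once as constants (as KEY1 and KEY2 are above) and shared by all code that reads or writes the field.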
The performance will be OK.
Field reuse is by using the same value type or by extending the key class of the type-safe object map with additional functionality.
Calculated fields could be implemented with a second map that stores Future instances which do the calculation.
Since all the manipulation happens in just two (or at least a few) methods, sending signals is simple and can be done any way you like.
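As an illustration of why signals are easy in this design, here is a small sketch of a map-backed object whose set() method notifies listeners. DynamicObject and FieldChangeListener are hypothetical names, and string keys are used only to keep the example short.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Hypothetical listener interface for this sketch.
interface FieldChangeListener {
    void fieldChanged(String field, Object oldValue, Object newValue);
}

// Hypothetical map-backed object: every write goes through set(), so one hook is enough.
class DynamicObject {
    private final Map<String, Object> fields = new HashMap<>();
    private final List<FieldChangeListener> listeners = new ArrayList<>();

    void addListener(FieldChangeListener listener) {
        listeners.add(listener);
    }

    Object get(String name) {
        return fields.get(name);
    }

    void set(String name, Object value) {
        Object oldValue = fields.put(name, value);
        // Signal only real changes, not writes of an equal value.
        if (!Objects.equals(oldValue, value)) {
            for (FieldChangeListener listener : listeners) {
                listener.fieldChanged(name, oldValue, value);
            }
        }
    }
}
```

A listener registered with addListener sees one call per actual change, including the first assignment, where oldValue is null.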
To implement automatic parent/child handling, install a signal listener on the "set parent" signal of the child and then add the child to the new parent (and remove it from the old one if necessary).
Since no framework is used and no tricks are necessary, the resulting code should be pretty clean and easy to understand. Not using String as keys has the additional benefit that people won't litter the code with string literals.
So basically you're trying to create a new kind of object model with more dynamic properties, a bit like a dynamic language?
Might be worth looking at the source code for Rhino (i.e. Javascript implemented in Java), which faces a similar challenge of implementing a dynamic type system in Java.
Off the top of my head, I suspect you will find that internal HashMaps ultimately work best for your purposes.
I wrote a little game (Tyrant - GPL source available) using a similar sort of dynamic object model featuring HashMaps, it worked great and performance was not an issue. I used a few tricks in the get and set methods to allow dynamic property modifiers, I'm sure you could do the same kind of thing to implement your signals and parent/child relations etc.
[EDIT] See the source of BaseObject how it is implemented.
You can use bytecode manipulation libraries for this. The shortcoming of this approach is that you need to create your own classloader to load changes in classes dynamically.
I do almost the same; it's a pure Java solution:
Users generate their own models, which are stored as JAXB schema.
The schema is compiled into Java classes on the fly and stored in user jars.
All classes are forced to extend one "root" class, where you can put any extra functionality you want.
Appropriate classloaders are implemented with "model change" listeners.
Speaking of performance (which is important in my case), you can hardly beat this solution. Reusability is the same as for an XML document.
