I guess this question is less technical and more philosophical.
I'm writing some Java classes that act as containers of data that could be imported from a variety of formats, including JSON and SQL result sets, both of which could contain null values. These classes share two methods, isNumeric() and isAlphanumeric().
How should these methods treat null values?
Are null values inherently numeric? Are they alphanumeric? I can come up with at least plausible justifications either way. I guess I'm just interested in hearing what the community's opinion might be.
This describes how null is defined in Oracle:
http://www.dbasupport.com/forums/showthread.php?8666-What-is-the-data-type-of-NULL
Null values are neither numeric nor alphanumeric - rather, they are an absence of value (and type).
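A minimal sketch of that position, assuming a hypothetical DataValue container (the class name and the regex-based definitions of "numeric" and "alphanumeric" are illustrative assumptions, not taken from the question). Null is exposed through its own isNull() check, and both classification methods answer false for it:

```java
import java.util.regex.Pattern;

// Hypothetical container; the class name and both patterns are assumptions
// made for illustration only.
final class DataValue {
    private static final Pattern NUMERIC = Pattern.compile("-?\\d+(\\.\\d+)?");
    private static final Pattern ALPHANUMERIC = Pattern.compile("[A-Za-z0-9]+");

    private final Object raw; // may be null (JSON null or SQL NULL)

    DataValue(Object raw) { this.raw = raw; }

    boolean isNull() { return raw == null; }

    // A null has no value to classify, so both checks answer false.
    boolean isNumeric() {
        return raw != null && NUMERIC.matcher(raw.toString()).matches();
    }

    boolean isAlphanumeric() {
        return raw != null && ALPHANUMERIC.matcher(raw.toString()).matches();
    }
}
```

Callers that care about the difference between "not numeric" and "no value at all" can check isNull() first.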
In Java apps, I prefer to use unique fields in equals and hashCode methods instead of adding only id field or all the fields. However, I am confused about the following points:
By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
JPA and Hibernate don't specify or rely on any particular semantics for entities' equals() and hashCode() methods, so you can do what you want.
Good alternatives
With that said, there is a handful of alternatives for equality that make much more sense to me than any others:
Equality corresponds to object identity. This is of course the default provided by Object.equals(), and it can serve perfectly well for entities. OR
Equality corresponds to persistent identity. That is, entities are equal if and only if they have the same entity type and primary key. OR
Equality corresponds to (only) value equality. That is, equality of all corresponding persistent fields except the ID. There are additional variations around how that applies to mapped relationships. OR
Equality corresponds to persistent identity AND value equality. Again, there are variations around how the value equality part applies to mapped relationships.
General advice
In general, you would do well to follow a few rules of thumb:
As with most other classes, especially mutable ones, default to just inheriting Object.equals() and Object.hashCode(). Have a specific purpose and plan before you do otherwise, and remember that you get only one choice for this, and that it is impactful.
If you do override equals() (and therefore hashCode() as well) then do it in a consistent way across all your entities.
Think carefully before you go with an option involving value equality. This is usually a poor choice for mutable classes in general, and entities are no exception.
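To illustrate the last point, here is a minimal sketch of what tends to go wrong with value equality on a mutable class. The Customer class is hypothetical and is not a mapped entity; it only stands in for one:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical mutable class with value-based equality (illustration only).
final class Customer {
    String name;

    Customer(String name) { this.name = name; }

    @Override public boolean equals(Object o) {
        return o instanceof Customer && ((Customer) o).name.equals(name);
    }

    @Override public int hashCode() { return name.hashCode(); }
}

final class ValueEqualityPitfall {
    // Returns true if the set can no longer find an element it still holds.
    static boolean lostAfterMutation() {
        Set<Customer> customers = new HashSet<>();
        Customer c = new Customer("Ann");
        customers.add(c);
        c.name = "Bob"; // the hash code changes while c sits in the set
        return !customers.contains(c);
    }
}
```

The object is still in the set, but a lookup with its new hash code does not find it. An entity whose persistent fields change while it sits in a hash-based collection runs into the same problem.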
Specific Questions
1. By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
I think using the ID is fine. It's simply a question of what you want equality to represent for your entities. You absolutely can have distinct entity objects with the same type and ID, and you might want to be able to detect that with equals(). The other persistent fields might or might not factor into that.
In particular, an equals() method based solely on entity ID might make sense for entities that appear on the "many" side of a one-to-many relationship when that is mapped to a Set.
2. When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
I see no good reason to consider only a proper subset of unique fields, except that subset consisting only of the entity ID. Or if all the fields are unique then the one consisting of all the fields except the ID. The logic that suggests that you might be able to consider other proper subsets revolves around the persistent identity of the entity, which is completely and best represented by its ID.
3. Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
If your sense of equality is to be based on entity value then I don't see how it makes much sense to omit any persistent fields except, possibly, the ID. Do not arbitrarily omit the ID -- it may very well be something you want to include. Again, it depends on what equals() is intended to mean for your entities.
That's a tricky question that hibernate itself doesn't have a clear answer on.
John Bollinger's answer covers your specific question, but there is some additional context about how to think about equality and hibernate that should help figure out what to do. After all, given that hibernate doesn't require you to do anything particular, you can do whatever you want, which leads to the obvious question: ... okay, so what should I do, then?
That question boils down to the following, using Person as an arbitrary example of a model class plus associated table; furthermore, let's say the person table has a single unique ID that is generated (a random UUID or auto-sequenced integer value):
What does an instance of Person represent?
There are, in broad strokes, 3 answers:
It represents a person. A row in the person table also represents a person; these 2 things aren't related.
It represents a row in the person table.
It represents a state in my application, nothing more.
Even though these things sound quite similar, they result in opposite meanings as to equality.
Which choice is correct? That's up to you.
When reading on, remember:
Any Person instance which isn't "saved" yet, would have a null value for id, because upon insertion, hibernate will ask the DB to generate a value for it or generates one itself and only then fills it in.
An instance represents a row
Equality under the second model (an instance of Person represents a row in the table) should look only at the id column, because that defines row uniqueness; any 2 representations of a row in the person table are guaranteed to be referring to the same row (hence, equal) if and only if the id is equal. That is a necessary and sufficient condition: If they are equal the 2 objects are necessarily referring to the same row, and if they aren't equal, then they are necessarily referring to different rows.
Notably, if id is still null, then two such objects cannot be equal; an unsaved object is equal only to itself, as the contract of equals requires. More generally, the question "Is this object-representing-a-row equal to this other object-representing-a-row?" is meaningless if these objects represent rows-to-be (unsaved rows). If you invoke save() on each object, you end up with 2 rows. Ideally such an object would be in a state where attempting to invoke equals on it fails, but equals is not expected to throw, so false is the best answer. This would mean you want:
import java.util.UUID;

class Person {
    private UUID id; // null until the row has been saved
    // other fields

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        // An unsaved row (null id) is equal to nothing but itself.
        return id != null && id.equals(otherId);
    }
    // hashCode() must be overridden consistently with this as well.
}
This defines your equals method as 'ends up representing the same row'. This holds even if you change meaningful state:
Change the name and save the object? It's... still the same row, and this equality implementation reflects this.
Call save() on each in the comparison when they were unsaved? Then you get 2 rows - and this equality implementation reflects this before and after attempting to save it.
If invoking on self (a.equals(a)) this returns true as the equality spec demands; it also works out in the 'modelling a row' view: If you invoke save() on the same object twice, it's still just one row.
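The equals method above is shown without a matching hashCode. A minimal sketch of one way to complete it follows; the PersonRow name, the assignId() stand-in for what the persistence layer does on insert, and the constant hash code are assumptions of this sketch, not something Hibernate prescribes:

```java
import java.util.UUID;

// Sketch of the "instance represents a row" model with hashCode added.
final class PersonRow {
    private UUID id;     // null until the row is saved
    private String name;

    PersonRow(String name) { this.name = name; }

    // Stand-in for the id being filled in on insert (assumption).
    void assignId(UUID generated) { this.id = generated; }

    void rename(String newName) { this.name = newName; }

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != PersonRow.class) return false;
        return id != null && id.equals(((PersonRow) other).id);
    }

    // Constant per class: the hash does not change when the id is assigned
    // on save, and equal objects always share a hash code.
    @Override public int hashCode() { return PersonRow.class.hashCode(); }
}
```

The constant hash code trades hash-table performance for stability: an object added to a HashSet before it is saved can still be found afterwards. Hashing on the id would be faster for large collections but would change at the moment of saving.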
An instance represents a person
The nature of what a person is is entirely unrelated to the autosequence/autogen ID it gets; the fact that we're using hibernate is an implementation detail that should play no part at all in considering equality; after all, this object represents the notion of a person, and that notion exists entirely independent of the database. The database is one thing that is modelling persons; instances of this class are another.
In this model you should do the exact opposite: Find something that uniquely identifies a person itself, and compare against that. After all, if you have 2 rows in a database that both contain the same social security number, then you have only 1 person, and you just happen to have 2 rows that are both referring to the same person. Given that we chose our instance to imply that it represents a person, an instance loaded from row A and an instance loaded from row B ought to be considered equal; after all, they represent the same individual.
In this case, you write an equals method that considers all relevant fields except the autoseq/autogen ID field! If there is a separate unique id such as social security number, use that. If there isn't, essentially it boils down to an equals method that compares all fields, except ID. Because that's the one field that definitely has zero bearing on what defines a person.
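As a rough sketch of this model, assuming a hypothetical PersonByNaturalKey class that carries a social security number as its natural key (field names and types are illustrative):

```java
import java.util.Objects;

// Sketch of the "instance represents a person" model.
final class PersonByNaturalKey {
    private final Long id;                     // surrogate key; ignored by equals
    private final String socialSecurityNumber; // natural key (assumed non-null)

    PersonByNaturalKey(Long id, String socialSecurityNumber) {
        this.id = id;
        this.socialSecurityNumber = Objects.requireNonNull(socialSecurityNumber);
    }

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof PersonByNaturalKey)) return false;
        return socialSecurityNumber.equals(
                ((PersonByNaturalKey) other).socialSecurityNumber);
    }

    @Override public int hashCode() { return socialSecurityNumber.hashCode(); }
}
```

Two instances loaded from different rows, or one saved and one unsaved, compare equal as long as the social security number matches.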
An instance defines a state in your application
This is almost a cop-out, and in general means equality is irrelevant / not applicable. It's like asking how to implement an equals method for an InputStream implementation: mostly, you don't.
Here, the default behaviour (Object's own implementations) is what you want, and therefore you don't implement either hashCode or equals. Any instance of Person is equal to itself (as in a.equals(a), same reference) and not equal to any other, even if the other has identical values for each and every field, and even if the id field isn't null and matches (i.e. both represent the same row).
Such an object cannot meaningfully be used as a value object. For example, it would be pointless to stuff such things in a HashMap. At best, you can stuff them in an IdentityHashMap, as those semantics would apply: the only way to do any lookup is to hold a reference that was .put() into it before and call .get() with that.
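A small sketch of those identity semantics, using a hypothetical PersonState class that deliberately inherits Object's equals and hashCode:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical state object; it does not override equals or hashCode.
final class PersonState {
    final String name;

    PersonState(String name) { this.name = name; }
}

final class IdentityLookupDemo {
    public static void main(String[] args) {
        Map<PersonState, String> notes = new IdentityHashMap<>();
        PersonState original = new PersonState("Ann");
        notes.put(original, "first contact");

        // Only the very reference that was put in finds the entry.
        System.out.println(notes.get(original));               // first contact
        // A field-for-field copy is a different key.
        System.out.println(notes.get(new PersonState("Ann"))); // null
    }
}
```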
Which one is right? Up to you. But document it clearly, because in my experience, lots of hibernate users are absolutely convinced either the first or second model is the one, and only, right answer, and consider the other answer utterly bonkers. This is problematic - they'd be writing their code assuming all hibernate model classes work precisely as they want, and would therefore not even be thinking of checking docs/impl to know how it actually works.
For what it's worth, objects are objects and database rows do not neatly map to the notion of an object. SQL's and Java's notions of null are utterly incompatible, and the notion of 'a query' does not neatly map to tables (between selecting expressions, selecting on views, and JOINs, that should be obvious). Hibernate is tilting at windmills. It is a leaky abstraction and this is one of its many, many leaks. Leaky abstractions can be useful; just be aware that at the 'edges' the principle Hibernate tries to peddle you (that objects can represent query results and rows) has limits you will run into. A lot.
This is either a Java coding design question, or a domain modelling question, I'm not sure yet!
Background, simplified as I can't share all the detail:
The system I'm working on is deployed in multiple instances (e.g. at different customers, and at one customer there may be development, test, preprod, prod instances)
The system configuration is a list of ConfigArtefact<T>, where T indicates that it might be a database connection configuration, or a predefined-query, or....
ConfigArtefacts are named. The names are semantically meaningful/well-known (e.g. there could be objects for "Console.translations.en", "Console.translations.fr", "Application.Database.connection.credentials" or "Reporting.Database.connection.credentials") and are distinct within each deployment: no two different ConfigArtefacts will have the same name for a given deployment.
ConfigArtefacts have other attributes (e.g. for the database, username and password) depending on the type used for <T>. The value of the attributes could be different in different deployments of this system.
There's no natural ordering of artefacts, even ones of the same type <T>. Where some arbitrary ordering is needed, I use the name.
Goal:
I need to write something that compares the configuration of two deployments of this system and identifies artefacts that have been added, removed, or changed. In order to find the same artefact on each deployment, I need to compare by name only (I always know what type of artefact I'm working with). In order to say if they've changed, I need to compare by all other attributes.
So, two kinds of comparison. One can be modelled with equals/hashcode, but not the other. Which should use equals()? (I think the one by name, as then added and deleted are just set subtraction, using one of the many collection libraries).
Would that be the normal choice? And if so, is there a conventional name for the other ("full compare") one? I'm considering identicalTo() (so two objects are changed if one.identicalTo(two) is false)
Your design is flawed - you have misused equals() by only comparing name.
If all attributes must be compared to know if the object has changed, then it is not true that objects with the same name are “equal”, because the use of the word “changed” implies there’s a difference, and if there’s a difference they’re not equal.
Finding something by using an identifier is different to two objects being equal if they have the same identifier.
Implement equals() and hashCode() using all attributes that matter for determining if an object is “different”.
To facilitate convenience and performance, populate a Map<String, ConfigArtefact<?>> for each environment using the name as the key.
Finding differences between 2 such maps is a fairly trivial O(n) task.
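A minimal sketch of that comparison, assuming the values implement equals() over all attributes that matter. The ConfigDiff class and its field names are hypothetical, and the value type is left generic so that ConfigArtefact<?> or anything else can be plugged in:

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical result holder: names of added, removed and changed artefacts.
final class ConfigDiff {
    // TreeSet keeps names sorted for reporting; it adds a log factor, so
    // swap in HashSet if strictly linear time matters.
    final Set<String> added = new TreeSet<>();
    final Set<String> removed = new TreeSet<>();
    final Set<String> changed = new TreeSet<>();

    static <V> ConfigDiff between(Map<String, V> before, Map<String, V> after) {
        ConfigDiff diff = new ConfigDiff();
        for (Map.Entry<String, V> entry : after.entrySet()) {
            String name = entry.getKey();
            if (!before.containsKey(name)) {
                diff.added.add(name);
            } else if (!Objects.equals(before.get(name), entry.getValue())) {
                diff.changed.add(name); // same name, different attributes
            }
        }
        for (String name : before.keySet()) {
            if (!after.containsKey(name)) {
                diff.removed.add(name);
            }
        }
        return diff;
    }
}
```

Lookup by name is handled by the map keys, so equals() stays free to mean "all attributes match".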
Generally I've been quite a fan of using List.of / Arrays.asList, but unfortunately they don't accept nulls. The use case where I most often stumble upon this is DB calls, where parameters are abstracted to be passed via a list. The list always has to be of a certain length for any given procedure, and optional values are given as nulls.
I found "Why does Map.of not allow null keys and values?", which mentions List.of but mainly talks about maps. And in the case of maps, I agree: the ambiguity of .get seems troublesome (key missing, or value intentionally null?), but this doesn't seem to apply to a list. If you get a null from a list, then you know for sure someone intentionally put it there.
Some people might say "use optional", but I strongly believe it should be reserved for return types, and others seem to agree (see "Uses for Optional"):
having an Optional in a class field or in a data structure, is considered a misuse of the API
What is the intended clean, standard solution? Does it really boil down to "use boilerplaty ArrayList initialization without these nice shorthand syntaxes" or "write your own util methods"?
Generally I've been quite a fan of using List.of / Arrays.asList, but unfortunately they don't accept nulls
This is incorrect:
List<String> list = Arrays.asList("a", "b", null);
String nullRef = list.get(2);
System.out.println(nullRef);
Works fine. List.of indeed doesn't accept null refs.
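A short sketch that contrasts the two, plus one possible utility method of the "write your own" kind. The NullableListDemo class and its params() helper are hypothetical names, not part of the JDK:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class NullableListDemo {
    // Hypothetical helper: an unmodifiable, fixed-length list that tolerates
    // null elements. A lone null argument must be passed as (Object) null,
    // otherwise the varargs array itself is null.
    static List<Object> params(Object... values) {
        return Collections.unmodifiableList(Arrays.asList(values));
    }

    // Returns true if List.of throws NullPointerException for a null element.
    static boolean listOfRejectsNull() {
        try {
            List.of("a", "b", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }
}
```

With this, params("a", null, 3) yields a three-element list whose second element is null.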
Some people might say "use optional"
Yeah, ignore those people.
What is the intended clean, standard solution?
Clean is a weasel word that is generally best read as: "I like it personally but am unwilling to just say that it's a taste preference". It means nothing. You can't argue about taste, after all.
"The standard solution", on the other hand, is borderline objective language; we can work with that.
I think your API design that takes a list of db values is the problem. Presumably, you've written your own abstraction layer for this.
I'm having a hard time imagining DB abstraction APIs where 'list of objects that includes null refs' is the right call.
If you're trying to weave prepared statement 'parameters', this would appear to be a nice API for that, and doesn't require lists at all:
db.select()
    .append("SELECT a, b FROM c INNER JOIN d ON c.e = d.f ")
    .append("WHERE (signup IS NULL OR signup < ?) ", LocalDate.now())
    .append("AND LEFT(username, ?) = ?", 2, "AA")
    .query();
If you're trying to abstract an insert/merge/upsert statement, you need some way of linking values with the column name that the value is a value for. This would suggest an API along the lines of:
db.insert("tableName")
    .mergeOn("key", key)
    .mergeOn("date", someDate)
    .put("value", value)
    .exec();
You can also re-use Map.of for this purpose, in which case 'I want the db to use the default value for this column' is done by simply not including that key/value pair in your map, thus sidestepping your needs to put null in such a map.
Perhaps if you show your specific API use-case, more insights can be provided.
If your question boils down to:
"I want to put null refs and use List.of, what is the standard clean way to do that"
then the answer is a rather obvious: There is nothing - as that does not work.
There is no --shut-up-List.of--let-me-add-nulls command line switch.
I was looking for a nice-looking way to extract a nullable value from a ResultSet, as opposed to using wasNull. The documentation (API spec) guarantees that the "untyped" getObject (returning Object) returns null when the column contains NULL, making the following construct valid: (Boolean)rs.getObject(1).
Naturally, I would prefer to do rs.getObject(1, Boolean.class) - but, for some reason, for the typed overload no such guarantees are provided.
Or are they? Maybe I am just missing something? I found nothing for this particular overload here, and the JDBC spec seems to overlook such overloads entirely.
So, do NULLs get converted to nulls by rs.getObject(1, Boolean.class), or don't they, or is that implementation-dependent, or maybe it depends on some custom conversions defined in an application?
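Until that is settled, a defensive sketch can sidestep the question by keeping the wasNull check in one place. The NullableColumns class and getNullable name are hypothetical; the helper only uses ResultSet.getObject(int, Class) and ResultSet.wasNull(), and makes no assumption about what the typed overload returns for NULL:

```java
import java.sql.ResultSet;
import java.sql.SQLException;

final class NullableColumns {
    // Hypothetical helper: reads a typed column and maps SQL NULL to null
    // without relying on what the typed getObject overload returns for NULL.
    static <T> T getNullable(ResultSet rs, int column, Class<T> type)
            throws SQLException {
        T value = rs.getObject(column, type);
        return rs.wasNull() ? null : value;
    }
}
```

Call sites then stay close to the preferred form: Boolean flag = NullableColumns.getNullable(rs, 1, Boolean.class);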
When querying a database, why aren't the values returned as the same accurate Java types they were put in as? i.e. why aren't database query results generally type safe?
What you are speaking about is just a part of a bigger problem: object-relational impedance mismatch.
The short answer is that it's not always possible to safely convert a database type to a corresponding Java type. For example, even integer type specifications don't always match between programming languages and databases (for example, should an int be 32-bit or 64-bit? Signed or unsigned?). The same goes for strings (ASCII or Unicode?). As @Marc B has already mentioned, things become even worse in the case of more complex data types like dates.