Can you describe the pros and cons of including an OID (typically a database row identifier) in a POJO representing an entity in your model?
In fact I'm not talking about issues related to equals/hashcode and so on, I should have described better my problem (my bad :) )...
We've got some of those entity classes which represent business objects (like Product, Catalog and so on...). Sometimes they have a 'business id', for example Product can be found by its unique ProductId (which has 3 fields: id, type, repository).
In our database, the Product table has a surrogate primary key column (OID) in addition to the 3 business columns (id, type, repository) to facilitate foreign key references and to need fewer join clauses.
The Product/ProductId classes are part of the API that we expose to other applications. So for example they can call :
productManager.findProductById(ProductId productId);
The question is, should or should not the OID be included in the Product or in the ProductId class knowing that our clients are expected to use the ProductId identifier.
Pros :
I can use the OID to do another lookup like
Product p = productManager.findProductById(productId);
List<Catalog> catalogs = productManager.findAllCatalogsContainingProduct(p.getOid());
We look things up by ProductId a lot in the application, so this saves a database roundtrip each time: we no longer have to find the OID matching a ProductId first.
Cons :
I've just exposed the OID to a client (let's hope he doesn't use it instead of the business key!!)
Can you list other pros and cons?
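One way to get the pro without the con is to let the Product carry the OID but give it no public accessor. The following is only a minimal sketch under assumptions: the field types, constructors and the package-private oid() accessor are hypothetical, and in a real API the two classes would be public and live in the same package as the manager implementation.

```java
import java.util.Objects;

// Business key with the three fields named in the question (types are assumed).
final class ProductId {
    final String id;
    final String type;
    final String repository;

    ProductId(String id, String type, String repository) {
        this.id = id;
        this.type = type;
        this.repository = repository;
    }

    @Override public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof ProductId)) return false;
        ProductId other = (ProductId) o;
        return id.equals(other.id) && type.equals(other.type)
                && repository.equals(other.repository);
    }

    @Override public int hashCode() {
        return Objects.hash(id, type, repository);
    }
}

// The OID travels with the Product, but only code in the same package can read it.
final class Product {
    private final long oid;            // surrogate key, internal only
    private final ProductId productId; // what clients are expected to use

    Product(long oid, ProductId productId) {
        this.oid = oid;
        this.productId = productId;
    }

    public ProductId getProductId() { return productId; }

    long oid() { return oid; }         // package-private on purpose
}
```

A manager in the same package can then call p.oid() for follow-up lookups, while client code only ever sees getProductId().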
Database row identifier = Primary key? If so, there is no pro or con, you have to have it otherwise you can't relate the POJO back to its corresponding database row.
To retrieve Products and Catalogs, the standard SQL way is to do a Join. For example, with my DAL I can do:
SearchCriteria sc = new SearchCriteria();
sc.AddBinding("ProductId", productId);
List<Entity> Products = SQL.Read(sc, new Product(new Catalog()));
or
List<Entity> Products = SQL.Read(sc, new Catalog(new Product()));
This way there is no need to reveal anything to the caller, nor for a roundtrip.
You can run into problems if your implementation of equals() or hashCode() is based off the identifier since it will likely be null initially and then change later once the object is persisted. See below:
http://java.sun.com/javase/6/docs/api/java/util/Set.html
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set. A special case of this prohibition is that it is not permissible for a set to contain itself as an element.
Let's assume that your implementation of hashCode() is based off the identifier and equals() uses hashCode() in its comparison. If you add the object to a Set while its identifier is null, the equals comparisons will behave one way. If you then persist the object in the set, its identifier value will likely change, thus changing the behavior of equals() and hashCode(). This breaks the "contract" of Set as described above.
It's a bit of an edge case but one worth noting.
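The edge case can be reproduced with a few lines of plain Java. This is a sketch only: Item is a made-up class, and assigning the id by hand stands in for whatever the persistence layer does on save.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Item {
    Long id; // null until "persisted"

    @Override public boolean equals(Object o) {
        return o instanceof Item && Objects.equals(id, ((Item) o).id);
    }

    @Override public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null
    }
}

class MutableKeyDemo {
    public static void main(String[] args) {
        Set<Item> set = new HashSet<>();
        Item item = new Item();
        set.add(item);                          // stored under the hash of a null id
        System.out.println(set.contains(item)); // true

        item.id = 42L;                          // simulate the id being assigned on persist
        System.out.println(set.contains(item)); // false: the set looks under the new hash
        System.out.println(set.size());         // 1: the element is still in there
    }
}
```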
In Java apps, I prefer to use unique fields in equals and hashCode methods instead of adding only id field or all the fields. However, I am confused about the following points:
By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
JPA and Hibernate don't specify or rely on any particular semantics for entities' equals() and hashCode() methods, so you can do what you want.
Good alternatives
With that said, there is a handful of alternatives for equality that make much more sense to me than any others:
Equality corresponds to object identity. This is of course the default provided by Object.equals(), and it can serve perfectly well for entities. OR
Equality corresponds to persistent identity. That is, entities are equal if and only if they have the same entity type and primary key. OR
Equality corresponds to (only) value equality. That is, equality of all corresponding persistent fields except the ID. There are additional variations around how that applies to mapped relationships. OR
Equality corresponds to persistent identity AND value equality. Again, there are variations around how the value equality part applies to mapped relationships.
General advice
In general, you would do well to follow a few rules of thumb:
As with most other classes, especially mutable ones, default to just inheriting Object.equals() and Object.hashCode(). Have a specific purpose and plan before you do otherwise, and remember that you get only one choice for this. And that it is impactful.
If you do override equals() (and therefore hashCode() as well) then do it in a consistent way across all your entities.
Think carefully before you go with an option involving value equality. This is usually a poor choice for mutable classes in general, and entities are no exception.
Specific Questions
1. By considering object states in Hibernate, I think it is good practice not using id field in equals and hashCode methods, right?
I think using the ID is fine. It's simply a question of what you want equality to represent for your entities. You absolutely can have distinct entity objects with the same type and ID, and you might want to be able to detect that with equals(). The other persistent fields might or might not factor into that.
In particular, an equals() method based solely on entity ID might make sense for entities that appear on the "many" side of a one-to-many relationship when that is mapped to a Set.
2. When there is a unique field in a class, is it enough to use only one of the unique fields in equals and hashCode methods (except from id field)?
I see no good reason to consider only a proper subset of unique fields, except that subset consisting only of the entity ID. Or if all the fields are unique then the one consisting of all the fields except the ID. The logic that suggests that you might be able to consider other proper subsets revolves around the persistent identity of the entity, which is completely and best represented by its ID.
3. Should I add all the fields except from id field when there is not any unique field except from id field in a class? Or should I only add some numeric field instead of adding text fields?
If your sense of equality is to be based on entity value then I don't see how it makes much sense to omit any persistent fields except, possibly, the ID. Do not arbitrarily omit the ID -- it may very well be something you want to include. Again, it depends on what equals() is intended to mean for your entities.
That's a tricky question that hibernate itself doesn't have a clear answer on.
John Bollinger's answer covers your specific question, but there is some additional context about how to think about equality and hibernate that should help figure out what to do. After all, given that hibernate doesn't require you to do anything particular, you can do whatever you want, which leads to the obvious question: ... okay, so what should I do, then?
That question boils down to the following (using Person as an arbitrary example of a model class plus its associated table; furthermore, let's say the person table has a single unique ID that is generated, either a random UUID or an auto-sequenced integer value):
What does an instance of Person represent?
There are in broad strokes 3 answers:
It represents a person. A row in the person table also represents a person; these 2 things aren't related.
It represents a row in the person table.
It represents a state in my application, nothing more.
Even though these things sound quite similar, they result in opposite meanings as to equality.
Which choice is correct? That's up to you.
When reading on, remember:
Any Person instance which isn't "saved" yet, would have a null value for id, because upon insertion, hibernate will ask the DB to generate a value for it or generates one itself and only then fills it in.
An instance represents a row
Equality under the second model (an instance of Person represents a row in the table) should look only at the id column, because that defines row uniqueness; any 2 representations of a row in the person table are guaranteed to be referring to the same row (hence, equal) if and only if the id is equal. That is a necessary and sufficient condition: If they are equal the 2 objects are necessarily referring to the same row, and if they aren't equal, then they are necessarily referring to different rows.
Notably, if id is still null, then they cannot be equal, not even to themselves: More generally the question: "Is this object-representing-a-row equal to this other object-representing-a-row" is a meaningless question if these objects are representing rows-to-be (unsaved rows). If you invoke save() on each object, you end up with 2 rows. Optimally such an object should be considered in a state such that attempting to invoke equals on it is a failure, but the spec of equals states that they can't throw, therefore, false is the best answer. This would mean you want:
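import java.util.UUID;

class Person {
    // fields, including a UUID id that stays null until the row is saved

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != Person.class) return false;
        UUID otherId = ((Person) other).id;
        // An unsaved object (null id) is equal to nothing but itself.
        return id != null && id.equals(otherId);
    }
}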
This defines your equals method as 'ends up representing the same row'. This holds even if you change meaningful state:
Change the name and save the object? It's... still the same row, and this equality implementation reflects this.
Call save() on each in the comparison when they were unsaved? Then you get 2 rows - and this equality implementation reflects this before and after attempting to save it.
If invoking on self (a.equals(a)) this returns true as the equality spec demands; it also works out in the 'modelling a row' view: If you invoke save() on the same object twice, it's still just one row.
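The three cases above can be checked with a small stand-alone version of that class. Two things here are assumptions of mine rather than part of the answer: the class is called PersonRow to keep it separate, and I added a constant hashCode() as one simple way to keep the hash stable while id goes from null to a value (it costs hash-table performance).

```java
import java.util.UUID;

class PersonRow {
    UUID id;      // null until the row has been saved
    String name;

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (other == null || other.getClass() != PersonRow.class) return false;
        UUID otherId = ((PersonRow) other).id;
        return id != null && id.equals(otherId);
    }

    // Constant per class, so it never changes when the id gets assigned.
    @Override public int hashCode() {
        return PersonRow.class.hashCode();
    }
}

class RowEqualityDemo {
    public static void main(String[] args) {
        PersonRow a = new PersonRow();
        PersonRow b = new PersonRow();
        System.out.println(a.equals(a)); // true: same object, one row at most
        System.out.println(a.equals(b)); // false: saving both would give two rows

        UUID shared = UUID.randomUUID();
        a.id = shared;
        b.id = shared;
        a.name = "Alice";
        b.name = "Renamed";
        System.out.println(a.equals(b)); // true: same row, even though the names differ
    }
}
```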
An instance represents a person
The nature of what a person is is entirely unrelated to the autosequence/autogen ID it gets; the fact that we're using hibernate is an implementation detail that should play no part at all in considering equality; after all, this object represents the notion of a person, and that notion exists entirely independent of the database. The database is one thing that is modelling persons; instances of this class are another.
In this model you should do the exact opposite: Find something that uniquely identifies a person itself, and compare against that. After all, if you have 2 rows in a database that both contain the same social security number, then you have only 1 person.. and you just happen to have 2 rows that are both referring to the same person. Given that we chose our instance to imply that it represents a person, then an instance loaded from row A, and an instance loaded from row B, ought to be considered as equal - after all, they are representing the same individual.
In this case, you write an equals method that considers all relevant fields except the autoseq/autogen ID field! If there is a separate unique id such as social security number, use that. If there isn't, essentially it boils down to an equals method that compares all fields, except ID. Because that's the one field that definitely has zero bearing on what defines a person.
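As a rough sketch of this second flavour, assuming a social security number is available as the business key (the Citizen class and its fields are made up for illustration):

```java
import java.util.Objects;

class Citizen {
    Long id;            // database-generated surrogate key, ignored by equals/hashCode
    final String ssn;   // hypothetical business key that identifies the actual person
    String name;

    Citizen(Long id, String ssn, String name) {
        this.id = id;
        this.ssn = Objects.requireNonNull(ssn);
        this.name = name;
    }

    @Override public boolean equals(Object other) {
        if (other == this) return true;
        if (!(other instanceof Citizen)) return false;
        return ssn.equals(((Citizen) other).ssn);
    }

    @Override public int hashCode() {
        return ssn.hashCode();
    }
}
```

With this, new Citizen(1L, "123-45-6789", "A. Smith") and new Citizen(2L, "123-45-6789", "Alice Smith") compare as equal: two rows, one person.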
An instance defines a state in your application
This is almost a cop-out, and in general means equality is irrelevant / not applicable. It's like asking how to implement an equals method to an InputStream implementation - mostly, you.. don't.
Here, the default behaviour (Object's own implementations) is what you want, and therefore you don't implement either hashCode or equals. Any instance of Person is equal to itself (as in a.equals(a), same reference) and not equal to any other, even if the other has identical values for each and every field, and even if the id field isn't null (so both represent the same row).
Such an object cannot meaningfully be used as a value object. For example, it would be pointless to stuff such things in a hashmap (at best, you can stuff them in an IdentityHashMap, as those semantics would apply. Only way to do any lookups is to have a ref that was .put() into it before and call .get() with that).
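To illustrate the lookup restriction, here is a small sketch with a made-up Session class that overrides neither equals nor hashCode:

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Deliberately no equals/hashCode: each instance is only equal to itself.
class Session {
    final String user;

    Session(String user) {
        this.user = user;
    }
}

class IdentityDemo {
    public static void main(String[] args) {
        Map<Session, String> notes = new IdentityHashMap<>();
        Session first = new Session("alice");
        Session second = new Session("alice"); // same field values, different object

        notes.put(first, "created at login");
        System.out.println(notes.get(first));  // created at login
        System.out.println(notes.get(second)); // null
        System.out.println(first.equals(second)); // false
    }
}
```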
Which one is right? Up to you. But document it clearly, because in my experience, lots of hibernate users are absolutely convinced either the first or second model is the one, and only, right answer, and consider the other answer utterly bonkers. This is problematic - they'd be writing their code assuming all hibernate model classes work precisely as they want, and would therefore not even be thinking of checking docs/impl to know how it actually works.
For what it's worth, objects are objects, and database rows do not neatly map to the notion of an object. SQL's and Java's notions of null are utterly incompatible, and the notion of 'a query' does not neatly map to tables (between selecting expressions, selecting on views, and JOINs, that should be obvious) - Hibernate is tilting at windmills. It is a leaky abstraction and this is one of its many, many leaks. Leaky abstractions can be useful; just be aware that at the 'edges' the principle Hibernate tries to peddle you (that objects can represent query results and rows) has limits you will run into. A lot.
I have a few types that have a common field (Email ID) that I am using as an @Id. These types extend from a common type User which has the Email ID field. It is something like below:
@Entity
class User {
    @Id
    String emailID;
}
@Entity
@Subclass(index = true)
class UserType1 extends User {
    String otherField;
}
@Entity
@Subclass(index = true)
class UserType2 extends User {
    String otherField;
}
Now, I want that every time I insert a subtype of User, the Email ID should remain unique across all these subtypes objects in the datastore. I tested an endpoint for the above types by inserting each of subtypes with the same EmailID and it happened successfully - Objectify shouldn't have allowed the persistence of subtypes with the same ID. As per my understanding, the ultimate uniqueness is ensured by the keys but can't I ensure uniqueness by an Id across just the subtypes especially when ID is in the base class? Is there some way to do it?
EDIT:
Although, this is not the solution I was looking for, I have handled this situation by creating a new entity type with {EmailID, Key_Subtype} which worked in ensuring the uniqueness. I just check this entity for existing emailID and I use the key for retrieving the object with another query.
If anyone comes off with a better solution, I would appreciate it.
A UUID is what you are looking for. It is generated for each entity; the type does not matter.
https://dzone.com/articles/hibernate-and-uuid-identifiers
Same Id for different entity types is definitely possible at the datastore level, see re-using an entity's ID for other entities of different kinds - sane idea?
The Id uniqueness is only guaranteed across entities of the same kind and with the same parent entity (the unique entity key is based on a combination of these 3 items). Since your subtypes are actually different entity kinds there is no problem having the same Id across these kinds, so subclassing is not the way to achieve what you want.
To have unique Ids you need to have a unique entity kind, say User. To distinguish the different user types, maybe have inside User a type property which would be a reference to an entity of UserTypeX kind containing the info specific to that user type?
It sounds like you have found the "correct" solution - create an Email entity that uses the email address as the id and contains a pointer to the appropriate User entity. When creating a new User/Email, always check for pre-existence of the email address in a transaction.
This really isn't any different from using the email address as the id of the User directly except that the extra layer of indirection allows users to change their email addresses, which is generally a good idea. The transactional logic is similar either way.
Transactionally looking up & creating an entity with a natural primary key is pretty much the only way of guaranteeing uniqueness in the datastore. It is effective and scalable.
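The check-then-create idea can be sketched without any datastore at all. Note the assumptions: this is an in-memory stand-in, not Objectify code; EmailRegistry and its methods are made-up names; and putIfAbsent plays the role that a datastore transaction (load the Email entity, create it only if absent) would play in the real thing.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class EmailRegistry {
    // email address -> key of the owning user entity (plain strings in this sketch)
    private final Map<String, String> emailToUserKey = new ConcurrentHashMap<>();

    /** Atomically claims the address; returns false if it is already taken. */
    boolean register(String email, String userKey) {
        return emailToUserKey.putIfAbsent(email, userKey) == null;
    }

    /** Returns the key of the user owning this address, or null if unclaimed. */
    String lookup(String email) {
        return emailToUserKey.get(email);
    }
}
```

The first register("a@example.com", ...) call succeeds, and a second call with the same address returns false whatever the subtype of the second user, which is the cross-subtype uniqueness the question asks for.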
The recommended way of using merge() is to first get the DTO first before inputting the changes.
public void merge(PersonModel model) {
    Person inputDTO = PersonBuilder.build(model);
    Person dto = get(pk);
    dto.setName(inputDTO.getName());
    dto.getChildren().clear();
    Iterator<Child> iter = inputDTO.getChildren().iterator();
    while (iter.hasNext()) {
        dto.getChildren().add(iter.next());
    }
    dto.merge();
}
Is there a more elegant way of performing such operation translating domain model to dto and merging it so that no data are accidentally deleted.
Example of problem:
Hibernate: prevent delete orphan when using merge();
I find having to clear the list and then re-add every element very wasteful.
Can someone recommend me a design pattern or a way to code it properly?
Thank you
ADD ON:
1) Possible to use Hibernate Hashset to replace List? Will hibernate hashset replace elements base on primary keys?
any help?
"The recommended way of using merge() is to first get the DTO first before inputting the changes"
Who recommended you to do this?
"Is there a more elegant way of performing such operation translating domain model to dto and merging it so that no data are accidentally deleted."
I don't think you can translate domain objects to DTOs. A DTO is just about data, a domain object is data, behaviour and context. Completely different.
If you don't have behaviour and context in your domain objects (a.k.a. anemic domain model), you don't need an extra DTO layer that just duplicates the objects.
Because you tagged this question with Hibernate and mentioned it in your question, you don't need to call merge yourself because you just got the object from the database and Hibernate will flush the session to synchronize the changes with the database.
"Possible to use Hibernate Hashset to replace List? Will hibernate hashset replace elements base on primary keys?"
I would replace the List with a HashSet, since the table where the data is going to be stored is a set, not a list (you can't have duplicate records). A HashSet will not replace elements based on primary keys. A set (any set, Hibernate's implementation is no different) works by preventing duplicates. It uses your equals() and hashCode() implementation to find out if there is already an object in that set. If that is the case, the new object won't be added and the set keeps the original.
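The "keeps the original" behaviour is easy to see with a plain java.util.HashSet. This is a sketch with a made-up Child class whose equality is based on its primary key:

```java
import java.util.HashSet;
import java.util.Set;

class Child {
    final long pk;
    final String name;

    Child(long pk, String name) {
        this.pk = pk;
        this.name = name;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Child && ((Child) o).pk == pk;
    }

    @Override public int hashCode() {
        return Long.hashCode(pk);
    }
}

class SetAddDemo {
    public static void main(String[] args) {
        Set<Child> children = new HashSet<>();
        children.add(new Child(1, "old"));
        boolean added = children.add(new Child(1, "new")); // same pk, different data

        System.out.println(added);                           // false
        System.out.println(children.size());                 // 1
        System.out.println(children.iterator().next().name); // old
    }
}
```

So if you want the incoming data to win, you have to remove the existing element first (or update it in place); add() alone will not do it.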
I hit a problem when writing tests for a database application using JPA2 and EclipseLink:
I add some entity to a database, retrieve it later and want to compare it to an instance which has the values I expect to confirm that the addition worked as I intended.
First I wrote something like
assertEquals(expResult, dbResult);
which failed, because I can't really know the value of id field, which is generated by the database and therefore dbResult differs from expResult which I created with new and populated manually.
I see two options:
Either I remove the id field from equals and hashCode so that the comparison is only based on the "real values". I don't know if this causes problems in the database or elsewhere, though.
Or I write my tests to explicitly check every field except id manually.
What should I do?
You might find a lot of controversy about this one. My stance is that you absolutely don't use a database primary key for anything in your application. It should be completely invisible. Identify your objects in your application by some other property or combination of properties.
On the "testing persistence operations" front, what you really want is probably to check that the fields were saved and loaded correctly and maybe that the primary key got assigned some value when you saved it. This probably isn't a job for the equals method at all.
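A field-by-field check along those lines might look like the sketch below. It uses plain Java instead of a test framework so it stands on its own; Customer and assertPersistedCorrectly are hypothetical names, and in a real test you would use your framework's assertions.

```java
import java.util.Objects;

class Customer {
    Long id;        // generated by the database
    String name;
    String email;

    Customer(Long id, String name, String email) {
        this.id = id;
        this.name = name;
        this.email = email;
    }
}

class PersistenceChecks {
    /** Compares every field except id, and only requires that an id was assigned. */
    static void assertPersistedCorrectly(Customer expected, Customer loaded) {
        if (loaded.id == null) {
            throw new AssertionError("no id was generated");
        }
        if (!Objects.equals(expected.name, loaded.name)) {
            throw new AssertionError("name: expected " + expected.name + " but was " + loaded.name);
        }
        if (!Objects.equals(expected.email, loaded.email)) {
            throw new AssertionError("email: expected " + expected.email + " but was " + loaded.email);
        }
    }
}
```

This keeps equals() out of the test entirely, so the entity's equality semantics can be chosen on their own merits.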
Relying on database generated Ids in your equals and hashCode implementation is not advisable. You ought to rely on the truly unique/semi-unique attributes of your classes in checking for equality, and in generating the hashcode values. The Hibernate documentation has an extensive page that discusses this, and the facts therein are applicable to more or less every JPA provider.
The underlying reason for using business keys over database generated values in your equals and hashCode implementation is that the JPA provider must actually issue a SELECT after persisting the entity in the database. If you compare objects using the database generated Ids, then you will end up having an equality test that fails in the following scenarios:
If E1 and E2 are entities of class E (which verifies equality using database generated Ids), then E1 and E2 will be equal if they haven't been stored in the database yet. This is not what you want, especially if you want to store E1 and E2 in some Set before persistence. This is worse if the attributes of E1 and E2 possess different values; the equals implementation would prevent two significantly different entities from being added to a Set, and the hashCode implementation will give you an O(n) lookup time when entities are looked up from a HashMap using the primary key.
If E1 is a managed entity that has been persisted, and E2 is an entity that has not been persisted, then the equality test would deem that E1 != E2 in the scenario where all the attribute values of E1 and E2 (except for the Ids) are similar. Again, this is probably not what you want, especially if you want to avoid duplicate entities in the database that differ only in their database generated Ids.
The equals and hashCode implementations therefore ought to use business keys, in order to exhibit consistent behavior for both persisted and unpersisted entities.
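The first scenario is easy to demonstrate. The sketch below assumes an id-based equals that treats two null ids as equal (the Ticket class is made up for the example):

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class Ticket {
    Long id;                  // database-generated, null before persisting
    final String description;

    Ticket(String description) {
        this.description = description;
    }

    @Override public boolean equals(Object o) {
        return o instanceof Ticket && Objects.equals(id, ((Ticket) o).id);
    }

    @Override public int hashCode() {
        return Objects.hashCode(id);
    }
}

class UnsavedCollisionDemo {
    public static void main(String[] args) {
        Set<Ticket> pending = new HashSet<>();
        pending.add(new Ticket("printer is on fire"));
        boolean added = pending.add(new Ticket("coffee machine is empty"));

        System.out.println(added);          // false: both ids are null, so they count as equal
        System.out.println(pending.size()); // 1: the second, different ticket was dropped
    }
}
```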
From the book Hibernate in Action, it's recommended to define a business key and test equality on that. A business key is "a property, or some combination of properties, that is unique for each instance with the same database identity." In other areas it says to not use the id as one of those properties, and don't use values in collections.
I would write my test to explicitly check for fields. To make this easy, before performing the assertEquals test, I would set the id of both the expected and actual result to the same predefined value and then use the normal equals method.
Removing ID from equals is not justifiable, just because testing is slightly difficult. You are foregoing serious performance benefits and also code integrity.
In legacy database tables we have numbered columns like C1, C2, C3, C100 or M1, M2, M3, M100.
These columns represent BLOB data.
It is not possible to change anything in this database.
By using JPA Embeddable we map all of the columns to single fields. And then during embedding we override names by using 100 override annotations.
Recently we have switched to Hibernate and I've found things like UserCollectionType and CompositeUserType. But I haven't found any use cases that are close to mine.
Is it possible to implement some user type by using Hibernate to be able to map a bundle of columns to a collection without additional querying?
Edit:
As you probably noticed, the names of the columns can differ from table to table. I want to create one type like "LegacyArray" with no need to specify all of the @Columns each time I use this type.
But instead I'd use
@Type(type = "LegacyArrayUserType",
    parameters =
    {
        @Parameter(name = "prefix", value = "A"),
        @Parameter(name = "size", value = "128")
    })
List<Integer> legacyA;
@Type(type = "LegacyArrayUserType",
    parameters =
    {
        @Parameter(name = "prefix", value = "B"),
        @Parameter(name = "size", value = "64")
    })
List<Integer> legacyB;
I can think of a couple of ways that I would do this.
1. Create views for the collection information that simulates a normalized table structure, and map it to Hibernate as a collection:
Assuming your existing table is called primaryentity, I would create a view that's similar to the following:
-- untested SQL...
create view childentity as
(select primaryentity_id, c1 from primaryentity union
select primaryentity_id, c2 from primaryentity union
select primaryentity_id, c3 from primaryentity union
--...
select primaryentity_id, c100 from primaryentity)
Now from Hibernate's perspective, childentity is just a normalized table that has a foreign key to primaryentity. Mapping this should be pretty straightforward, and is covered here:
http://docs.jboss.org/hibernate/stable/core/reference/en/html/collections.html
The benefits of this approach:
From Hibernate's point of view, the tables are normalized, it's a fairly simple mapping
No updates to your existing tables
The drawbacks:
Data is read-only, I don't think your view can be defined in an updatable manner (I could be wrong)
Requires change to the database, you may need to create lots of views
Alternately, if your DBA won't even let you add a view to the database, or if you need to perform updates:
2. Use Hibernate's dynamic model mapping facility to map your C1, C2, C3 properties to a Map, and have some code in your DAO layer do the appropriate conversion between the Map and the Collection property:
I have never done this myself, but I believe Hibernate does allow you to map tables to HashMaps. I'm not sure how dynamically Hibernate allows you to do this (i.e., Can you get away with simply specifying the table name, and having Hibernate automatically map all the columns?), but it's another way I can think of doing this.
If going with this approach though, be sure to use the data access object pattern, and ensure that the internal implementation (use of HashMaps) is hidden from the client code. Also be sure to check before writing to the database that the size of your collection does not exceed the number of available columns.
The benefits of this approach:
No change to the database at all
Data is updatable
O/R Mapping is relatively simple
The drawbacks:
Lots of plumbing in the DAO layer to map the appropriate types
Uses experimental Hibernate features that may change in the future
Personally, I think that design sounds like it breaks first normal form for relational databases. What happens if you need C101 or M101? Change your schema again? I think it's very intrusive.
If you add Hibernate to the mix it's even worse. Adding C101 or M101 means having to alter your Java objects, your Hibernate mappings, everything.
If you have 1:m relationships with C and M tables, you'd be able handle the cases I just cited by adding additional rows. Your Java objects contain Collection<C> or Collection<M>. Your Hibernate mappings are one-to-many that don't change.
Maybe the reason that you don't see any Hibernate examples to match your case is that it's a design that's not recommended.
If you must, maybe you should look at Hibernate Component Mapping.
UPDATE: The fact that this is legacy is duly noted. My point in bringing up first normal form is as much for others who might find this question in the future as it is for the person who posted the question. I would not want to answer the question in such a way that it silently asserted this design as "good".
Pointing out Hibernate component mapping is pertinent because knowing the name of what you're looking for can be the key when you're searching. Hibernate allows an object model to be finer grained than the relational model it maps. You are free to model a denormalized schema (e.g., Name and Address objects as part of a larger Person object). That's just the name they give such a technique. It might help find other examples as well.
Sorry if I'm misunderstanding your problem here, I don't know much about Hibernate. But couldn't you just concatenate during selection from database to get something like what you want?
Like:
SELECT whatever
, C1||C2||C3||C4||...||C100 AS CDATA
, M1||M2||M3||M4||...||M100 AS MDATA
FROM ...
WHERE ...
(Of course, the concatenation operator differs between RDBMSs.)
[EDIT] I suggest to use a CompositeUserType. Here is an example. There is also a good example on page 228f in the book "Java Persistence With Hibernate".
That allows you to handle the many columns as a single object in Java.
The mapping looks like this:
@org.hibernate.annotations.Columns(columns = {
    @Column(name="C1"),
    @Column(name="C2"),
    @Column(name="C3"),
    ...
})
private List<Integer> c;
Hibernate will load all columns at once during the normal query.
In your case, you must copy the int values from the list into a fixed number of columns in nullSafeSet. Pseudocode:
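// statement is the PreparedStatement passed to nullSafeSet, index its first parameter position
for (int i = 0; i < numColumns; i++)
    if (i < list.size())
        statement.setInt(index + i, list.get(i));
    else
        statement.setNull(index + i, Hibernate.INTEGER.sqlType());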
In nullSafeGet you must create a list and stop adding elements when a column is NULL. For additional safety, I suggest creating your own list implementation which cannot grow beyond the number of columns (inherit from ArrayList and override add() and addAll() to reject extra elements; overriding ensureCapacity() alone is not reliable, because newer JDKs no longer route add() through it).
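Stripped of the Hibernate and JDBC plumbing, the two conversions come down to the following. This is only a sketch of the logic: LegacyColumns, toColumns and fromColumns are made-up names, and an Integer[] stands in for the row's C1..Cn columns, with null meaning SQL NULL.

```java
import java.util.ArrayList;
import java.util.List;

class LegacyColumns {
    /** What nullSafeSet has to do: spread the list over exactly numColumns slots, padding with null. */
    static Integer[] toColumns(List<Integer> values, int numColumns) {
        if (values.size() > numColumns) {
            throw new IllegalArgumentException(
                    values.size() + " values do not fit into " + numColumns + " columns");
        }
        Integer[] columns = new Integer[numColumns];
        for (int i = 0; i < values.size(); i++) {
            columns[i] = values.get(i);
        }
        return columns;
    }

    /** What nullSafeGet has to do: read the columns in order and stop at the first NULL. */
    static List<Integer> fromColumns(Integer[] columns) {
        List<Integer> values = new ArrayList<>();
        for (Integer column : columns) {
            if (column == null) {
                break;
            }
            values.add(column);
        }
        return values;
    }
}
```

One consequence of the "stop at the first NULL" rule is that a list cannot contain null elements itself, since they would be indistinguishable from the end of the data.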
[EDIT2] If you don't want to type all the @Column annotations, use a code generator for them. That can be as simple as a script to which you give a name and a number and which prints @Column(...) to System.out. After the script has run, just cut&paste the output into the source.
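Such a generator could be as small as the sketch below (the class and method names are made up; the output format follows the mapping shown earlier):

```java
class ColumnAnnotationGenerator {
    /** Builds the @Columns block for columns prefix1..prefixN. */
    static String generate(String prefix, int count) {
        StringBuilder sb = new StringBuilder("@org.hibernate.annotations.Columns(columns = {\n");
        for (int i = 1; i <= count; i++) {
            sb.append("    @Column(name=\"").append(prefix).append(i).append("\")");
            sb.append(i < count ? ",\n" : "\n"); // no trailing comma after the last entry
        }
        return sb.append("})").toString();
    }

    public static void main(String[] args) {
        // Usage: java ColumnAnnotationGenerator C 100
        System.out.println(generate(args[0], Integer.parseInt(args[1])));
    }
}
```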
The only other solution would be to access the internal Hibernate API to build that information at runtime but that API is internal, so a lot of stuff is private. You can use Java reflection and setAccessible(true) but that code probably won't survive the next update of Hibernate.
You can use UserTypes to map a given number of columns to any type you wish. This could be a collection if (for example) the collections are always bounded in size by a known number of items.
It's been a while (> 3 years) since I used Hibernate so I'm pretty rusty but I recall it being very easy to do; your BespokeUserType class gets passed the ResultSet to hydrate your object from it.
I too have never used Hibernate.
I suggest writing a small program in an interpreted language (such as Python) in which you can execute a string as if it were a command. You could construct a statement which takes the tedious work out of doing what you want to do manually.