Does it make sense to create a separate entity when it would only contain the @Id value as a String?
@Entity
class CountryCode {
    @Id
    String letterCode; // GBR, FRA, etc.
}

@Entity
class Payment {
    CountryCode code;
    // or directly, without a further table: String countryCode;
}
Or would you just use the letterCode as a String value instead of creating the CountryCode entity?
Later it should be possible, for example, to fetch all payments that contain a specific country code. This might be possible with both solutions, but which is the better one, and why?
Yes, you can if you are using the entity as a lookup. In your example, you may want to add a description column containing the name (France, Great Britain, etc.) for each letter code, a third column for whether the code is active, and maybe columns for when the row was inserted and when it was last changed.
It makes sense to create such a table to keep the data consistent, that is, to ensure that no Payment is created with a non-existent CountryCode. A separate entity (that is, a table) together with a foreign key on Payment lets the database enforce that consistency.
Another possible approach is a check constraint on the code field, but this is error-prone if codes are added or deleted, or if more than one column uses this type.
Adding the letterCode to the Payment class as a String attribute (or an enum, to prevent typos) will improve fetch performance, since you do not need a join to the CountryCode table.
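A minimal sketch of the enum variant, written as plain Java without any persistence annotations. The enum constants, the simplified Payment class and the PaymentFilter helper are assumptions made for illustration; in a real application the filtering would be a query on the country code column.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical enum standing in for the CountryCode lookup; the constants are examples only.
enum CountryCode { GBR, FRA, DEU }

// Simplified, non-persistent stand-in for the Payment entity.
class Payment {
    final long id;
    final CountryCode countryCode;

    Payment(long id, CountryCode countryCode) {
        this.id = id;
        this.countryCode = countryCode;
    }
}

class PaymentFilter {
    // Returns the payments whose country code matches the given one.
    static List<Payment> byCountry(List<Payment> payments, CountryCode code) {
        return payments.stream()
                .filter(p -> p.countryCode == code)
                .collect(Collectors.toList());
    }
}
```

With an enum, a typo such as CountryCode.valueOf("GRB") fails immediately with an IllegalArgumentException instead of silently storing a bad code.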
Related
@Entity
public class Person {
    @Id
    @GeneratedValue(strategy = GenerationType.AUTO)
    private int id;
    private String name;
    private String externalID; // <--- why do we need this?
}
Someone has suggested that I include an external id field in a class, something like the above. Any suggestions as to why that could be?
I am not sure what exactly is meant by externalID here, since the use case is not clear.
But, I assume a couple of cases:
1. External service
External id may be used to map your entity with some id of another resource from different services. Something, that identifies this entity in another system.
For example, externalID might store the person's Twitter id or bank account id.
2. Security-wise
externalID keeps the internal id from being visible outside the system, since exposing it may cause security vulnerabilities.
For example:
In your case, the internal id is an int with GenerationType.AUTO, which typically means the entities get incremental ids: 1, 2, 3, ...
Knowing that, someone may analyze your API calls and easily iterate through all your accounts via the API, e.g. GET api/person/{id}.
Usually a different type of id is used to solve this problem, such as a UUID, e.g. 8b9af550-a4c7-4181-b6ba-1a1899109783, which can be used as the externalID in your case.
So, I assume this is the reason to add additional externalID to your entity.
Note: if your Database supports the usage of UUID (or store it as String), you can simply replace your internal id type with UUID and get rid of externalID here.
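The security-wise case could be sketched roughly as follows. This is plain Java with an in-memory map standing in for the database; PersonRecord and PersonRegistry are hypothetical names, not part of the question's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

// Non-persistent stand-in for the Person entity:
// a sequential internal id plus a random external id.
class PersonRecord {
    final int id;           // internal and sequential, never exposed
    final UUID externalId;  // the only id that appears in URLs and API responses
    final String name;

    PersonRecord(int id, String name) {
        this.id = id;
        this.externalId = UUID.randomUUID();
        this.name = name;
    }
}

// Hypothetical in-memory registry; a real application would query the database instead.
class PersonRegistry {
    private final Map<UUID, PersonRecord> byExternalId = new HashMap<>();
    private int nextId = 1;

    PersonRecord create(String name) {
        PersonRecord person = new PersonRecord(nextId++, name);
        byExternalId.put(person.externalId, person);
        return person;
    }

    // API-facing lookup: callers only ever supply the external id.
    Optional<PersonRecord> findByExternalId(UUID externalId) {
        return Optional.ofNullable(byExternalId.get(externalId));
    }
}
```

Because the external id is random, knowing one valid id does not help a caller guess the next one, which is the point of hiding the incremental id.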
It is possible that externalID represents the primary key of another table that the person is related to. String is quite arbitrary though; you would generally use an Integer, Long, or UUID to represent a primary key. More context in the question would help.
The purpose behind an external ID is to link your entity with another representation of it from a system that is decoupled from yours.
For example, if you want to store the Facebook ID for SSO reasons, you would do it through a field that could be called externalId, or something like that. Another example might be that you imported some accounts from another database, and you want to store the primary key of the source entity that was imported.
Otherwise, if that field does not represent anything in your business logic, get rid of it.
Imagine that I have a simple entity as follows:
@Entity
@Table(name = "PERSON")
public class Person {
    @Id
    @Column(name = "NAME")
    private String name;
    @Column(name = "GENDER")
    private String gender;
}
And two tables, the actual table holding the information and a lookup table.
TABLE PERSON (
NAME VARCHAR2 NOT NULL,
GENDER INT NOT NULL);
TABLE GENDER_LOOKUP (
GENDER_ID INT NOT NULL,
GENDER_NAME VARCHAR2 NOT NULL);
I want to save the information from my entity into the table, so that the String field gender is automatically converted to the corresponding gender int, using the lookup table as a reference. I thought of two approaches, but I was wondering if there was a more efficient way.
1. Create an enum and persist its ordinal. I would rather avoid this because I'd like to have only one "source of truth" for the information and, for various business reasons, it has to be a lookup table.
2. Use the @Converter annotation and write a custom converter. I think this would require me to query the table to pull out the relevant row, so I would have to make a JPA call to the database every time something was converted.
I'm currently planning to use 2, but I was wondering if there was any way to do it within the database itself, since I assume using JPA to do all of these operations has a higher cost than if I did everything in the database. Essentially attempt to persist a String gender, and then the database would look at the lookup table and translate it to the correct Id and save it.
I'm specifically using openJpa but hopefully this isn't implementation specific.
Since you seriously considered using enum, it means that GENDER_LOOKUP is static, i.e. the content doesn't change while the program is running.
Because of that, you should use option 2, but have the converter cache/load all the records from GENDER_LOOKUP on the first lookup. That way, you still only have one "source of truth", without the cost of hitting the database on every lookup.
If you need to add a new gender [1], you'll just have to restart the app to refresh the cache.
[1] These days, who knows what new genders will be needed.
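The caching idea could look roughly like the sketch below. It is plain Java, so it does not implement the JPA AttributeConverter interface itself; in a real converter the two public methods would be the interface's conversion methods. The GenderLookupCache name and the loader supplier (standing in for a single query against GENDER_LOOKUP) are assumptions for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Sketch of a converter that loads GENDER_LOOKUP once and answers from memory afterwards.
class GenderLookupCache {
    // Hypothetical loader: would run something like
    // "SELECT GENDER_ID, GENDER_NAME FROM GENDER_LOOKUP" and return id -> name.
    private final Supplier<Map<Integer, String>> loader;
    private Map<Integer, String> nameById;   // filled on first use
    private Map<String, Integer> idByName;   // reverse index, filled on first use

    GenderLookupCache(Supplier<Map<Integer, String>> loader) {
        this.loader = loader;
    }

    // Loads the lookup table exactly once, on the first conversion.
    private synchronized void ensureLoaded() {
        if (idByName != null) {
            return;
        }
        Map<Integer, String> rows = loader.get();
        Map<String, Integer> reverse = new HashMap<>();
        for (Map.Entry<Integer, String> row : rows.entrySet()) {
            reverse.put(row.getValue(), row.getKey());
        }
        nameById = new HashMap<>(rows);
        idByName = reverse;
    }

    // Entity attribute (String) -> database column (int id).
    Integer toDatabaseColumn(String genderName) {
        ensureLoaded();
        Integer id = idByName.get(genderName);
        if (id == null) {
            throw new IllegalArgumentException("Unknown gender: " + genderName);
        }
        return id;
    }

    // Database column (int id) -> entity attribute (String).
    String toEntityAttribute(Integer genderId) {
        ensureLoaded();
        return nameById.get(genderId);
    }
}
```

The lookup table stays the single source of truth; the cost is that new rows are only picked up after a restart, as noted above.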
I have a few types that share a common field (Email ID), which I am using as the @Id. These types extend a common type, User, which has the Email ID field. It looks something like this:
@Entity
class User {
    @Id
    String emailID;
}

@Entity
@Subclass(index = true)
class UserType1 extends User {
    String otherField;
}

@Entity
@Subclass(index = true)
class UserType2 extends User {
    String otherField;
}
Now, I want the Email ID to remain unique across all of these subtype objects in the datastore every time I insert a subtype of User. I tested an endpoint for the above types by inserting each of the subtypes with the same Email ID, and every insert succeeded, although I expected Objectify to reject subtypes with the same ID. As I understand it, uniqueness is ultimately ensured by the keys, but can't I ensure uniqueness of an Id across just the subtypes, especially when the Id is in the base class? Is there some way to do it?
EDIT:
Although this is not the solution I was looking for, I have handled the situation by creating a new entity type with {EmailID, Key_Subtype}, which ensures uniqueness. I check this entity for an existing emailID and use the key to retrieve the object with another query.
If anyone comes up with a better solution, I would appreciate it.
A UUID is what you are looking for. One is generated for each entity, regardless of its type.
https://dzone.com/articles/hibernate-and-uuid-identifiers
Same Id for different entity types is definitely possible at the datastore level, see re-using an entity's ID for other entities of different kinds - sane idea?
The Id uniqueness is only guaranteed across entities of the same kind and with the same parent entity (the unique entity key is based on a combination of these 3 items). Since your subtypes are actually different entity kinds there is no problem having the same Id across these kinds, so subclassing is not the way to achieve what you want.
To have unique Ids you need a single entity kind, say User. To distinguish the different user types, maybe have inside User a type property that references an entity of a UserTypeX kind containing the info specific to that user type?
It sounds like you have found the "correct" solution - create an Email entity that uses the email address as the id and contains a pointer to the appropriate User entity. When creating a new User/Email, always check for pre-existence of the email address in a transaction.
This really isn't any different from using the email address as the id of the User directly except that the extra layer of indirection allows users to change their email addresses, which is generally a good idea. The transactional logic is similar either way.
Transactionally looking up & creating an entity with a natural primary key is pretty much the only way of guaranteeing uniqueness in the datastore. It is effective and scalable.
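As a rough illustration of the "claim the email address first" pattern, here is a plain Java sketch. It uses an in-memory ConcurrentHashMap as a stand-in for the Email kind, and the atomic putIfAbsent call plays the role that the transaction plays in the datastore. EmailIndex and its method names are hypothetical and are not Objectify API.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// In-memory stand-in for the Email kind:
// key = email address, value = key of the owning user entity.
class EmailIndex {
    private final ConcurrentMap<String, String> ownerKeyByEmail = new ConcurrentHashMap<>();

    // Atomically claims the address for the given user key.
    // Returns false if the address is already taken.
    // In the datastore, this check-then-create would run inside a transaction.
    boolean claim(String email, String userKey) {
        return ownerKeyByEmail.putIfAbsent(email, userKey) == null;
    }

    // Resolves an email address to the key of the user that owns it, or null.
    String ownerOf(String email) {
        return ownerKeyByEmail.get(email);
    }
}
```

A caller would only persist the UserType1 or UserType2 entity after claim returns true, so the same address can never point at two users of different kinds.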
What are the best practices for modeling inheritance in databases?
What are the trade-offs (e.g. queriability)?
(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)
There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:
Table-Per-Type (TPT)
Each class has its own table. The base class table holds all the base class elements, and each class that derives from it has its own table, with a primary key that is also a foreign key to the base class table; the derived class's table contains only the additional elements.
So for example:
class Person {
public int ID;
public string FirstName;
public string LastName;
}
class Employee : Person {
public DateTime StartDate;
}
Would result in tables like:
table Person
------------
int id (PK)
string firstname
string lastname
table Employee
--------------
int id (PK, FK)
datetime startdate
Table-Per-Hierarchy (TPH)
There is a single table which represents all the inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added which tells the system what type of row this is.
Given the classes above, you end up with this table:
table Person
------------
int id (PK)
int rowtype (0 = "Person", 1 = "Employee")
string firstname
string lastname
datetime startdate
For any rows which are rowtype 0 (Person), the startdate will always be null.
Table-Per-Concrete (TPC)
Each class has its own fully formed table with no references off to any other tables.
Given the classes above, you end up with these tables:
table Person
------------
int id (PK)
string firstname
string lastname
table Employee
--------------
int id (PK)
string firstname
string lastname
datetime startdate
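To make the role of the discriminator in the Table-Per-Hierarchy layout concrete, here is a small Java sketch that turns one row of the single Person table back into the right object. The Java classes mirror the Person/Employee example above; PersonRow and TphMapper are hypothetical helpers, and a real ORM would do this mapping for you.

```java
import java.time.LocalDate;

class Person {
    final int id;
    final String firstName;
    final String lastName;

    Person(int id, String firstName, String lastName) {
        this.id = id;
        this.firstName = firstName;
        this.lastName = lastName;
    }
}

class Employee extends Person {
    final LocalDate startDate;

    Employee(int id, String firstName, String lastName, LocalDate startDate) {
        super(id, firstName, lastName);
        this.startDate = startDate;
    }
}

// One row of the single TPH table; startDate is null when rowType is 0 (Person).
class PersonRow {
    final int id;
    final int rowType;
    final String firstName;
    final String lastName;
    final LocalDate startDate;

    PersonRow(int id, int rowType, String firstName, String lastName, LocalDate startDate) {
        this.id = id;
        this.rowType = rowType;
        this.firstName = firstName;
        this.lastName = lastName;
        this.startDate = startDate;
    }
}

class TphMapper {
    // Uses the discriminator column to decide which class the row represents.
    static Person toObject(PersonRow row) {
        switch (row.rowType) {
            case 0:
                return new Person(row.id, row.firstName, row.lastName);
            case 1:
                return new Employee(row.id, row.firstName, row.lastName, row.startDate);
            default:
                throw new IllegalArgumentException("Unknown rowtype: " + row.rowType);
        }
    }
}
```

In TPT the same decision needs a join (or a probe of each derived table), and in TPC the table a row came from already tells you its type.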
Proper database design is nothing like proper object design.
If you are planning to use the database for anything other than simply serializing your objects (such as reports, querying, multi-application use, business intelligence, etc.) then I do not recommend any kind of a simple mapping from objects to tables.
Many people think of a row in a database table as an entity (I spent many years thinking in those terms), but a row is not an entity. It is a proposition. A database relation (i.e., table) represents some statement of fact about the world. The presence of the row indicates the fact is true (and conversely, its absence indicates the fact is false).
With this understanding, you can see that a single type in an object-oriented program may be stored across a dozen different relations. And a variety of types (united by inheritance, association, aggregation, or completely unaffiliated) may be partially stored in a single relation.
It is best to ask yourself, what facts do you want to store, what questions are you going to want answers to, what reports do you want to generate.
Once the proper DB design is created, then it is a simple matter to create queries/views that allow you to serialize your objects to those relations.
Example:
In a hotel booking system, you may need to store the fact that Jane Doe has a reservation for a room at the Seaview Inn for April 10-12. Is that an attribute of the customer entity? Is it an attribute of the hotel entity? Is it a reservation entity with properties that include customer and hotel? It could be any or all of those things in an object oriented system. In a database, it is none of those things. It is simply a bare fact.
To see the difference, consider the following two queries. (1) How many hotel reservations does Jane Doe have for next year? (2) How many rooms are booked for April 10 at the Seaview Inn?
In an object-oriented system, query (1) is an attribute of the customer entity, and query (2) is an attribute of the hotel entity. Those are the objects that would expose those properties in their APIs. (Though, obviously the internal mechanisms by which those values are obtained may involve references to other objects.)
In a relational database system, both queries would examine the reservation relation to get their numbers, and conceptually there is no need to bother with any other "entity".
Thus, it is by attempting to store facts about the world—rather than attempting to store entities with attributes—that a proper relational database is constructed. And once it is properly designed, then useful queries that were undreamt of during the design phase can be easily constructed, since all the facts needed to fulfill those queries are in their proper places.
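The two queries from the hotel example can be sketched against a single list of reservation facts. This is plain Java over an in-memory list rather than SQL, and the Reservation fields (including treating both dates as inclusive nights and one reservation as one room) are assumptions made for illustration.

```java
import java.time.LocalDate;
import java.util.List;

// One fact in a hypothetical reservation relation:
// "guest has a room at hotel from firstNight to lastNight (both inclusive)".
class Reservation {
    final String guest;
    final String hotel;
    final LocalDate firstNight;
    final LocalDate lastNight;

    Reservation(String guest, String hotel, LocalDate firstNight, LocalDate lastNight) {
        this.guest = guest;
        this.hotel = hotel;
        this.firstNight = firstNight;
        this.lastNight = lastNight;
    }
}

class ReservationQueries {
    // Query (1): how many reservations does this guest have that start in the given year?
    static long countForGuest(List<Reservation> facts, String guest, int year) {
        return facts.stream()
                .filter(r -> r.guest.equals(guest) && r.firstNight.getYear() == year)
                .count();
    }

    // Query (2): how many rooms are booked at this hotel on the given date?
    static long roomsBookedOn(List<Reservation> facts, String hotel, LocalDate date) {
        return facts.stream()
                .filter(r -> r.hotel.equals(hotel)
                        && !date.isBefore(r.firstNight)
                        && !date.isAfter(r.lastNight))
                .count();
    }
}
```

Both methods read the same collection of facts; neither needs a customer object or a hotel object to answer its question.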
The TPT, TPH and TPC patterns are the way to go, as mentioned by Brad Wilson. But a couple of notes:
Child classes inheriting from a base class can be seen as weak entities relative to the base class definition in the database, meaning they depend on their base class and cannot exist without it. I have seen a number of times that a unique ID is stored for each and every child table while also keeping the FK to the parent table. One FK is enough, and it is even better to enable on-delete cascade for the FK relation between the child and base tables.
In TPT, looking only at the base table records, you cannot tell which child class a record represents. This is sometimes needed when you want to load a list of all records (without doing a select on each and every child table). One way to handle this is to have one column representing the type of the child class (similar to the rowType field in TPH), in effect mixing TPT and TPH.
Say we want to design a database that holds the following shape class diagram:
public class Shape {
int id;
Color color;
Thickness thickness;
//other fields
}
public class Rectangle : Shape {
Point topLeft;
Point bottomRight;
}
public class Circle : Shape {
Point center;
int radius;
}
The database design for the above classes can be like this:
table Shape
-----------
int id; (PK)
int color;
int thickness;
int rowType; (0 = Rectangle, 1 = Circle, 2 = ...)
table Rectangle
----------
int ShapeID; (FK on delete cascade)
int topLeftX;
int topLeftY;
int bottomRightX;
int bottomRightY;
table Circle
----------
int ShapeID; (FK on delete cascade)
int centerX;
int centerY;
int radius;
Short answer: you don't.
If you need to serialize your objects, use an ORM, or even better something like ActiveRecord or prevalence.
If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.
There are two main types of inheritance you can set up in a DB: table per entity and table per hierarchy.
Table per entity is where you have a base entity table that holds the properties shared by all child classes. You then have another table per child class, each with only the properties applicable to that class. They are linked 1:1 by their PKs.
Table per hierarchy is where all classes share one table, and optional properties are nullable. There is also a discriminator field, a number that denotes the type the record currently holds.
SessionTypeID is the discriminator.
Table per hierarchy is faster to query because you do not need joins (only the discriminator value), whereas with table per entity you need complex joins to detect what type something is as well as to retrieve all its data.
Edit: The images I show here are screenshots of a project I am working on. The Asset image is not complete, hence its emptiness, but it was mainly to show how it is set up, not what to put inside your tables. That is up to you ;). The session table holds virtual collaboration session information, and can be of several session types depending on what type of collaboration is involved.
You would normalize your database, and that would actually mirror your inheritance.
It might degrade performance, but that's how it is with normalizing. You will probably have to use good common sense to find the balance.
repeat of similar thread answer
in O-R mapping, inheritance maps to a parent table where the parent and child tables use the same identifier
for example
create table Object (
    Id int NOT NULL, -- primary key, auto-increment
    Name varchar(32)
)
create table SubObject (
    Id int NOT NULL, -- primary key and also foreign key to Object
    Description varchar(32)
)
SubObject has a foreign-key relationship to Object. when you create a SubObject row, you must first create an Object row and use the Id in both rows
EDIT: if you're looking to model behavior also, you would need a Type table that listed the inheritance relationships between tables, and specified the assembly and class name that implemented each table's behavior
seems like overkill, but that all depends on what you want to use it for!
Using SQLAlchemy (a Python ORM), you can do two types of inheritance.
The one I've had experience with uses a single table with a discriminator column. For instance, a Sheep database (no joke!) stored all Sheep in the one table, and Rams and Ewes were handled using a gender column in that table.
Thus, you can query for all Sheep, and get all Sheep. Or you can query by Ram only, and it will only get Rams. You can also do things like have a relation that can only be a Ram (ie, the Sire of a Sheep), and so on.
Note that some database engines, such as Postgres, already provide inheritance mechanisms natively. Look at the documentation.
For an example, you would query the Person/Employee system described in a response above like this:
/* This shows the first name of all persons or employees */
SELECT firstname FROM Person ;
/* This shows the start date of all employees only */
SELECT startdate FROM Employee ;
If that is your database of choice, you don't need to be particularly smart!
I have a model class that references another model class and seem to be encountering an issue where the @OneToOne annotation fixes one problem but causes another. Removing it causes the inverse.
JPA throws "multiple assignments to same column" when trying to save changes to model. The generated SQL has duplicate columns and I'm not sure why.
Here's a preview of what the classes look like:
The parent class references look like this:
public class Appliance {
    public Integer locationId;
    @Valid
    @OneToOne
    public Location location;
}
The child Location class has an id field and a few other text fields -- very simple:
public class Location {
public Integer id;
public String name;
}
When I attempt to perform a save operation, does anyone know why JPA is creating an insert statement for the Appliance table that contains two fields named "location_id"?
I need to annotate the reference to the child class with @OneToOne if I want to be able to retrieve data from the corresponding database table to display on screen. However, if I remove @OneToOne, the save works fine, but it obviously won't load the Location data into the child object when I query the db.
Thanks in advance!
It appears you did not define an @Inheritance strategy on the parent class. Since you did not, the default is to combine the parent and the child class into the same table using the Single Table strategy.
Since both entities are going into the same table, I think that @OneToOne is trying to write the id twice, regardless of which side it is on.
If you want the parent to be persisted in its own table, look at InheritanceType.JOINED.
Or consider refactoring so that you are not persisting the parent separately, as JOINED is not considered a safe option with some JPA providers.
See official Oracle Documentation below.
http://docs.oracle.com/javaee/7/tutorial/doc/persistence-intro002.htm#BNBQR
37.2.4.1 The Single Table per Class Hierarchy Strategy
With this strategy, which corresponds to the default InheritanceType.SINGLE_TABLE, all classes in the hierarchy are mapped to a single table in the database. This table has a discriminator column containing a value that identifies the subclass to which the instance represented by the row belongs.
In OpenJPA, according to the docs (http://openjpa.apache.org/builds/1.0.1/apache-openjpa-1.0.1/docs/manual/jpa_overview_mapping_field.html), section 8.4, the foreign key column in a one-to-one mapping:
Defaults to the relation field name, plus an underscore, plus the name
of the referenced primary key column.
And the JPA API seems to concur with this (http://docs.oracle.com/javaee/6/api/javax/persistence/JoinColumn.html)
I believe this means that in a one-to-one mapping, the default column name for properties in a dependent class is parentClassFieldName_dependentClassFieldName (or location_id in your case). If that's the case, the location_id column you are defining in your Appliance class is conflicting with the location_id default column name which would be generated for your Location class.
You should be able to correct this by using the @Column(name="someColumnName") annotation and the @JoinColumn annotation on your @OneToOne relationship to force the column name to be something unique.
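The name collision described above can be reproduced with two small string functions. This is a plain Java sketch, not provider code: defaultJoinColumn follows the default quoted from the docs, while defaultBasicColumn assumes a naming strategy that turns camelCase field names into underscore-separated column names, which is what the duplicated location_id in the question suggests. ColumnNames is a hypothetical helper class.

```java
import java.util.Locale;

class ColumnNames {
    // Default quoted above: relation field name + "_" + referenced primary key column name.
    static String defaultJoinColumn(String relationField, String referencedPkColumn) {
        return relationField + "_" + referencedPkColumn;
    }

    // Assumption: the naming strategy in use converts camelCase field names
    // to lower-case, underscore-separated column names.
    static String defaultBasicColumn(String fieldName) {
        return fieldName
                .replaceAll("([a-z0-9])([A-Z])", "$1_$2")
                .toLowerCase(Locale.ROOT);
    }
}
```

Under these assumptions, the locationId field and the location relation both resolve to location_id, so the generated insert lists that column twice unless one of them is renamed.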
Ok gang, I figured it out.
Here's what the new code looks like, followed by a brief explanation...
Parent Class:
public class Appliance {
    public Integer locationId;
    @Valid
    @OneToOne(cascade = CascadeType.ALL)
    @JoinColumn(name="location_id", referencedColumnName="id")
    public Location location;
}
Child Class:
public class Location {
public Integer id;
public String name;
}
The first part of the puzzle was the explicit addition of "cascade = CascadeType.ALL" in the parent class. This resolved the initial "multiple assignments to same column" by allowing the child object to be persisted.
However, I encountered an issue during update operations which is due to some sort of conflict between EBean and JPA whereby it triggers a save() operation on nested child objects rather than a cascading update() operation. I got around this by issuing an explicit update on the child object and then setting it to null before the parent update operation occurred. It's sort of a hack, but it seems like all these persistence frameworks solve one set of problems but cause others -- I guess that's why I've been old school and always rolled my own persistence code until now.