Imagine that I have a simple entity as follows:
@Entity
@Table(name = "PERSON")
public class Person {
    @Id
    @Column(name = "NAME")
    private String name;
    @Column(name = "GENDER")
    private String gender;
}
And two tables, the actual table holding the information and a lookup table.
TABLE PERSON (
NAME VARCHAR2 NOT NULL,
GENDER INT NOT NULL);
TABLE GENDER_LOOKUP (
GENDER_ID INT NOT NULL,
GENDER_NAME VARCHAR2 NOT NULL);
I want to save the information from my entity into the table, so that the String field gender is automatically converted to the corresponding gender int, using the lookup table as a reference. I thought of two approaches, but I was wondering if there was a more efficient way.
1. Create an enum and persist it by ordinal. I would rather avoid this because I'd like to have only one "source of truth" for the information and, for various business reasons, it has to be a lookup table.
2. Use the @Converter annotation and write a custom converter. I think that this would require me to query the table to pull out the relevant row, so it would mean that I would have to make a JPA call to the database every time something was converted.
I'm currently planning to use 2, but I was wondering if there was any way to do it within the database itself, since I assume using JPA to do all of these operations has a higher cost than if I did everything in the database. Essentially attempt to persist a String gender, and then the database would look at the lookup table and translate it to the correct Id and save it.
I'm specifically using OpenJPA, but hopefully this isn't implementation-specific.
Since you seriously considered using enum, it means that GENDER_LOOKUP is static, i.e. the content doesn't change while the program is running.
Because of that, you should use option 2, but have the converter cache/load all the records from GENDER_LOOKUP on the first lookup. That way, you still only have one "source of truth", without the cost of hitting the database on every lookup.
If you need to add a new gender [1], you'll just have to restart the app to refresh the cache.
[1] These days, who knows what new genders will be needed.
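A minimal sketch of the caching idea, in plain Java so that it runs without a JPA provider. The class name GenderLookupCache and the loader Supplier are hypothetical; the assumption is that the loader runs one query against GENDER_LOOKUP, and that the converter's convertToDatabaseColumn and convertToEntityAttribute methods would simply delegate to idFor and nameFor.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical helper: loads GENDER_LOOKUP once and serves both directions from memory.
final class GenderLookupCache {
    private final Supplier<Map<Integer, String>> loader; // assumed to query GENDER_LOOKUP
    private Map<Integer, String> nameById;               // GENDER_ID -> GENDER_NAME
    private Map<String, Integer> idByName;               // GENDER_NAME -> GENDER_ID

    GenderLookupCache(Supplier<Map<Integer, String>> loader) {
        this.loader = loader;
    }

    // Runs the loader on first use only; later calls reuse the cached maps.
    private synchronized void ensureLoaded() {
        if (nameById != null) {
            return;
        }
        Map<Integer, String> rows = new HashMap<>(loader.get());
        Map<String, Integer> reverse = new HashMap<>();
        rows.forEach((id, name) -> reverse.put(name, id));
        nameById = rows;
        idByName = reverse;
    }

    // What convertToDatabaseColumn would delegate to.
    Integer idFor(String genderName) {
        ensureLoaded();
        Integer id = idByName.get(genderName);
        if (id == null) {
            throw new IllegalArgumentException("Unknown gender name: " + genderName);
        }
        return id;
    }

    // What convertToEntityAttribute would delegate to.
    String nameFor(Integer genderId) {
        ensureLoaded();
        String name = nameById.get(genderId);
        if (name == null) {
            throw new IllegalArgumentException("Unknown gender id: " + genderId);
        }
        return name;
    }
}
```

An unknown value fails loudly here rather than triggering a reload, which matches the "restart to refresh" trade-off described above.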
Let's imagine I have these entity classes (I omitted the JPA annotations):
class TableA { Long id; List<TableB> tableBs; }
class TableB { Long id; List<TableC> tableCs; }
class TableC { Long id; List<TableD> tableDs; }
class TableD { Long id; int foo; }
This gives us this entity "graph"/"dependencies":
TableA ---OneToMany--> TableB ---OneToMany--> TableC ---OneToMany--> TableD
If I want to deeply load all sub-entities, sub-sub-entities and sub-sub-sub-entities of one TableA object, JPA will produce these queries:
1 query to get one TableA: it's fine, of course
1 query to lazy-load tableA.getTableBs(): it's fine too => we get n TableB entities
n queries to lazy-load all tableA.getTableBs()[1..n].getTableCs() => we get m TableC entities per TableB entity
n*m queries to lazy-load all tableA.getTableBs()[1..n].getTableCs()[1..m].getTableDs()
I'd like to avoid these 2+n*(m+1) queries to lazy-load all sub-sub-sub-entities of my TableA object.
If I had to do the queries by hand, I'd just need 4 queries:
SAME: 1 query to get one TableA
SAME: 1 query to lazy-load tableA.getTableBs(): it's fine
BETTER: 1 query to get all TableC WHERE parentTableBId IN (tableA.getTableBs()[1..n].getId()) // The "IN" clause is computed in Java; I run one SQL query, and then from the TableC{id,parentTableBId,...} result I populate each TableB.getTableCs() list in Java
WAY BETTER: 1 query to get all TableD WHERE the parent TableC id IN (tableA.getTableBs()[1..n].getTableCs()[1..m].getId()) // Same IN-clause computation and tree traversal to assign all TableD children to their TableC parents
I'd like to call either:
JpaMagicUtils.deeplyLoad(tableA); // and it builds the "IN" clauses (possibly splitting into 2 or 3 queries to avoid having too many "IN" ids) and assigns the children in the tree itself, or
JpaMagicUtils.deeplyLoad(tableA, "getTableBs().getTableCs()"); JpaMagicUtils.deeplyLoad(tableA, "getTableBs().getTableCs().getTableDs()"); // to populate one level at a time, with finer control over which fields to load in bulk and which fields not to load.
I don't think there is a way with JPA for that.
Or is there a way with JPA for that?
Or as a non-JPA-standard way, but perhaps with Hibernate? or with another JPA implementation?
Or is there a library that we can use just to do that (on top of any JPA implementation, or one implementation in particular)?
"If I had to do the queries by hand, I'd just need 4 queries"
Well, why not one query joining four tables?
Seriously though, if you want to limit the number of queries, I'd first try Hibernate's @Fetch(FetchMode.JOIN) (not sure if there are similar annotations for other JPA providers). It is a hint that tells Hibernate to use a join to load child entities (rather than issuing a separate query). It does not always work for nested one-to-many associations, but I'd try defining it at the deepest level of the hierarchy and then working your way up until you find the performance acceptable (or the size of the result set prohibitive).
If you're looking for generic solution, then sadly I do not know any JPA provider that would follow the algorithm you describe, neither in general nor as an opt-in feature. This is a very specific use case, and I guess the price of being robust as a library is not incorporating optimizations for special-case scenarios.
Note: if you want to eagerly load an entity hierarchy in one use case but keep the entities lazily loaded in general scenarios, you'll want to look up JPA entity graphs. You may also need to write a custom query using JOIN FETCH, but I don't think nested JOIN FETCH clauses are generally supported.
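For what it's worth, the level-by-level algorithm from the question is easy to sketch by hand. The following is plain Java, not a JPA or Hibernate API: BulkChildLoader and all of its parameters are hypothetical names, and fetchByParentIds is assumed to run a single "WHERE parent id IN (...)" query and to return only children of the requested parents. How well the assignment step plays with a provider's lazy collections is a separate question that this does not address.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Hypothetical helper sketching the "one IN query per level" idea from the question.
final class BulkChildLoader {
    private BulkChildLoader() {}

    static <P, C> List<C> loadLevel(
            Collection<P> parents,
            Function<P, Long> parentId,                     // id of a parent
            Function<List<Long>, List<C>> fetchByParentIds, // assumed: one IN query per call
            Function<C, Long> parentIdOfChild,              // foreign key carried by each child
            BiConsumer<P, List<C>> assign,                  // sets the parent's child list
            int maxIdsPerQuery) {
        // Index parents by id and give each one an empty child list.
        Map<Long, List<C>> childrenByParent = new LinkedHashMap<>();
        for (P parent : parents) {
            childrenByParent.put(parentId.apply(parent), new ArrayList<>());
        }
        List<Long> ids = new ArrayList<>(childrenByParent.keySet());
        List<C> allChildren = new ArrayList<>();
        // Split the ids into chunks so that no IN clause grows too large.
        for (int from = 0; from < ids.size(); from += maxIdsPerQuery) {
            List<Long> chunk = ids.subList(from, Math.min(from + maxIdsPerQuery, ids.size()));
            for (C child : fetchByParentIds.apply(chunk)) {
                childrenByParent.get(parentIdOfChild.apply(child)).add(child);
                allChildren.add(child);
            }
        }
        // Hand each parent its grouped children.
        for (P parent : parents) {
            assign.accept(parent, childrenByParent.get(parentId.apply(parent)));
        }
        return allChildren; // these become the "parents" of the next level
    }
}
```

Calling loadLevel once for TableB to TableC and once more, on the returned list, for TableC to TableD gives one query per level (or a few, if the id list is chunked).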
Consider following simple entity model
class Order{
int id;
String description;
//one to one eager load with join column specified
Detail details;
//one to many lazy load with mapped by specified
Collection<Item> items;
}
class Detail{
}
class Item{
String name;
//reference to order
}
Now, let's say the requirement is to load all the orders with item details by some criteria (e.g. description matching something). Simple: I write an HQL query like "from Order where description...". This loads 1000 entities, for example, and the item collection is lazy loaded. I force-load the collections within the session by calling size().
This of course led to an N+1 problem, so I decided to use batch fetching for items. I just added the batch size annotation on the item collection and got far fewer queries, as expected.
However, I am not interested in 'detail' at all, but since it is a one-to-one eager load, there is always one query per Order to load it. I simply want to do away with these queries.
To solve this, I tried to do a select without details, but I am not sure how to include items (the collection) in the query so that it is loaded in exactly the same way as if I were selecting everything (that is, lazy loaded, so that later calls can utilize the batch size). Some suggestions are to use a join in the where clause, but that initializes my collection with an empty ArrayList (and not with a PersistentBag, as is the case with lazy loading).
Looking for solutions.
One possible solution is the following:
Create a POJO which will contain a query result. Example:
public class OrderResult {
private String description;
private String itemName;
// ... more fields, if any
public OrderResult(String desc, String itemName) {
this.description = desc;
this.itemName = itemName;
}
// getters & setters
}
Create a JPQL query using a constructor expression as:
List<OrderResult> resultList = entityManager.createQuery("SELECT NEW OrderResult(o.description, i.name) FROM Order o JOIN o.items i where <condition>", OrderResult.class).getResultList();
So you'll get a list of instances of OrderResult containing only the information you're interested in.
NOTE 1: You're talking about HQL, but HQL is the Hibernate-specific legacy query language. As Hibernate is an implementation of JPA, and you tagged your question with JPA, this solution should work in your environment too.
NOTE 2: In the solution, I am using the so-called constructor expression of JPQL, which is defined using NEW in the select clause. The argument to the NEW operator must be a fully qualified class name, e.g., if you put the OrderResult class in a package com.mycompany.myproject.order, then the expression should look like:
SELECT NEW com.mycompany.myproject.order.OrderResult(...) FROM ...
NOTE 3: This is just to give you a hint on how to implement the solution and should be considered pseudocode.
Please don't ask me why I need to do this, as even if I think I could find another way to solve my specific problem, I want to understand HQL and its limits more.
Given the following entities
@Entity
class Child {
private String someAttribute1;
.....
private String someAttributeN;
@ManyToOne(fetch = FetchType.EAGER)
private Parent parent;
}
class Parent {
private String someParent1;
....
private String someParentN;
}
If I select Child then Hibernate automatically fetches all columns from Child and Parent in a single joined SQL, and that is the typical desired case.
Sometimes I know that, for entities mapped with a large number of columns, I need only a subset.
If I select item.someAttribute1 as someAttribute1, item.someAttribute2 as someAttribute2, item.someAttribute3 as someAttribute3 from Child item etc. tied to a ResultTransformer I can let Hibernate return me only 3 columns from the SQL, or more columns if I list them. OK, that is cool and works like a charm.
However if I need to fetch only, say, 3 columns from Child and 2 from Parent, while the rest can be null, and materialize a Child entity with its relationship, I cannot write the following
select item.someAttribute1 as someAttribute1, item.someAttribute2 as someAttribute2, item.someAttribute3 as someAttribute3, item.parent.someParent1 as parent.someParent1, item.parent.someParent2 as parent.someParent2 from Child item left join item.parent
The above does not work because Hibernate does not allow an alias to be composed. It does not let me use an as parent.someName clause, presumably because aliases must be flat.
As a counterexample, in languages such as LINQ the problem does not apply:
from Child c in children
select new Child {
SomeAttribute1 = c.someAttribute1,
SomeAttribute2 = c.someAttribute2,
Parent = new Parent {
Attribute1 = c.Parent.Attribute1,
.......
}
}
With the above statement, Entity Framework will only fetch the desired columns.
I don't want to make comparison or criticism between Hibernate for Java and Entity Framework for C#, absolutely.
I only need to fetch a subset of the columns that compose an entity with a @ManyToOne relationship, in order to optimize memory and bandwidth usage: some columns from the child entity and some from the parent.
I just want to know if, and how, it is possible to achieve something like that in Hibernate: to populate the parent attribute in the result set with an object of class Parent that is populated with only a subset of columns (the rest being null is no problem). I am using ResultTransformers happily.
There are two problems with it.
1. Hibernate doesn't allow nested aliases such as as parent.someName in HQL. It produces a parsing error. But you can use nested aliases with Criteria, using Projections.property("parent.someName").
2. Hibernate doesn't have a result transformer that populates result objects using nested aliases.
You can use Criteria requests with a custom result transformer, as described in "How to transform a flat result set using Hibernate".
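To give an idea of what such a custom transformer has to do, here is a minimal sketch of the core step in plain Java: walking dotted aliases and creating the intermediate objects on demand. NestedAliasPopulator is a hypothetical helper, not a Hibernate class, and the Child and Parent classes are cut-down stand-ins for the entities in the question; wiring this into a ResultTransformer is left out.

```java
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.Map;

// Cut-down stand-ins for the entities in the question.
class Parent { String someParent1; String someParent2; }
class Child { String someAttribute1; String someAttribute2; Parent parent; }

// Hypothetical helper: fills an object graph from flat, dotted aliases
// such as "parent.someParent1". Fields that are not listed stay null.
final class NestedAliasPopulator {
    private NestedAliasPopulator() {}

    static <T> T populate(Class<T> rootType, Map<String, Object> row) {
        try {
            T root = newInstance(rootType);
            for (Map.Entry<String, Object> entry : row.entrySet()) {
                String[] path = entry.getKey().split("\\.");
                Object target = root;
                // Walk all but the last segment, creating nested objects as needed.
                for (int i = 0; i < path.length - 1; i++) {
                    Field link = field(target.getClass(), path[i]);
                    Object next = link.get(target);
                    if (next == null) {
                        next = newInstance(link.getType());
                        link.set(target, next);
                    }
                    target = next;
                }
                // The last segment names the field that receives the column value.
                field(target.getClass(), path[path.length - 1]).set(target, entry.getValue());
            }
            return root;
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot populate " + rootType.getName(), e);
        }
    }

    private static Field field(Class<?> type, String name) throws NoSuchFieldException {
        Field f = type.getDeclaredField(name);
        f.setAccessible(true);
        return f;
    }

    private static <T> T newInstance(Class<T> type) throws ReflectiveOperationException {
        Constructor<T> ctor = type.getDeclaredConstructor();
        ctor.setAccessible(true);
        return ctor.newInstance();
    }
}
```

This sketch only looks at fields declared directly on each class and assumes a no-argument constructor, so inherited fields and collections would need extra handling.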
What are the best practices for modeling inheritance in databases?
What are the trade-offs (e.g. queriability)?
(I'm most interested in SQL Server and .NET, but I also want to understand how other platforms address this issue.)
There are several ways to model inheritance in a database. Which you choose depends on your needs. Here are a few options:
Table-Per-Type (TPT)
Each class has its own table. The base class table holds all the base class elements, and each class which derives from it has its own table, with a primary key which is also a foreign key to the base class table; the derived class's table contains only the elements it adds.
So for example:
class Person {
public int ID;
public string FirstName;
public string LastName;
}
class Employee : Person {
public DateTime StartDate;
}
Would result in tables like:
table Person
------------
int id (PK)
string firstname
string lastname
table Employee
--------------
int id (PK, FK)
datetime startdate
Table-Per-Hierarchy (TPH)
There is a single table which represents all the inheritance hierarchy, which means several of the columns will probably be sparse. A discriminator column is added which tells the system what type of row this is.
Given the classes above, you end up with this table:
table Person
------------
int id (PK)
int rowtype (0 = "Person", 1 = "Employee")
string firstname
string lastname
datetime startdate
For any rows which are rowtype 0 (Person), the startdate will always be null.
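As a rough illustration of how the discriminator is used when reading rows back, here is a sketch in Java (the idea is the same on any platform). The Person and Employee classes mirror the ones above; PersonRowMapper and the Map-based row are hypothetical stand-ins for whatever your data access layer hands you.

```java
import java.time.LocalDate;
import java.util.Map;

// Java counterparts of the classes above.
class Person {
    int id;
    String firstName;
    String lastName;
}

class Employee extends Person {
    LocalDate startDate;
}

// Hypothetical mapper: picks the concrete class from the discriminator column.
final class PersonRowMapper {
    private PersonRowMapper() {}

    static Person fromRow(Map<String, Object> row) {
        int rowType = (Integer) row.get("rowtype");
        Person person;
        if (rowType == 0) {
            // Plain Person: startdate is null for these rows and is ignored.
            person = new Person();
        } else if (rowType == 1) {
            Employee employee = new Employee();
            employee.startDate = (LocalDate) row.get("startdate");
            person = employee;
        } else {
            throw new IllegalArgumentException("Unknown rowtype: " + rowType);
        }
        // Columns shared by every type in the hierarchy.
        person.id = (Integer) row.get("id");
        person.firstName = (String) row.get("firstname");
        person.lastName = (String) row.get("lastname");
        return person;
    }
}
```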
Table-Per-Concrete (TPC)
Each class has its own fully formed table with no references off to any other tables.
Given the classes above, you end up with these tables:
table Person
------------
int id (PK)
string firstname
string lastname
table Employee
--------------
int id (PK)
string firstname
string lastname
datetime startdate
Proper database design is nothing like proper object design.
If you are planning to use the database for anything other than simply serializing your objects (such as reports, querying, multi-application use, business intelligence, etc.) then I do not recommend any kind of a simple mapping from objects to tables.
Many people think of a row in a database table as an entity (I spent many years thinking in those terms), but a row is not an entity. It is a proposition. A database relation (i.e., table) represents some statement of fact about the world. The presence of the row indicates the fact is true (and conversely, its absence indicates the fact is false).
With this understanding, you can see that a single type in an object-oriented program may be stored across a dozen different relations. And a variety of types (united by inheritance, association, aggregation, or completely unaffiliated) may be partially stored in a single relation.
It is best to ask yourself, what facts do you want to store, what questions are you going to want answers to, what reports do you want to generate.
Once the proper DB design is created, then it is a simple matter to create queries/views that allow you to serialize your objects to those relations.
Example:
In a hotel booking system, you may need to store the fact that Jane Doe has a reservation for a room at the Seaview Inn for April 10-12. Is that an attribute of the customer entity? Is it an attribute of the hotel entity? Is it a reservation entity with properties that include customer and hotel? It could be any or all of those things in an object oriented system. In a database, it is none of those things. It is simply a bare fact.
To see the difference, consider the following two queries. (1) How many hotel reservations does Jane Doe have for next year? (2) How many rooms are booked for April 10 at the Seaview Inn?
In an object-oriented system, query (1) is an attribute of the customer entity, and query (2) is an attribute of the hotel entity. Those are the objects that would expose those properties in their APIs. (Though, obviously the internal mechanisms by which those values are obtained may involve references to other objects.)
In a relational database system, both queries would examine the reservation relation to get their numbers, and conceptually there is no need to bother with any other "entity".
Thus, it is by attempting to store facts about the world—rather than attempting to store entities with attributes—that a proper relational database is constructed. And once it is properly designed, then useful queries that were undreamt of during the design phase can be easily constructed, since all the facts needed to fulfill those queries are in their proper places.
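To make the hotel example concrete, here is a small sketch in Java that treats reservations as a flat list of facts and answers both questions from that one list. The names are hypothetical, and two assumptions are baked in: each Reservation stands for exactly one room, and the check-out date is exclusive (a stay of April 10-12 occupies the nights of the 10th and 11th).

```java
import java.time.LocalDate;
import java.util.List;

// One fact: this guest holds one room at this hotel for these nights.
final class Reservation {
    final String guest;
    final String hotel;
    final LocalDate checkIn;
    final LocalDate checkOut; // assumed exclusive

    Reservation(String guest, String hotel, LocalDate checkIn, LocalDate checkOut) {
        this.guest = guest;
        this.hotel = hotel;
        this.checkIn = checkIn;
        this.checkOut = checkOut;
    }
}

final class ReservationQueries {
    private ReservationQueries() {}

    // Query (1): how many reservations does this guest have in the given year?
    static long reservationsForGuest(List<Reservation> facts, String guest, int year) {
        return facts.stream()
                .filter(r -> r.guest.equals(guest) && r.checkIn.getYear() == year)
                .count();
    }

    // Query (2): how many rooms are booked at this hotel on the given night?
    static long roomsBookedOn(List<Reservation> facts, String hotel, LocalDate night) {
        return facts.stream()
                .filter(r -> r.hotel.equals(hotel)
                        && !night.isBefore(r.checkIn)
                        && night.isBefore(r.checkOut))
                .count();
    }
}
```

Neither method needs a customer object or a hotel object; both only filter the same set of reservation facts.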
TPT, TPH and TPC patterns are the way to go, as mentioned by Brad Wilson. But a couple of notes:
Child classes inheriting from a base class can be seen as weak entities relative to the base class definition in the database, meaning they are dependent on their base class and cannot exist without it. I've seen a number of times that unique IDs are stored for each and every child table while also keeping the FK to the parent table. One FK is enough, and it's even better to have on-delete cascade enabled for the FK relation between the child and base tables.
In TPT, by only seeing the base table records, you're not able to find which child class the record represents. This is sometimes needed, when you want to load a list of all records (without doing a select on each and every child table). One way to handle this is to have one column representing the type of the child class (similar to the rowType field in TPH), so mixing TPT and TPH somewhat.
Say we want to design a database that holds the following shape class diagram:
public class Shape {
int id;
Color color;
Thickness thickness;
//other fields
}
public class Rectangle : Shape {
Point topLeft;
Point bottomRight;
}
public class Circle : Shape {
Point center;
int radius;
}
The database design for the above classes can be like this:
table Shape
-----------
int id; (PK)
int color;
int thickness;
int rowType; (0 = Rectangle, 1 = Circle, 2 = ...)
table Rectangle
----------
int ShapeID; (FK on delete cascade)
int topLeftX;
int topLeftY;
int bottomRightX;
int bottomRightY;
table Circle
----------
int ShapeID; (FK on delete cascade)
int centerX;
int centerY;
int radius;
Short answer: you don't.
If you need to serialize your objects, use an ORM, or even better something like ActiveRecord or object prevalence.
If you need to store data, store it in a relational manner (being careful about what you are storing, and paying attention to what Jeffrey L Whitledge just said), not one affected by your object design.
There are two main types of inheritance you can set up in a DB: table per entity and table per hierarchy.
Table per entity is where you have a base entity table that has the shared properties of all child classes. You then have another table per child class, each with only the properties applicable to that class. They are linked 1:1 by their PKs.
Table per hierarchy is where all classes share a table, and optional properties are nullable. There is also a discriminator field, which is a number that denotes the type that the record currently holds.
SessionTypeID is the discriminator.
Table per hierarchy is faster to query, as you do not need joins (only the discriminator value), whereas with table per entity you need joins both to detect what type something is and to retrieve all its data.
Edit: The images I show here are screenshots of a project I am working on. The Asset image is not complete, hence its emptiness, but it was mainly to show how it's set up, not what to put inside your tables. That is up to you ;). The session table holds virtual collaboration session information, and can be of several types of sessions depending on what type of collaboration is involved.
You would normalize your database, and that would actually mirror your inheritance.
It might degrade performance, but that's how it is with normalizing. You will probably have to use good common sense to find the balance.
repeat of similar thread answer
in O-R mapping, inheritance maps to a parent table where the parent and child tables use the same identifier
for example
create table Object (
    Id int NOT NULL, -- primary key, auto-increment
    Name varchar(32)
)
create table SubObject (
    Id int NOT NULL, -- primary key and also foreign key to Object
    Description varchar(32)
)
SubObject has a foreign-key relationship to Object. when you create a SubObject row, you must first create an Object row and use the Id in both rows
EDIT: if you're looking to model behavior also, you would need a Type table that listed the inheritance relationships between tables, and specified the assembly and class name that implemented each table's behavior
seems like overkill, but that all depends on what you want to use it for!
Using SQLAlchemy (a Python ORM), you can do two types of inheritance.
The one I've had experience with is using a single table with a discriminator column. For instance, a Sheep database (no joke!) stored all Sheep in the one table, and Rams and Ewes were handled using a gender column in that table.
Thus, you can query for all Sheep, and get all Sheep. Or you can query by Ram only, and it will only get Rams. You can also do things like have a relation that can only be a Ram (ie, the Sire of a Sheep), and so on.
Note that some database engines, like Postgres, already provide inheritance mechanisms natively. Look at the documentation.
For an example, you would query the Person/Employee system described in a response above like this:
/* This shows the first name of all persons or employees */
SELECT firstname FROM Person ;
/* This shows the start date of all employees only */
SELECT startdate FROM Employee ;
If that is your database of choice, you don't need to be particularly clever!