ejb3: mapping many-to-many relationship jointable with a simple primary key

ejb3: mapping many-to-many relationship jointable with a simple primary key - java

often in books, i see that when a many-to-many relationship is translated to a DB schema, the JoinTable gets a compound key consisting of the primary keys of the tables involved in many-to-many relationship.
I try to avoid compound keys completely. So i usually create a surrogate key even for the JoinTable and allow the database to fill it up by trigger or whatever feature a database has for incrementing primary keys. It just seems like a much simpler approach.
the only issue i can think is that, there are chances of duplication of foreign key pair in the JoinTable. But this can be avoided by a simple query before a row is inserted in the JoinTable.
Since books always use the compound keys approach, i wanted to know if, there are any negative effects if i use simple one column surrogate keys for the JoinTable?

In my opinion, using a single primary key is a bad idea.
First, as you said, a single primary key won't ensure unicity in the database. Sure, you can check this at runtime using a simple query, but this is costly, and if you forget just one time to do your check query, you can put your database in an inconsistent state.
Also, I think that using an additional column for the primary key is, in this case, not necessary. You don't need to identify relationships by a unique primary key, since the relationship is already defined by two unique keys: the primary keys of your two tables. In this case, you'll have unnecessary data, that will add complexity to your data model, and that you'll probably never use.

I try to avoid compound keys completely.
Huh? Why? I mean, why totally avoiding them? What's the reason behind this?
So i usually create a surrogate key even for the JoinTable and allow the database to fill it up by trigger or whatever feature a database has for incrementing primary keys. It just seems like a much simpler approach.
No offense but we don't have the same definition of simplicity then. I really don't see where it is simpler.
The only issue i can think is that, there are chances of duplication of foreign key pair in the JoinTable. But this can be avoided by a simple query before a row is inserted in the JoinTable.
First of all, don't use a SELECT to check uniqueness (you can have a race condition, a SELECT without locking the whole table won't guarantee anything), use a UNIQUE constraint, that's what UNIQUE is for.
And lets imagine one second that a SELECT would have been possible, do you really find that simpler? In general, people try to avoid hitting the database if possible. They also avoid having to do extra work.
Since books always use the compound keys approach, i wanted to know if, there are any negative effects if i use simple one column surrogate keys for the JoinTable?
So you mean something like this:
A A_B B
------- ------------------ --------
ID (PK) ID (PK), ID (PK)
A_ID (FK),
B_ID (FK),
UNIQUE(A_ID, B_ID)
Sure, you could do that (and you could even map it with JPA if you use some kind of trigger or identity column for the ID). But I don't see the point:
The above design is just not the standard way to map a (m:n) relation, it's not what people are used to find.
A_B is not really an Entity by itself (which is what the model somehow suggests, see #1).
The couple (A_ID, B_ID) is a natural candidate for the key, why not using it (and wasting space)?
The above design is not simpler, it does introduce more complexity.
To sum up, I don't see any advantage.

Related

How to save additional attributes when using envers

I'm using Hibernate Envers for my revision history.
This is my table setup:
CREATE TABLE EPIC (
epicid SERIAL NOT NULL,
accountid BIGINT NOT NULL,
description TEXT NOT NULL UNIQUE,
epicowner TEXT NOT NULL,
PRIMARY KEY(epicid)
);
CREATE TABLE EPIC_AUD(
epicid BIGINT NOT NULL ,
REV BIGINT NOT NULL,
accountid BIGINT,
description TEXT,
epicowner TEXT,
REVTYPE BIGINT,
PRIMARY KEY(epicid,REV)
);
Currently when i make changes it only saves the composite primary key values and the revision type. Since i also want to log the user who deleted some entity, i want to save that value too. This is the code i'm using for deleting the entity.
#Override
public boolean deleteItem(Epic epicFromFrontend) {
transactionBegin();
Epic epicToRemove = getEntityManager().find(Epic.class, epicFromFrontend.getEpicid());
epicToRemove.setAccountid(epicFromFrontend.getAccountid());
getEntityManager().remove(epicToRemove);
return transactionEnd();
}
Actually i have 2 questions:
How to save the accountid too
Or is it maybe smarter and better to save ALL data. so i have no empty fields in my EPIC_AUD table after a delete.

It is a common practice to capture various additional pieces of information that is audit-specific during the insert, update, or delete of your domain entities.
A simple yet intrusive way is to store that state in the same structure as the entity, as suggested by Marcin H. While this approach may work, there are several problems with this approach.
Mixing Concerns
The problem here is that historical related information is now being stored right along side the domain specific data. Much like security, auditing is a cross cutting concern and thus should be treated in the same way when it comes to data structures. Additionally, as multiple audited rows in your schema are manipulated, you often represent the same user, timestamp, etc across multiple tables which lead to unnecessary schema and table bloat.
Unnecessary fields / operations for data removal
When you store fields of this calibur on the entity itself, it introduces an interesting set of requirements as a part of the entity removal process. If you want Envers to track that removal user then you either have to perform an entity update with that user prior to removal or introduce an additional column to track whether a row is soft deleted, as suggested by Marcin H.This approach means that a table will always grow indefinitely, even when delete has been removed. It could have negative impacts to long-term query performance and other various concerns. Ideally, if data is no longer relevant except from a historical purpose and no FK relationships continue to exist that must be maintained, its far better to remove the row from the non-audit table.
Rather than the above, I suggest using this strategy I posted here that describes how to leverage a custom RevisionEntity data structure with Envers, allowing you to track multiple columns of data that is pertinent to the current transaction operation.
This approach has the following added benefits:
No Envers (audit) specific code littered across your DAO methods. Your DAO methods continue to focus on the domain specific operation only, as it should be.
In situations where multiple entities are manipulated during a single transaction, you now only capture the various audit-attributes once per transaction (aka once per revision). This means if the user adds, removed, and updates various rows, they'll all be tagged once.
You now can easily track the person who performed the row deletion because the audit attributes are kept on the RevisionEntity, which will be generated for the deletion. No special operations or fields are needed to handle this case. Furthermore, you can enable storing the entity snapshot at deletion and then have access to (1) who deleted the row and (2) what the row looked like prior to the removal too.

You can add attribue record_active boolean to your table epic, as well as to table epic_aud of course.
When record_active is false it means record has been "deleted".
And never remove any record physically - it's good practice in fact :)

JPA Composite Key for one Table and a Primary Key for another Table - Possible?

Is it possible to have both a composite key and a primary key in the same Domain Model (Entity Class) so that some tables (queries) are joined using the composite key and other tables (queries) are joined using the primary key?
I'm dealing with legacy applications and I have limited access to changing the underlying database. Some of our queries are expecting a single row result but are getting many rows because of flaws in our database design. We can fix this problem by introducing a composite key to one of our Domain Models but doing so will affect many (many) other components that rely on the original primary key.
From my understand of JPA and the reading I've done so far on this matter I do not think this is possible but I thought it would be worth a shot to reach out to others who may have had a similar problem.

The table has only one primary key, so you have no options to choose which primary key to use. Also, i can't understand why you going to have differences between database original model and JPA. Actually, getting single row instead of many rows is where clause's task.
You said some of your queries fails after adding composite pk, so may be you just made your composite pk in wrong way?
Anyway, here is nice example or implementation composite pk, may be it will help you:
Mapping ManyToMany with composite Primary key and Annotation:

Maybe you should give a different look at your problem.
If your queries are returning multiple and different rows, then you should be able to resolve this using a more specific WHERE clause;
If your queries are returning multiple and equal rows, you should try the DISTINCT clause inside your query, example:
SELECT DISTINCT e FROM br.com.stackoverflow.Entity e

Guaranteed FIFO using JPA (Hibernate implementation) with MySQL

I need to persist a queue of tasks in MySQL. When reading them from DB I have to make sure the order is exactly the same as they have been persisted.
In general I prefer to have the solution DB agnostic (i.e. pure JPA) but adding some flavor of Hibernate and/or MySQL is acceptable as well.
My (probably naive) first version looks like:
em.createNamedQuery("MyQuery", MyTask.class).setFirstResult(0).setMaxResults(count).getResultList();
Where MyQuery doesn't have any "order by" clause i.e. it looks like:
SELECT t FROM MyTasks
Would such approach guarantee that the incoming results/entities are ordered in the way they have been persisted? What if I enable caching as well?
I was also thinking of adding an extra field to the task entity which is a timestamp in milliseconds (UTC from 1970-01-01) and then order by it in the query but then I might be in a situation where two tasks get generated immediately one after the other and they have the same timestamp.
Any solutions/ideas are welcome!
EDIT:
I just realised that auto increment (at least in MySQL) would throw an exception once it reaches its max value and no more inserts would be possible. This means I shouldn't worry about having the counter reset by the DB and I could explicitly order by an "auto increment" column in my query. Of course I would have another problem to deal with i.e. what to do in case the volume is so high that the largest possible unsigned integer type in MySQL is not big enough but this problem is not nesessarily coupled with the problem I am dealing right now.

Focusing in a pure JPA solution, cause the entity MyTasks must have a primary key I suggest you to use Sequence Generator for its primary key and sort the result of your query using order by clause on the key.
For example:
#Entity
class MyTask {
#Id #GeneratedValue(strategy=GenerationType.SEQUENCE)
private Long id;
You can also tight it a little bit with your database using #SequenceGenerator to specify a generator defined in the database.
Edit: Did you take a look at the #PrePersist option for setting the timestamp? Maybe you can combine the timestamp field and the id sequenced generation and order by in this order, so timestamp conflicts are resolved by id comparation (which are unique).

Most RDBMS's will store in the order of insertion and given no other instruction will order results that way too. If you don't want to leave it to chance, you have a couple of options.
1) You can generate a reasonably unique ID by using a timestamp and a incrementing fixed-length number,
OR
2) You can just define your table with an autonumbered primary key (which is probably easier).
If the table has a primary key to order by, then by default, most RDBMS's will return things in ascending primary key order... or you can enforce it explicitly in your query.

JPA (with or without cache) and RDBMS not guarantee of persisting or uploading sequence when you do not use order instruction. To solve task you should add integral primary key to the entity and use it when gather data as it mentioned other answereres.

Concatenating ids with extension to keep multiple rows related to an entity in a single Column Family

I am concatenating two Integer ids through bitwise operations(as described below) to create a single primary key of type long. I wanted to know if this is a good practice. This would help me in keeping multiple rows of an entity in a single column family by appending different extensions to the entityId.
Are there better ways ? My Ids are of type Integer(4 bytes).
public static final long makeCompositeKey(int entityId, int extension){
return (long)entityId << 32 | extension;
}

Most databases come with a built-in way to create IDs automatically, and your app doesn't need to care about it. I'm most familiar with Postgres, where I create a sequence for each table and use that for the #Id column, but I know that Oracle and MSSQL have their own way of accomplishing the same thing.
In general, however, each column in your database should store a single piece of information. Taking two pieces of info and concatenating them together as you're suggesting goes against "proper" database design according to "book learning." By which I mean: you should only do it if you have a very, very good reason for doing so (and even then, you should think two or three times about it before actually doing it.) If you don't have a good reason for doing it, then don't do it.

This is not a good design. It should be two different columns having composite constraint. You can have a another generated Unique ID for as a primary key.
Aside from the primary key, You can also create a composite index on two columns to speed up the queries if most queries will filter based on these two values.

Should I use composite primary keys or not?

There seems to only be 2nd class support for composite database keys in Java's JPA (via EmbeddedId or IdClass annotations). And when I read up on composite keys, regardless of language, people keep coming across as them being a bad thing. But I cannot understand why. Are composite keys still acceptable to use these days? If not, why not?
I've found one person who agrees with me:
http://weblogs.sqlteam.com/jeffs/archive/2007/08/23/composite_primary_keys.aspx
But another who doesn't:
http://weblogs.java.net/blog/bleonard/archive/2006/11/using_composite.html
Is it just me, or are people not able to make the distinction of where a composite key is appropriate or not? I see composite primary keys useful when the table doesn't represent an entity - i.e. when it represents a join table.
A simple example:
Actor { Id, Name, Email }
Movie { Id, Name, Year }
Character { Id, Name }
Role { Actor, Movie, Character }
Here Actor, Movie and Character obviously benefit from having an Id column as the primary key.
But Role is a Many-To-Many join table. I see no point in creating an id just to identify a row in the database. To me it seems obvious that the primary key is { Actor, Movie, Character }. It also seems like a rather limiting feature, especially if the data in the join table changes all the time, you could find yourself with primary key collisions once the primary key sequence wraps around to 0.
So, back to the original question, is it still acceptable practice to use composite primary keys? If not, why not?

In my personal opinion you should avoid composite primary keys due to several reasons:
Future changes: when you design a database you sometimes miss what in the future will become important. A significant example for this is thinking a combination of two or more fields is unique (and thus can become a primary key), whereas in the future you want to allow NULLs or other non-unique values in them. Having a single primary key is a good solid solution against such changes.
Uniformity: If every table has a unique numerical ID, and you also maintain some standard as to its name (e.g. "ID" or "tablename_id"), the code and SQL referring to it is clearer (in my opinion).
There are other reasons, but these are just a few.
The main question I would ask is why not use a separate primary key if you have a unique set of fields? What's the cost? An additional integer index? That's not too bad.
Hope that helps.

I think there's no problem using a composite key.
To me the database it's a component on its own, that should be treated the same way we treat code : for instance we want clean code, that communicates clearly its intent, that does one thing and does it well, that doesn't add any uneeded level of complexity, etc.
Same thing with the db, if the PK is composite, this is the reality, so the model should be kept clean and clear. A composite PK it's clearer than the mix auto-increment + constraint. When you see an ID column that does nothing you need to ask what's the real PK, are there any other hidden things that you should be aware of, etc. A clear PK doesn't leave any doubts.
The db is the base of your app, to me we need the most solid base that we can have. On this base we'll build the app ( web or not ). So I can't see why we should bend the db model to conform to some specific in one development tool/framework/language. The data is directing the application, not the other way around. What if the ORM changes in the future and becomes obsolete and a better solution appears that imposes another model ? We can't play with the db model to fit this or that framework, the model should stay the same, it should not depend on what tool we're using to access the data ...
If the db model change in the future, it should change because functionality changed. If we would know today how this functionality will change, we'll be modeling this already. ANd any future change will be dealt with when the time comes, we can't predict for instance the impact on existing data, so one extra column doesn't guarantee that it will withold any future change ...
We should design for today's functionality, and keep the db model the simplest possible, this way it will be easy to change/evolve in the future.

Religious wars have been, and still are, going on on this subject.
OO people have this zealous thing about "identity", and will tell you that the only thing that matters is the ability for you to "identify" "real-life objects" inside your programs, and that composite, "real-life" keys will only get you into trouble when trying to achieve that goal.
Data people have this thing about "uniqueness" that is perceived as "zealous" by the OO side, and will tell you that the only thing that matters is that if the business tells you that the combination of (values for) attribute X and attribute Y must be unique, then it is your job to see to it that the database enforces this business rule of uniqueness of the combined X+Y.
How you want your question answered is just a matter of which religion you prefer. My personal religion is the Data one. That religion has proven to be able to survive any hype and trend ever since 1969.

Similar questions have been asked on SO, and there is no consensus ;)
If you develop a web application, you will love single column pk's, as they make your URLs simpler.
For a sequence to wrap you'd need 2 billion records in a single table (32bit), or 10^18 with 64 bit pk's.
Btw, your data model does not allow for movie characters with unknown actors.

My general opinion is... no. don't use composite primary keys.
They will typically complicate ORMs if you use them (ORMs sometimes go so far as to call composite primary keys "legacy behaviour") and generally if you're using multiple keys, one or more of them will tend to be natural rather than technical keys, which for me is the bigger problem: IMHO you should certainly favour technical primary keys.
More on this in Database Development Mistakes Made by AppDevelopers.

It's a religious thing. I use natural keys and shun surrogates. I have no problem with composite keys either in theory or in practice.
Only the most trivial logical model would involve no composite keys. Call me lazy but I see no need to complicate the data model by introducing surrogates into the physical model on implementation. Sure, I'd consider one on a table if performance issues were found but I take the same approach as for denormalization i.e. as a last resort. Habitually using surrogates amounts to premature optimization, IMO.

In Ruby for Rails, when not explicitly specifying otherwise, your Role table would be kind of like you described (if the columns are actually the IDs from the other tables). Still, in the database you might want to ensure unique combinations by defining a unique index on those three columns, if only to help the database optimizing your queries. With that unique index in place and the framework not using any other primary key anyway, there is no need for a an additional numeric primary key in your Role table. Having said that, the unique index could could very be defined as a composite primary key instead.
As for future changes: defining a strict database for your first iteration will prevent unexpected data to be persisted, which will make migrations much easier.
So: I would use composite primary keys.

I would only ever use them in join tables. The only way to absolutely ensure that every record identifier is unique and consistent over time is to use a synthetic key.
Composite keys seem OK in theory, which is why they are tempting to use, but practice has shown that they usually indicate that there is a flaw in your data model. Worse still, in many cases they will fail to guarantee uniqueness, given a large enough data set. And data sets always grow over time, so using them may mean that you have planted a bomb in your application which will only explode when the application has been in production use for a while.
I think that people are underplaying ORMs. Every mainstream programming language has a defacto ORM, and has had for years, because they solve the fundamental incompatibility between OO and relational structures. Trying to write any complex, testable OO software against SQL databases without an ORM is very inefficient, at best.
Good ORMs also provide practices and tooling that make it much easier to create and maintain consistent high-quality database schema, so on average, a team will come out well ahead by working with an ORM. Handcrafting schema is rather like writing C++ ...people can do it, but in the real world it is so hard to maintain quality over time that the average product is not good.

I have almost never seen a case where a composite key was a good idea (exception, joining table consisting of only two surrogate keys). In the first palce you are wasting space in the child tables. You are harming performance in the joins as integer joins are generally much faster. If you have the composite key as a clustered index (talking SQL Server here), then you are causing the database to be less efficient about storing records and less efficient in building other indexes - all of which use the clusterd index.
When the data in the key changes (As it almost inevitably will) then you need to update all related tables as well casuing massive unecessary updates and wasting processing power on a task that is completely uneeded when the database is designed to use surrogaste keys. Primary keys need not only to be unique but to be unchanging. Composite keys often fail the second test.
So you are thinking of using a technique that harms performance, causes poor use of memory and database storage, uses way more space in child records (another waste of resources) and requires painful updating of what may be millions of child records when things change. And which might make it hard to use an ORM? Why would you do that? Because you are too lazy to put a surrogate key on and then define a unique index on the potential composite key? Is there any gain at all to using a composite index? For the lack of 5 minutes of work you are permanently harming your database?

In terms of the domain model, I see nothing wrong with creating a composite primary key when the table doesn't represent an entity - i.e. when it represents a join table (as you mention in your question), other than if it is not montonically increasing, then you will get a certain amount of page splits during insertions.
Some ORM's don't cope well with composite primary keys, so perhaps it is safer to create a surrogate auto-integer for the primary key, and cover the columns with a non-clustered index.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.