Hibernate suitable inheritance strategy

This is the parent class
@MappedSuperclass
@Inheritance(strategy = InheritanceType.SINGLE_TABLE)
public class APostCommon extends Actionable {
    private static final long serialVersionUID = 1L;

    @Column(name = "TITLE")
    private String title;

    @Lob
    @Column(name = "DESCRIPTION")
    private String description;

    // other members omitted for brevity
}
This is a child class
@Entity
@Table(name = "QUESTION")
public class Question extends APostCommon {
    private static final long serialVersionUID = 1L;

    @Transient
    private List<Answer> answers;

    @Column(name = "FOLLOW_COUNT")
    private Integer followerCount;

    @Column(name = "ANSWER_COUNT")
    private Integer answerCount;

    // other members omitted for brevity
}
Answer only differs from the superclass by including the questionId field
@Entity
@Table(name = "ANSWER")
public class Answer extends APostCommon {
    private String questionId;

    // other members omitted for brevity
}
Which data model is suitable for me? Should I use InheritanceType.SINGLE_TABLE or TABLE_PER_CLASS? What happens when there are millions of records in the future?

While preparing for the JPA certification, I took these notes on both strategies:
Single Table Strategy
This approach may waste more database tablespace, but it offers the best performance for both polymorphic queries and write operations.
The SQL needed for these operations is simple and optimized, and it requires no joins.
Pros in your case: your design may not be nicely normalized, but you will spare yourself a lot of headaches regarding performance, and your queries will be much simpler, especially if you expect millions of records for Questions and Answers.
Cons in your case: I am guessing that Question and Answer have a lot in common but, as time goes on and the table grows, they will also gain a lot of distinct properties/columns. You might end up with one of those bloated tables that holds every possible piece of data and becomes unmaintainable at some point (more than once I needed the approval of an entire department and days of testing just to add a single index on one column of a table like that).
Joined Strategy
Mapping a table per entity provides the data reuse of a normalized schema and is the most efficient way to store data that is shared by multiple subclasses in the hierarchy.
Pros in your case: over time, the growing number of distinct columns in Question and Answer is not a problem, because your schema is normalized and each column has a logical place. The design is clean, clear, maintainable and scalable (you might even add more abstraction later for specialized Questions and Answers).
Cons in your case: with millions of rows for questions and answers, you will need at least one additional join for basic queries, and if you keep the schema normalized as the columns/features grow, the minimum number of joins will be higher than that.
Conclusion:
Initially I was leaning towards Single Table, but after giving it some thought I realized that it would be a 'lazy' decision (put everything in one bag and forget about design). Joined tables seem the more mature decision, as you need to think about your indexing strategy, how many tables to normalize into (and which groups of data belong together), and finally how to write efficient queries and batch statements.
The decision is up to you, but if you go for the joined strategy you will definitely increase your skill set, as it is more demanding.

Related

Should a large relationship be modeled in an entity in terms of readability and performance

I have an entity that has a relation to a very large table (> 100k rows). Now I am asking myself whether I should really express that relation in my Channel entity, because I would never call getProducts directly to load all products into memory. I would rather go through a product repository and query for a very specific subset for that channel.
I have only put it there so it's more readable from a client perspective.
@Entity
@Table(name = "Channel")
public class Channel {
    @OneToMany(fetch = FetchType.LAZY)
    @JoinColumn(name = "product_id")
    private List<Product> products;

    public void setProducts(List<Product> products) {
        this.products = products;
    }

    public List<Product> getProducts() {
        return this.products;
    }
}

Don't put that in an entity definition at all. What is often missed is that the name @OneToMany is very misleading: there is no way to limit the number of items you are loading, so it can practically be used only when Many means Few :). The Hibernate Guru explains it better than I can.
Of course the join is executed lazily, meaning that as long as the collection is not used it does no harm. But why tempt fate? Writing the query in a repository gives you pagination, and you can narrow the results to the ones you truly need.
The other side of the relationship is more useful. So if you have a @ManyToOne annotation on the child entity, leave it there. It is handy when you update the parent from the context of a child.
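To make the repository suggestion concrete, here is a minimal sketch of a paged lookup. It is plain Java over an in-memory list so that it runs on its own; ProductRow, InMemoryProductRepository and findByChannel are hypothetical names, not part of the question's code. With Hibernate the same idea would be a query restricted by channel and bounded with setFirstResult and setMaxResults.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the Product entity; only the fields the sketch needs. */
final class ProductRow {
    final long id;
    final long channelId;
    final String name;

    ProductRow(long id, long channelId, String name) {
        this.id = id;
        this.channelId = channelId;
        this.name = name;
    }
}

/** Hypothetical repository: callers always state which page of which channel they want. */
final class InMemoryProductRepository {
    private final List<ProductRow> table = new ArrayList<>();

    void save(ProductRow row) {
        table.add(row);
    }

    /**
     * Returns at most pageSize products of one channel, skipping the first
     * page * pageSize matches. A real repository would push the filter and
     * the limits into the database query instead of scanning a list.
     */
    List<ProductRow> findByChannel(long channelId, int page, int pageSize) {
        if (page < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("page must be >= 0 and pageSize > 0");
        }
        long toSkip = (long) page * pageSize;
        List<ProductRow> result = new ArrayList<>();
        for (ProductRow row : table) {
            if (row.channelId != channelId) {
                continue; // belongs to another channel
            }
            if (toSkip > 0) {
                toSkip--; // still inside an earlier page
                continue;
            }
            result.add(row);
            if (result.size() == pageSize) {
                break; // page is full, stop reading
            }
        }
        return result;
    }
}
```

A call such as repo.findByChannel(7L, 1, 5) returns the second page of five products for channel 7, so the caller can never pull the whole table into memory by accident.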

In short : no.
I think what you are asking may be opinion-based, but here is what I tend to do: remove all "non-navigable" relationships.
If you use a generator to derive JPA entities from a database model, the generator will create only bi-directional relationships. You need to remove all the relationships you will not use through the ORM mechanisms, for readability, maintainability and performance (in case someone gets the idea to remove the "lazy" setting or to call a getter on the collection inside a transaction).
For example, a User will be linked to a Language (many-to-one), but you don't want the (one-to-many) relationship between Language and User, even if at some point a report screen needs to know how many users speak which language.

Hibernate associations using too much memory

I have a table "class" which is linked to the tables "student" and "teacher".
A "class" is linked to multiple students and teachers via a foreign key relationship.
When I use Hibernate associations and fetch a large number of entities (I tried 5000), I see that it takes 4 times more memory than if I just use foreign key placeholders.
Is there something wrong with Hibernate associations?
Can I use a memory profiler to figure out what is using so much memory?
This is how the schema is:
class(id,className)
student(id,studentName,class_id)
teacher(id,teacherName,class_id)
class_id is a foreign key.
Case #1 - Hibernate Associations
1) In the Class entity, students and teachers are mapped as:
@Entity
@Table(name = "class")
public class Class {
    private Integer id;
    private String className;
    private Set<Student> students = new HashSet<Student>();
    private Set<Teacher> teachers = new HashSet<Teacher>();

    @OneToMany(fetch = FetchType.EAGER, mappedBy = "classRef")
    @Cascade({ CascadeType.ALL })
    @Fetch(FetchMode.SELECT)
    @BatchSize(size = 500)
    public Set<Student> getStudents() {
        return students;
    }
    // remaining members omitted in the original post
}
2) In Student and Teacher, the class is mapped as:
@Entity
@Table(name = "student")
public class Student {
    private Integer id;
    private String studentName;
    private Class classRef;

    @ManyToOne
    @JoinColumn(name = "class_id")
    public Class getClassRef() {
        return classRef;
    }
    // remaining members omitted in the original post
}
Query used :
sessionFactory.openSession().createQuery("from Class where id<5000");
This, however, took a huge amount of memory.
Case #2 - Remove associations and fetch separately
1) No mapping in the Class entity
@Entity
@Table(name = "class")
public class Class {
    private Integer id;
    private String className;
    // remaining members omitted in the original post
}
2) Only a placeholder for the foreign key in Student and Teacher
@Entity
@Table(name = "student")
public class Student {
    private Integer id;
    private String studentName;
    private Integer class_id;
    // remaining members omitted in the original post
}
Queries used :
sessionFactory.openSession().createQuery("from Class where id<5000");
sessionFactory.openSession().createQuery("from Student where class_id = :classId");
sessionFactory.openSession().createQuery("from Teacher where class_id = :classId");
Note: only the important parts of the code are shown. I am measuring the memory usage of the fetched entities with the JAMM library.
I also tried marking the query as read-only in case #1, as below. That improves memory usage only very slightly, so it is not the solution.
// Keep a reference to the session that was opened so the same one is closed.
Session session = sessionFactory.openSession();
Query query = session.createQuery("from Class where id<5000");
query.setReadOnly(true);
List<Class> classList = query.list();
session.close();
Below are the heap dump snapshots, sorted by size. It looks like the entity state maintained by Hibernate is causing the problem.
Snapshot of Heapdump for hibernate associations program
Snapshot of heapdump for fetching using separate entities

You are doing an EAGER fetch with the annotation below. This fetches all the students even if you never call getStudents(). Make it lazy and they will be fetched only when needed.
From
@OneToMany(fetch = FetchType.EAGER, mappedBy = "classRef")
To
@OneToMany(fetch = FetchType.LAZY, mappedBy = "classRef")

When Hibernate loads a Class entity containing OneToMany relationships, it replaces the collections with its own custom version of them. In the case of a Set, it uses a PersistentSet. As can be seen on grepcode, this PersistentSet object contains quite a bit of stuff, much of it inherited from AbstractPersistentCollection, to help Hibernate manage and track things, particularly dirty checking.
Among other things, the PersistentSet contains a reference to the session, a boolean to track whether it's initialized, a list of queued operations, a reference to the Class object that owns it, a string describing its role (not sure what exactly that's for, just going by the variable name here), the string uuid of the session factory, and more. The biggest memory hog among the lot is probably the snapshot of the unmodified state of the set, which I would expect to approximately double memory consumption by itself.
There's nothing wrong here, Hibernate is just doing more than you realized, and in more complex ways. It shouldn't be a problem unless you are severely short on memory.
Note, incidentally, that when you save a new Class object that Hibernate previously was unaware of, Hibernate will replace the simple HashSet objects you created with new PersistentSet objects, storing the original HashSet wrapped inside the PersistentSet in its set field. All Set operations will be forwarded to the wrapped HashSet, while also triggering PersistentSet dirty tracking and queuing logic, etc. With that in mind, you should not keep and use any external references to the Set from before saving, and should instead fetch a new reference to Hibernate's PersistentSet instance and use that if you need to make any changes (to the set, not to the students or teachers within it) after the initial save.

Regarding the huge memory consumption you are noticing, one potential reason is that the Hibernate Session has to maintain the state of each entity it has loaded in the form of an EntityEntry object, i.e. one extra object for each loaded entity. Hibernate's automatic dirty checking needs this during the flush stage to compare the current state of the entity with its original state (the one stored as EntityEntry).
Note that this EntityEntry is different from the object our application code gets when it calls session.load/get/createQuery/createCriteria. It is internal to Hibernate and stored in the first-level cache.
Quoting from the Javadoc for EntityEntry:
We need an entry to tell us all about the current state of an object
with respect to its persistent state. Implementation Warning: Hibernate
needs to instantiate a high amount of instances of this class,
therefore we need to take care of its impact on memory consumption.
One option, assuming the intent is only to read and iterate through the data without changing the entities, is to use StatelessSession instead of Session.
The advantage, as quoted from the Javadoc for StatelessSession:
A stateless session does not implement a first-level cache nor
interact with any second-level cache, nor does it implement
transactional write-behind or automatic dirty checking
With no automatic dirty checking, Hibernate has no need to create an EntityEntry for each loaded entity as it did with Session. This should reduce the pressure on memory.
That said, it has its own set of limitations, as mentioned in the StatelessSession Javadoc.
One limitation worth highlighting is that it does not lazily load collections. If we use StatelessSession and want to load the associated collections, we should either join fetch them using HQL or EAGER fetch them using Criteria.
Another is that it does not interact with the second-level cache, if one is configured.
So, given that it has none of the overhead of the first-level cache, you may want to try StatelessSession and see whether it fits your requirements and reduces memory consumption.

Yes, you can use a memory profiler, like VisualVM or YourKit, to see what takes so much memory. One way is to take a heap dump and then load it in one of these tools.
However, you also need to make sure that you compare apples to apples. Your queries in case #2,
sessionFactory.openSession().createQuery("from Student where class_id = :classId");
sessionFactory.openSession().createQuery("from Teacher where class_id = :classId");
select students and teachers for only one class, while in case #1 you select far more. You need to use <= :classId instead.
In addition, it is a little strange that each student and teacher record belongs to exactly one class. A teacher can teach more than one class, and a student can be in more than one class. I do not know exactly what problem you are solving, but if a student can indeed participate in many classes and a teacher can teach more than one, you will probably need to design your tables differently.

Try @Fetch(FetchMode.JOIN). This generates only one query instead of multiple select queries. Also review the generated queries. I prefer using Criteria over HQL (just a thought).
For profiling, use free tools like VisualVM or JConsole. YourKit is good for advanced profiling, but it is not free. I believe there is a trial version of it.
You can take a heap dump of your application and analyze it with any memory analyzer tool to check for memory leaks.
BTW, I am not exactly sure about the memory usage in the current scenario.

It's likely that the reason is the bi-directional link from Student to Class and from Class to Students. When you fetch Class A (id 4900), the Class object must be hydrated, which in turn pulls all the Student objects (and presumably teachers) associated with this class. Each Student object must then be hydrated, which causes a fetch of every class that Student is a part of. So although you only wanted Class A, you end up with:
Fetch Class A (id 4900)
Returns Class A with reference to 3 students, Student A, B, C.
Student A has ref to Class A, B (id 5500)
Class B needs hydrating
Class B has reference to Students C, D
Student C needs hydrating
Student C only has reference to Class A and B
Student C hydration complete.
Student D needs hydrating
Student D only has reference to Class B
Student D hydration complete
Class B hydration complete
Student B needs hydrating (from the original load of Class A)
etc... With eager fetching, this continues until all links are hydrated. The point is that you may end up with Classes in memory that you didn't actually want, or whose id is not less than 5000.
This could get worse fast.
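The walk-through above can be reproduced with a small plain-Java sketch. It assumes, as this answer does, that a student may belong to several classes and that every link is eager; EagerReach and loadedBy are hypothetical names, and the map of names stands in for the foreign keys Hibernate would follow.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class EagerReach {
    /**
     * Returns everything that gets loaded when 'start' is fetched and every
     * reference is followed eagerly. 'references' maps an object's name to
     * the names of the objects it points at.
     */
    static Set<String> loadedBy(String start, Map<String, List<String>> references) {
        Set<String> hydrated = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        pending.add(start);
        while (!pending.isEmpty()) {
            String current = pending.poll();
            if (!hydrated.add(current)) {
                continue; // already loaded, nothing more to follow
            }
            // Every reference of a freshly loaded object must be loaded too.
            pending.addAll(references.getOrDefault(current, Collections.<String>emptyList()));
        }
        return hydrated;
    }
}
```

Feeding it the links from the list (Class A to Students A, B, C; Student A to Classes A and B; Class B to Students C and D; and so on) and starting from "Class A" returns Class B and Student D as well, even though only Class A was requested.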
Also, you should make sure you are overriding the hashCode and equals methods. Otherwise you may be getting redundant objects, both in memory and in your set.
One way to improve this is either to change to LAZY loading, as others have mentioned, or to break the bidirectional links. If you know you will only ever access students per class, then don't keep the link from student back to class. For the student/class example a bidirectional link makes sense, but maybe it can be avoided.
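As a minimal sketch of the equals/hashCode advice, here is an id-based implementation in plain Java. StudentById is a hypothetical stand-in for the Student entity, and the sketch assumes the id is assigned before the object is put into a set; whether to base equality on the database id or on a business key is a design decision this sketch does not settle.

```java
import java.util.Objects;

/** Hypothetical stand-in for the Student entity, reduced to what equality needs. */
class StudentById {
    private final Integer id;
    private final String studentName;

    StudentById(Integer id, String studentName) {
        this.id = id;
        this.studentName = studentName;
    }

    Integer getId() {
        return id;
    }

    String getStudentName() {
        return studentName;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof StudentById)) {
            return false;
        }
        StudentById that = (StudentById) other;
        // Two distinct instances without an id are never considered equal.
        return id != null && id.equals(that.id);
    }

    @Override
    public int hashCode() {
        // Consistent with equals: equal ids always give equal hash codes.
        return Objects.hashCode(id);
    }
}
```

With this in place, adding two StudentById instances that carry the same id to a HashSet leaves a single element in the set.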

As you say, you want "all" the collections, so lazy loading won't help.
Do you need every field of every entity? If not, use a projection to get just the bits you want. See when to use Hibernate Projections.
Alternatively, consider having minimalist Teacher-Lite and Student-Lite entities that the full-fat versions extend.

Performing join given complex entity relationships in hibernate

I'm having difficulty establishing an association between two entities in Hibernate. The relationship is complicated.
We have a table (let's call it Translations) that contains the translations used within our program; its primary key is the base text plus the language code. This means there are several rows with the same base text but different language codes.
Translations:
BaseText
LanguageCode
I have a second table (let's call it Usage) which indicates, for each distinct BaseText in the Translations table, where it is used in the program. The same text can be used in multiple places within the program, so there are potentially several rows with the same text but different programs.
Usage:
Text
Program
The one field that ties them both together is Text in Usage and BaseText in Translations. If I wanted to write a query that selects translations pertaining to a specific Program (as indicated in Usage table), I would probably write something like:
select * from Translations inner join Usage on Translations.BaseText = Usage.Text where Usage.Program = ?
However, I'm using Hibernate and JPA to do this. I see many examples of creating a parent-child association between two entities, but this relationship is clearly a bit dysfunctional. It would probably be better represented with a middle table and a many-to-many relationship. To make things more complicated, though, this is a legacy program using a legacy database, so I can't really add any tables.
The closest question I could find related to my problem is here, but it deals with a much cleaner parent-child relationship. If I use only the one JoinColumn, Hibernate complains that the whole primary key is not satisfied.
Using annotations, how should I structure my entities so that I can optionally query Translation entities by Usage.program (if at least one row exists in Usage with a given program, I would like to retrieve its corresponding Translation rows)?
This is what I have currently:
@Entity
public class Translation {
    @Id
    private String baseText;
    @Id
    private String languageCode;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "baseText", referencedColumnName = "text", nullable = false, insertable = false, updatable = false)
    private Usage usage;
    // .. getters and setters
}
@Entity
public class Usage {
    @Id
    private String text;
    @Id
    private String program;
    // .. getters and setters
    // OneToMany counterpart added, but I removed it because it seemed to
    // make Usage the "owner" entity of the relationship, which isn't my case
}
I appreciate any help you can provide.

Object formation with relationships(JDBC)

There are 3 entities (each matching a table):
public class Enterprise {
    private long id;
    private String name;
    private List<Department> departments;
    // getters()/setters()
}
public class Department {
    private long id;
    private String name;
    private List<Employee> employees;
    // getters()/setters()
}
public class Employee {
    private long id;
    private String name;
    private List<Department> departments;
    // getters()/setters()
}
ENTERPRISE---|OneToMany|---DEPARTMENT---|ManyToMany|---EMPLOYEE
Can someone write this method using JDBC:
List<Enterprise> findAll();
The connection, statements, queries, etc. can be ignored. The main difficulty is setting all the references to the correct objects (for example, to avoid
enterprise.getDepartments().get(1).getEmployees().get(1).getDepartments() == null).
EXAMPLE (The beginning of method):
List<Enterprise> findAll(){
ResultSet rs = executeQuery(SELECT_ALL_ENTERPRISES);
List<Enterprise> ents = createEnterprises(rs);
.........

Mapping objects to relations is not as easy as it would seem. They have been working on it for decades now, with decent results only in some scenarios. The good news is that the scenarios that work can accommodate most programs.
I suggest that you take a different approach, but first I'll give you an example that will help you understand why I suggest the different approach.
Imagine a person who wants to look up all Departments, which will require a look up of all Employees (as they are part of a Department object). Which will require that for each employee, a list of departments would need to be looked up, which would require that those departments would need a list of employees, which would ....
Perhaps now you get the idea.
So many systems that are structured like yours don't actually return full Employees when looking up departments. They return "Employee identifiers". This allows one to look up all the Departments, but it guarantees that no Employees are going to be returned, preventing an infinite loop. Then, if a person is interested enough, they can use the employee identifiers to look up individual employees, which would of course contain department identifiers.
In short, I recommend that you don't really rebuild the association at this level. I suggest that you build disconnected graphs of the object mesh, such that one can easily navigate the disconnected graph at a later time. Then, if you really must connect them, you will at least have all the data loaded without recursion before you start knitting together references.
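The "load everything first, then knit the references together" idea from the last paragraph can be sketched as follows. This is only an illustration under assumptions: the builder and its method names are hypothetical, each method call stands for one row read from a ResultSet, a join table between departments and employees is assumed for the many-to-many link, and fields are package-private instead of using the getters and setters from the question.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Enterprise {
    final long id;
    final String name;
    final List<Department> departments = new ArrayList<>();
    Enterprise(long id, String name) { this.id = id; this.name = name; }
}

class Department {
    final long id;
    final String name;
    final List<Employee> employees = new ArrayList<>();
    Department(long id, String name) { this.id = id; this.name = name; }
}

class Employee {
    final long id;
    final String name;
    final List<Department> departments = new ArrayList<>();
    Employee(long id, String name) { this.id = id; this.name = name; }
}

/** Hypothetical helper: exactly one object per primary key, wired without recursion. */
final class EnterpriseGraphBuilder {
    private final Map<Long, Enterprise> enterprises = new LinkedHashMap<>();
    private final Map<Long, Department> departments = new LinkedHashMap<>();
    private final Map<Long, Employee> employees = new LinkedHashMap<>();

    /** One call per enterprise row; must run before the department rows. */
    void enterpriseRow(long id, String name) {
        enterprises.put(id, new Enterprise(id, name));
    }

    /** One call per department row; enterpriseId is its foreign key. */
    void departmentRow(long id, String name, long enterpriseId) {
        Department department = new Department(id, name);
        departments.put(id, department);
        required(enterprises, enterpriseId).departments.add(department);
    }

    /** One call per employee row. */
    void employeeRow(long id, String name) {
        employees.put(id, new Employee(id, name));
    }

    /** One call per join-table row; wires both directions to the shared instances. */
    void membershipRow(long departmentId, long employeeId) {
        Department department = required(departments, departmentId);
        Employee employee = required(employees, employeeId);
        department.employees.add(employee);
        employee.departments.add(department);
    }

    List<Enterprise> build() {
        return new ArrayList<>(enterprises.values());
    }

    private static <T> T required(Map<Long, T> byId, long id) {
        T value = byId.get(id);
        if (value == null) {
            throw new IllegalStateException("No row loaded for id " + id);
        }
        return value;
    }
}
```

Because every object is created once and looked up by id afterwards, the navigation path from the question ends at the same Department instance it started from instead of at null, and no lookup ever recurses.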

Many ORM libraries enable you to define one to many relationships as you described. Sormula can do this. See one to many example.
What I like about Sormula is that if you name the foreign key field on the "many side" the same as the field "one side", then Sormula will deduce the relationship and no annotations are necessary.

What's the best way to keep bad words into database entity?

I have a web application that is embedded in external sites. The application is a comment widget like Disqus (disqus.com).
In the application, each widget is an entity object with many attributes.
For instance:
@Entity
class Widget {
    private User admin; // same as owner, administrator, creator
    private String site; // example.com
    private String providers; // google_plus,facebook,twitter,lastfm and others
    // ... and many other attributes
}
I am now developing a list of bad words. The list will be managed by the widget's administrator (owner).
I see two ways to implement this:
First way:
@Entity
class Widget {
    ...
    @ElementCollection(fetch = FetchType.LAZY)
    @Column(name = "bad_words", length = 20)
    private Set<String> badWords = new HashSet<String>();
    ...
}
that is, keep the words as a relation in a second table.
Second way:
@Entity
class Widget {
    ...
    @Column(name = "bad_words", length = 2000)
    private String badWords;
    ...
}
that is, keep the words as a single string, split by some separator, for instance: "fukc,azz,shitt,..."
So the question is: what is the best way to keep bad words in each widget entity? Case 1, case 2, or something else?

Definitely Option 1. Option 2 violates First Normal Form.
Think about having all the words concatenated, as discussed in the Second Way. Querying the table becomes difficult, and you end up with a lot of awful code for updating, deleting and adding words. You don't want any of this.
Having all the words in a collection (First Way) will permit querying them easily and will make all the CRUD operations simpler.
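To illustrate the difference in day-to-day code, here is a small plain-Java sketch of the bookkeeping the delimited-string variant needs. BadWordsDemo and its methods are hypothetical helpers, the comma separator is taken from the question's example, and neutral placeholder words are used.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

final class BadWordsDemo {
    /** Second way: removing one word means split, edit, and re-join the column value. */
    static String removeFromDelimited(String column, String word) {
        if (column.isEmpty()) {
            return column;
        }
        Set<String> words = new LinkedHashSet<>(Arrays.asList(column.split(",")));
        words.remove(word);
        return String.join(",", words);
    }

    /** Second way: a correct membership test must also split first. */
    static boolean delimitedContains(String column, String word) {
        return Arrays.asList(column.split(",")).contains(word);
    }

    /** First way: the collection is already a set, so each operation is one call. */
    static boolean removeFromSet(Set<String> words, String word) {
        return words.remove(word);
    }
}
```

Note the pitfall the split avoids: "foobar,baz".contains("foo") is true even though "foo" is not in the list, whereas a Set<String> answers contains("foo") correctly without any parsing.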

IMO you should keep a separate list of bad words, rather than cramming them all in to a single column.
It's cleaner. And there are a lot of bad words.
If each widget really needs its own list of bad words, I'd leave it as-is. If the list of bad words is always the same, however, I would not associate them directly with the widget, but maintain the list as a separate entity.
