Preloading JPA/Hibernate entities to an in-memory cache - java

At work we are using Java EE (WildFly) with an ever-increasing workload. The persistence layer uses EJBs with JPA and Hibernate. One table (the main data table) gets something like 99% of the traffic and database size, while there are a bunch of others that are used to describe the data.
It works, but sometimes it slows down, due to many description entities that have to be loaded while saving the data table entities. We can't seem to get the 2nd level Hibernate cache going, so we are currently looking into in-memory caching.
The basic idea is just a simple HashMap for each of the description entities I mentioned above. We are talking about 10 tables with 50k records total, so it wouldn't impact the database badly.
Load all of them at startup, put them in the HashMap, and link them with other cached entities (some description entities have relations between themselves). When one of the entities is updated, replace it in the cache with the updated version. While they reside in the cache, they are evicted from the persistence context and should behave like normal POJOs.
We've also looked into some real caching solutions like JCache, Caffeine, etc, but aren't sure if we really need the features they offer.
Does any of this make sense? Or is it a stupid approach to the problem?
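For illustration, here is a minimal sketch of the preloading idea described in the question, assuming a hypothetical Description entity with a Long id; the class and method names are made up:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.annotation.PostConstruct;
import javax.ejb.Lock;
import javax.ejb.LockType;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Singleton
@Startup
public class DescriptionCache {

    @PersistenceContext
    private EntityManager em;

    private final Map<Long, Description> byId = new ConcurrentHashMap<>();

    @PostConstruct
    void preload() {
        // Load all description rows once at startup and detach them,
        // so the cached copies behave like plain POJOs.
        for (Description d : em.createQuery("select d from Description d", Description.class)
                               .getResultList()) {
            em.detach(d);
            byId.put(d.getId(), d);
        }
    }

    @Lock(LockType.READ)
    public Description get(Long id) {
        return byId.get(id);
    }

    @Lock(LockType.WRITE)
    public void replace(Description updated) {
        // Called whenever a description entity is updated, so readers see the new version.
        byId.put(updated.getId(), updated);
    }
}
```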

Hibernate's second-level caching will take care of all of that. It will transparently put objects into the cache and evict them if some transaction (of that application) changes the data. Second-level caching refers to the caching of entity objects. There is also a query cache, which makes use of the second-level cache. Note that the query cache needs to be enabled on a per-query basis, though, with org.hibernate.query.Query#setCacheable.
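As a rough illustration of that last point (assuming a Hibernate Session named session, a made-up Description entity, and hibernate.cache.use_query_cache enabled):

```java
// Enable the query cache for this one query; the result list is stored
// in the query cache and the entities go into the second-level cache.
List<Description> descriptions = session
        .createQuery("select d from Description d", Description.class)
        .setCacheable(true)
        .getResultList();
```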

Related

How to check when the cache is empty and I should reload it

I use technologies like Spring Boot, JPA and Java 8. I have a question: how can I check if the cache is empty so that I should send a query to the database to reload it (how do I check that I need to reload the cache)?
Your question is not clear about what type of cache you are using.
In JPA, the first level of caching is the persistence context.
The Entity Manager guarantees that within a single persistence context there will be only one object instance for any particular database row. However, the same entity could be managed in another user's transaction, so you should use either optimistic or pessimistic locking.
If you mean the 2nd-level cache: this level of cache exists for performance reasons. The 2nd-level cache sits between the Entity Manager and the database. Persistence contexts share this cache, making the second-level cache available throughout the application. Database traffic is reduced considerably because entities are loaded into the shared cache and served from there. So, practically speaking, you don't need to worry about reloading data from the database if a cache miss happens.
Now, if you are implementing your own caching logic, you need to do more research on how caching actually works and on the different caching algorithms such as LRU, MRU, etc. (which I would personally not recommend, since you can use existing providers like Ehcache, Redis or Hazelcast, to name just a few, for 2nd-level caching).
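If you do stick with the second-level cache, marking an entity cacheable might look roughly like this (the Country entity is made up, and a cache provider such as Ehcache still has to be configured for the persistence unit):

```java
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable                                          // JPA-level marker
@Cache(usage = CacheConcurrencyStrategy.READ_WRITE) // Hibernate-level concurrency strategy
public class Country {

    @Id
    private Long id;

    private String name;

    // getters and setters omitted for brevity
}
```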

Needs clarity on hibernate second level cache

I need some clarification on the Hibernate second-level cache.
How does the Hibernate second-level cache work?
Does it load all the data from the tables for which there is a @Cacheable annotation (the Hibernate annotation) in the entity classes on server startup in a Java EE environment?
Will the cache get synced up when there is an update on those tables, and how?
Lastly, is there any way for my DAO code to get notified when there is an update on some table I am interested in? (Looking for any listener which can notify about updates to the tables.)
How does the Hibernate second-level cache work?
When your entity is marked as cacheable and you have configured the second-level cache, Hibernate will cache the entity in the second-level cache after the first read.
Hibernate provides the flexibility to plug in any cache implementation that follows Hibernate's specification. Refer to the Hibernate manual for more details on the second-level cache and its configuration options.
Does it load all the data from the tables for which there is a @Cacheable annotation (the Hibernate annotation) in the entity classes on server startup in a Java EE environment?
I don't think there is any configuration for achieving this. Indirectly you can achieve it by reading the entire table at startup, but this can adversely affect system startup time (I don't recommend it). If the entity is modified externally, Hibernate can't sync it and you will end up getting stale data.
Will the cache get synced up when there is an update on those tables, and how?
The cache won't get updated instantly after the table update. The subsequent call to fetch the updated record will go to the database; Hibernate achieves this internally by using session timestamps.
Lastly, is there any way for my DAO code to get notified when there is an update on some table I am interested in? (Looking for any listener which can notify about updates to the tables.)
No, Hibernate doesn't support this.
That's too broad a question to be answered here.
No. It populates the cache lazily. Each time you get a cacheable entity from the database, using the Hibernate API or a query, this entity is stored in the cache. Later, when session.get() is called with an ID of an entity that is in the cache, no database query is necessary (see the sketch after these answers).
If the update is made through Hibernate, then the cache is updated. If it's done using an external application, or a SQL query, or even a bulk update HQL query, then the cache is unaware of the update. That's why you need to be careful about which entities you make cacheable, which time-to-live you choose, etc. Sometimes returning stale values is not problematic, and sometimes it is unacceptable.
No.
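To make the lazy-population point concrete, a small sketch (assuming a SessionFactory named sessionFactory and a made-up, cacheable Country entity):

```java
Session s1 = sessionFactory.openSession();
Country first = s1.get(Country.class, 1L);   // hits the database, puts the entity into the L2 cache
s1.close();

Session s2 = sessionFactory.openSession();
Country second = s2.get(Country.class, 1L);  // no SQL issued, served from the second-level cache
s2.close();
```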

When to use Hibernate caching (second level)?

This is a basic question about Hibernate caching, but I have to be sure before going forward. I had used query caching before in small projects, but now I'm involved in a big project, so here it is:
In really big projects (national) what are your suggestion about when to use Query Caching in Hibernate?
Note: the platform is Struts 2, Spring 3, Hibernate, Java 6, WAS 6.
The 2nd-level cache is used when your DB relations are complex, because in that case you know that hitting the DB each and every time will be a costly operation. The performance of the app can be increased by using a cache in such cases.
I reckon you mean second-level cache, that is cache which spans more than one Hibernate session.
Generally, query cache is used for queries that are heavy or often accessed, to make your app hit the database less often.
I'm not sure if your question includes entity cache, but you definitely should investigate it as well. This cache includes individual entities or their collections regardless of context (i.e. concrete queries). I would say it's the most beneficial type of caching.
The bigger your TPS or number of entities, the more you will benefit from using such cache. When you run into having a few thousand queries per transaction, fetching entities from cache (usually in RAM) rather than querying database and mapping can save a lot of precious time.
Be careful when you need 100% up-to-date (online) results.
See also:
Improving Performance in the Hibernate docs.
I highly recommend the article Truly Understanding the Second-Level and Query Caches. In general, caching has a lot of benefits but also introduces a lot of complexity, and you should have a good reason for caching and understand what benefits and risks it will give you.
Note that turning on the query cache is by itself not enough; you also need to mark things as cacheable, here is an explanation. This whole article is really good and discusses when the query cache is not helpful. Again, make sure you have a good reason for turning on query caching in your application.
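For reference, a minimal sketch of doing this through the plain JPA API with the Hibernate cacheable hint (the entity name is made up; an EntityManager em, hibernate.cache.use_query_cache and the entity's cacheability are assumed to be configured):

```java
List<Country> countries = em
        .createQuery("select c from Country c", Country.class)
        .setHint("org.hibernate.cacheable", Boolean.TRUE)  // enable the query cache for this query
        .getResultList();
```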

Hibernate or JDBC

I have a thick client, java swing application with a schema of 25 tables and ~15 JInternalFrames (data entry forms for the tables). I need to make a design choice of straight JDBC or ORM (hibernate with spring framework in this case) for DBMS interaction. Build out of the application will occur in the future.
Would hibernate be overkill for a project of this size? An explanation of either yes or no answer would be much appreciated (or even a different approach if warranted).
TIA.
Good question with no single simple answer.
I used to be a big fan of Hibernate after using it in multiple projects over multiple years.
I used to believe that any project should default to hibernate.
Today I am not so sure.
Hibernate (and JPA) is great for some things, especially early in the development cycle.
It is much faster to get to something working with Hibernate than it is with JDBC.
You get a lot of features for free - caching, optimistic locking and so on.
On the other hand it has some hidden costs. Hibernate is deceptively simple when you start. Follow some tutorial, put some annotations on your class, and you've got yourself persistence. But it's not simple, and being able to write good code with it requires a good understanding of both its internal workings and database design. If you are just starting you may not be aware of some issues that may bite you later on, so here is an incomplete list.
Performance
The runtime performance is good enough; I have yet to see a situation where Hibernate was the reason for poor performance in production. The problem is the startup performance and how it affects your unit test times and development speed. When Hibernate loads, it analyzes all entities and does a lot of pre-caching; it can take around 5 to 15 seconds for a not very big application. So your 1-second unit test is going to take 11 seconds now. Not fun.
Database Independency
It is very cool as long as you don't need to do some fine tuning on the database.
In-memory Session
For every transaction Hibernate will store an object in memory for every database row it "touches". It's a nice optimization when you are doing some simple data entry. If you need to process lots of objects for some reason though, it can seriously affect performance, unless you explicitly and carefully clean up the in-memory session on your own.
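A sketch of the kind of explicit cleanup meant here, following the batch-processing pattern from the Hibernate manual (the Record entity, the records collection and the batch size are illustrative):

```java
Session session = sessionFactory.openSession();
Transaction tx = session.beginTransaction();
int i = 0;
for (Record record : records) {
    session.save(record);
    if (++i % 50 == 0) {
        session.flush();  // push the pending inserts to the database
        session.clear();  // evict the managed instances so the session doesn't grow unbounded
    }
}
tx.commit();
session.close();
```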
Cascades
Cascades allow you to simplify working with object graphs. For example, if you have a root object and some children and you save the root object, you can configure Hibernate to save the children as well. The problem starts when your object graph grows complex. Unless you are extremely careful and have a good understanding of what goes on internally, it's easy to mess this up. And when you do, those problems are very hard to debug.
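A minimal sketch of such a cascade; PurchaseOrder and OrderLine are made-up names, and OrderLine is assumed to have an order field for the mappedBy side:

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

@Entity
public class PurchaseOrder {

    @Id
    @GeneratedValue
    private Long id;

    // Saving a PurchaseOrder also saves (and orphan-deletes) its lines.
    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderLine> lines = new ArrayList<>();
}

// Elsewhere: session.save(order) persists the PurchaseOrder and, via the cascade,
// every OrderLine in its lines collection.
```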
Lazy Loading
Lazy loading means that every time you load an object, Hibernate will not load all its related objects but will instead provide placeholders which are resolved as soon as you try to access them. Great optimization, right? It is, except you need to be aware of this behaviour, otherwise you will get cryptic errors. Google "LazyInitializationException" for an example. And be careful with performance: depending on the order in which you load your objects and on your object graph, you may hit the "n+1 selects" problem. Google it for more information.
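One common way to sidestep both problems for a specific query is an explicit fetch join; a sketch reusing the made-up entities from the previous example:

```java
// Fetching the association eagerly for this one query avoids both
// LazyInitializationException and the n+1 selects problem.
List<PurchaseOrder> orders = session
        .createQuery("select distinct o from PurchaseOrder o join fetch o.lines",
                     PurchaseOrder.class)
        .getResultList();
```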
Schema Upgrades
Hibernate allows easy schema changes by just refactoring Java code and restarting. It's great when you start. But then you release version one. And unless you want to lose your customers, you need to provide them with schema upgrade scripts. Which means no more simple refactoring, as all schema changes must be done in SQL.
Views and Stored Procedures
Hibernate requires exclusive write access to the data it works with. Which means you can't really use views, stored procedures and triggers, as those can cause changes to the data without Hibernate being aware of them. You can have some external processes writing data to the database in separate transactions, but if you do, your cache will have invalid data. Which is one more thing to care about.
Single Threaded Sessions
Hibernate sessions are single-threaded. Any object loaded through a session can only be accessed (including reading) from the same thread. This is acceptable for server-side applications but might complicate things unnecessarily if you are doing a GUI-based application.
I guess my point is that there are no free meals.
Hibernate is a good tool, but it's a complex tool, and it requires time to understand it properly. If you or your team members don't have such knowledge, it might be simpler and faster to go with pure JDBC (or Spring JDBC) for a single application. On the other hand, if you are willing to invest time into learning it (including learning by doing and debugging), then in the future you will be able to understand the tradeoffs better.
Hibernate can be good but it and other JPA ORMs tend to dictate your database structure to a degree. For example, composite primary keys can be done in Hibernate/JPA but they're a little awkward. There are other examples.
If you're comfortable with SQL I would strongly suggest you take a look at Ibatis. It can do 90%+ of what Hibernate can but is far simpler in implementation.
I can't think of a single reason why I'd ever choose straight JDBC (or even Spring JDBC) over Ibatis. Hibernate is a more complex choice.
Take a look at the Spring and Ibatis Tutorial.
No doubt Hibernate has its complexity.
But what I really like about the Hibernate approach (some others too) is that the conceptual model you get in Java is better. Although I don't think of OO as a panacea, and I don't look for theoretical purity of the design, I have found many times that OO does in fact simplify my code. As you asked specifically for details, here are some examples:
the added complexity is not in the model and entities, but in your framework for manipulating all entities for example. For maintainers, the hard part is not a few framework classes but your model, so Hibernate allows you to keep the hard part (the model) at its cleanest.
if a field (like an id, or audit fields, etc.) is used in all your entities, then you can create a superclass with it (see the sketch after this list). Therefore:
you write less code, but more importantly ...
there are fewer concepts in your model (the unique concept stays unique in the code)
for free, you can write more generic code that, given any entity (with no type-switching or casts), lets you access the id.
Hibernate also has many features to deal with other model characteristics you might need (now or later; add them only as needed). Take it as an extensibility quality of your design.
You might replace inheritance (subclassing) with composition (several entities having the same member, which contains a few related fields that happen to be needed in several entities).
There can be inheritance between a few of your entities. It often happens that you have two tables that have pretty much the same structure (but you don't want to store all the data in one table, because you would lose referential integrity to a different parent table).
With reuse between your entities (but only appropriate inheritance and composition), there are usually some additional advantages to come. Examples:
there is often some way to read the data of the entities that is similar but different. Suppose I read the "title" field for three entities, but for some I replace the result with a differing default value if it is null. It is easy to have a signature "getActualTitle" (in a superclass or an interface) and implement the default-value handling in the three implementations. That means the code outside my entities just deals with the concept of an "actual title" (I made this functional concept explicit), and method inheritance takes care of executing the correct code (no more switch or if, no code duplication); this is illustrated in the sketch after this list.
...
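A small sketch of the two ideas above, a shared superclass for common fields and an "actual title" method overridden per entity; all names are made up, and the classes are kept package-private only so the sketch fits in one file:

```java
import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;
import javax.persistence.Temporal;
import javax.persistence.TemporalType;

@MappedSuperclass
abstract class BaseEntity {

    @Id
    @GeneratedValue
    private Long id;

    @Temporal(TemporalType.TIMESTAMP)
    private Date createdAt;          // audit field shared by every entity

    public Long getId() {
        return id;
    }

    // The "actual title" concept: each entity decides how to compute it.
    public abstract String getActualTitle();
}

@Entity
class Article extends BaseEntity {

    private String title;

    @Override
    public String getActualTitle() {
        return title;                                  // plain title
    }
}

@Entity
class Draft extends BaseEntity {

    private String title;

    @Override
    public String getActualTitle() {
        return title != null ? title : "(untitled)";   // entity-specific default value
    }
}
```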
Over time, the requirements evolve. There will be a point where your database structure has problems. With JDBC alone, any change to the database must impact the code (i.e., double cost). With Hibernate, many changes can be absorbed by changing only the mapping, not the code. The same happens the other way around: Hibernate lets you change your code (between versions, for example) without altering your database (by changing the mapping, although that is not always sufficient). To summarize, Hibernate lets you evolve your database and your code independently.
For all these reasons, I would choose Hibernate :-)
I think either is a fine choice, but personally I would use hibernate. I don't think hibernate is overkill for a project of that size.
Where Hibernate really shines for me is dealing with relationships between entities/tables. Doing JDBC by hand can take a lot of code if you deal with modifying parent and children (grandchildren, siblings, etc) at the same time. Hibernate can make this a breeze (often a single save of the parent entity is enough).
There are certainly complexities when dealing with Hibernate though, such as understanding how the Session flushing works, and dealing with lazy loading.
Straight JDBC would fit the simplest cases at best.
If you want to stay within Java and OOD then going Hibernate or Hibernate/JPA or any-other-JPA-provider/JPA should be your choice.
If you are more comfortable with SQL then having Spring for JDBC templates and other SQL-oriented frameworks won't hurt.
In contrast, besides transactional control, there is not much help from having Spring when working with JPA.
Hibernate suits middleware applications well. Assume we build a middleware layer on top of the database, and that middleware is accessed by around 20 applications; in that case one Hibernate layer can satisfy the requirements of all 20 applications.
In JDBC, if we open a database connection we need to wrap the work in a try block, handle any exceptions in the catch block, and close the connection in the finally block.
In JDBC all exceptions are checked exceptions, so we must write try, catch and throws everywhere, but in Hibernate we only have unchecked exceptions.
With JDBC, as programmers we must close the connection ourselves, or we risk running out of connections.
If we don't close the connection in the finally block, JDBC is not responsible for closing it.
In JDBC we need to write SQL commands in various places; if the table structure is modified after the program has been written, the JDBC code stops working and we need to modify, recompile and redeploy it, which is tedious.
JDBC surfaces database-specific error codes when an exception occurs, but Java programmers generally don't know what those error codes mean.
While inserting a record, if the particular table doesn't exist in the database, JDBC raises an error like "table or view does not exist" and throws an exception, but Hibernate (with schema generation enabled) can create the table for us.
Hibernate supports both lazy and eager loading of associations; with plain JDBC you have to fetch related data yourself.
Hibernate supports inheritance, associations and collections.
In Hibernate, if we save a derived-class object, its base-class state is also stored in the database; in other words, Hibernate supports inheritance mapping.
Hibernate supports relationships like One-To-One, One-To-Many, Many-To-One and Many-To-Many.
Hibernate supports a caching mechanism; this reduces the number of round trips between the application and the database, which improves application performance.
Getting pagination in Hibernate is quite simple (see the sketch after this list).
Hibernate can generate primary keys automatically while storing records in the database.
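The pagination point above, sketched with a made-up Employee entity and an open Hibernate session:

```java
// Fetch "page 3" with a page size of 20 using Hibernate's paging API.
List<Employee> page = session
        .createQuery("from Employee e order by e.id", Employee.class)
        .setFirstResult(2 * 20)   // zero-based offset of the first row
        .setMaxResults(20)        // page size
        .getResultList();
```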
... In-memory Session ... LazyInitializationException ...
You could look at Ebean ORM which doesn't use session objects ... and where lazy loading just works. Certainly an option, not overkill, and will be simpler to understand.
If billions of users are using our app or website, then with JDBC the query gets executed every single time, but with Hibernate a cached query can be executed once and served to any number of users; that is an important and easy-to-get advantage of Hibernate over JDBC.

Update a single object across multiple processes in Java

A couple of Relational DB tables are managed by a single object cache that resides in a process. When the cache is committed the tables are updated. The DB relational tables are updated by regular SQL queries and not anything more fancier like hibernate.
Eventually, other processes got into the business of modifying this object without communicating with one another, i.e., each process would initialize this object (read from the DB) and update it (commit to the DB), and the other processes would not know about it, holding on to a stale cache.
I have to fix this workflow. I have thought of a couple of methods.
One is to make this object an MBean. The object would then reside in one process, and every other process would modify the object in that process via MBean method invocations.
However, this approach has a couple of problems.
1) Every object returned by this cache has to be an MBean, which could make the method invocations quite chatty.
2) Also there is a requirement that every process should see a consistent data model(cache) of the DB, and it should merge its contents to the DB if possible. (like a transaction). If the DB was updated by some other process significantly, it is OK for the merge to fail.
What technologies in Java will help to solve this problem?
You should have a look at Terracotta. They have technology that makes multiple JVMs (can be on different servers) appear unified. If you update an object on one JVM, Terracotta will update the instance transparently on all JVMs in the cluster in a safe way.
If you wanted to keep the object model, you could use a Java object cache for centralized storage before committing. Or you could hold a shared lock using ZooKeeper.
But it sounds like you should really abandon the self-managed cache. Use Hibernate or another JPA implementation, which you mentioned. JPA addresses the cache issues and maintains an L2 shared cache, so they've thought about this for you.
I agree with John: use a second-level cache in Hibernate with support for clustering. It's a much more straightforward way to manage data; use a simplified data access model and let Hibernate manage the details.
Terracotta Ehcache is one such cache; so are JBoss Cache, Coherence, etc.
More info on the Hibernate second-level cache can be found in the official Hibernate docs, Chapter 19, Improving Performance (note that while the Hibernate docs do list second-level cache providers, the list is woefully out of date; for example, who uses SwarmCache? The last release of that was in 2003).
