Hibernate L2 query cache: do not hit database on cache miss - java

I have a table, Users, with primary key id :: int4 and natural key password :: varchar(32). I'd like to check the existence of a row by the compound of id and password in the DB as fast as possible using Hibernate.
So I load all users to L2 cache and do
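User u = (User) session.get(User.class, uId);
// session.get returns null when no row exists, so guard against it
if (u == null || !u.getPassword().equals(pass)) {
    // fail when the user is unknown or the passwords are not equal
}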
This is good when the cache is hit, but on a cache miss (which means false input data) this will trigger select queries. How can I tell Hibernate not to hit the database if the value is not found in the cache?
I see an option to load User directly from the cache and then call something like session.merge() on it. But maybe there is a better way?
PS. I have one more complaint. If the passwords are not equal, I see a small performance degradation from hydrating my User object (I haven't profiled yet). Can this also be eliminated?

You are using the L2 cache against its intentions: it is always legal for a cache to experience a miss. By its nature the cache does not guarantee to be a 100% replica of an entire table.
If you want a reliable replica of the complete User table, then construct your own map, for example a HashMap<Integer,String> from id to password.
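A minimal sketch of such a hand-built replica, assuming the id/password pairs are loaded once at startup by code not shown here. CredentialIndex and its methods are hypothetical names, not part of Hibernate. A lookup never touches the database, so a miss simply means the check fails.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory replica of the Users table: id -> password.
final class CredentialIndex {
    private final Map<Integer, String> passwordById = new ConcurrentHashMap<>();

    // Called while loading the table (and again whenever a user changes).
    // Note: ConcurrentHashMap does not accept null passwords.
    void put(int id, String password) {
        passwordById.put(id, password);
    }

    // True only if the id is known and the password matches; never queries the DB.
    boolean matches(int id, String password) {
        String stored = passwordById.get(id);
        return stored != null && stored.equals(password);
    }
}
```

The price of this approach is that the application, not Hibernate, becomes responsible for keeping the map up to date.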

Related

How to keep a java list in memory synced with a table in database?

I want to search for an input in a list. That list resides in a database. I see two options for doing that:
Hit the DB for each search and return the result.
Keep a copy in memory synced with the table, search in memory, and return the result.
I like the second option as it will be faster. However, I am confused about how to keep the list in sync with the table.
Example: I have a list L = [12,11,14,42,56]
and I receive an input: 14
I need to return whether or not the input exists in the list. The list can be updated by other applications, so I need to keep the list in sync with the table.
What would be the most optimized approach here, and how do I keep the list in sync with the database?
Is there any way my application can be informed of changes in the table so that I can reload the list on demand?
Instead of recreating your own implementation of something that already exists, I would leverage Hibernate's Second Level Cache (2LC) with an implementation such as EhCache.
By using a 2LC, you can specify the time-to-live expiration time for your entities and once they expire, any query would reload them from the database. If the entity cache has not yet expired, Hibernate will hydrate them from the 2LC application cache rather than the database.
If you are using Spring, you might also want to take a look at @Cacheable. This operates at the component / bean tier, allowing Spring to cache a result set in a named region. See the Spring documentation for more details.
To satisfy your requirement, you should control reads and writes in one place; otherwise there will always be cases where the in-memory data is out of sync with the table.
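One way to approximate a single point of control, sketched under assumptions: the loader below stands in for whatever query reads the table, and RefreshableSet is a hypothetical name. Readers always see a complete snapshot, and refresh() can be called on a timer or whenever the application learns that the table changed.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical in-memory copy of a table column, replaced wholesale on refresh.
final class RefreshableSet {
    private final Supplier<Set<Integer>> loader;          // assumed to run the DB query
    private final AtomicReference<Set<Integer>> snapshot;

    RefreshableSet(Supplier<Set<Integer>> loader) {
        this.loader = loader;
        this.snapshot = new AtomicReference<>(copy(loader.get()));
    }

    // Reload from the source and atomically swap in the new snapshot.
    void refresh() {
        snapshot.set(copy(loader.get()));
    }

    // Served from memory; may be stale until the next refresh().
    boolean contains(int value) {
        return snapshot.get().contains(value);
    }

    private static Set<Integer> copy(Set<Integer> source) {
        return Collections.unmodifiableSet(new HashSet<>(source));
    }
}
```

Between refreshes the copy can lag behind writes made by other applications, which is the trade-off the second option accepts.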

Multiple key and value pair search

We plan to cache a DB table on the application side (to avoid DB calls). Our cache is a key/value pair implementation. If I use the primary key (column1) as the key and all other data as the value, how can we execute the queries below against the cache?
select * from table where column1=?
select * from table where column2=? and column3=?
select * from table where column4=? and column5=? and column6=?
The simplest option is to build 3 caches as below.
(column1) --> Data
(column2+column3) --> Data
(column4+column5) --> Data
Are there any better options?
Key points:
Table contains millions of records
We are using Java ConcurrentHashMap for cache implementation.
Looks like you want an in-memory cache. Guava has cool caches--you would need a LoadingCache.
Here is the link to LoadingCache
Basically, for your problem, the idea would be to have three LoadingCache instances. A LoadingCache is built with a CacheLoader whose load method you implement. That method tells the cache how to get the data for a given key in case of a cache miss. So, the first time you access the loading cache for query1, there would be a cache miss. The loading cache would use the method you implemented (your classic DAO method) to get the data, put it in the cache, and return it to you. The next time you access it, it will be served from your in-memory Guava cache.
So if you have three methods
Data getData(Column1 column)
Data getData(Column2 column2, Column3 column3)
Data getData(Column4 column4, Column5 column5, Column6 column6)
your three LoadingCache will call these methods from the load implementation you write. And that's it. I find it very clean and simple to get what you want.
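To illustrate the load-on-miss idea without pulling in Guava, here is a minimal standard-library stand-in built on ConcurrentHashMap.computeIfAbsent. It is not Guava's API: LoadOnMissCache is a hypothetical name, the loader function plays the role of the DAO method, and unlike a real LoadingCache there is no size limit or expiry.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical load-on-miss cache: calls the loader only when the key is absent.
final class LoadOnMissCache<K, V> {
    private final Map<K, V> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // assumed to wrap a DAO call

    LoadOnMissCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // First access for a key invokes the loader; later accesses are served from memory.
    // If the loader returns null, nothing is stored and null is returned.
    V get(K key) {
        return entries.computeIfAbsent(key, loader);
    }
}
```

For the multi-column lookups, the key can be any value type with proper equals and hashCode, for example a List such as Arrays.asList(column2, column3), with one cache instance per query shape.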
You mentioned that you have to cache millions of records. That's quite a big number. I do not recommend building your own caching framework, especially not one based on simple data structures such as HashMaps.
I highly recommend Redis; see http://redis.io. Companies such as Twitter and Stack Overflow use Redis for their caches.
Here is the live demonstration of Redis - http://try.redis.io

Is there any heuristic/pattern for logging user actions

I have a GWT/Java/Hibernate/MySQL application (but I think any web pattern could be valid) that performs CRUD operations on several objects. Each object is stored in a table in the database. I want to implement an action logger. For example, for Object A I want to know who created and modified it, and for User B, which actions he performed.
My idea is to have a History table that stores: UserId, ObjectId, ActionName. The UserId and ObjectId are foreign keys. Am I on the right track?
I also think this is the right direction.
However, bear in mind that in an application with lots of traffic, these logs can become an overhead.
I would suggest the following in this case:
A. Don't use Hibernate for this "action logging". Hibernate performs better with a "mostly read" DB.
B. Consider a DB that is better suited to a "mostly write" scenario for the action logging table.
You can try to look for a NoSQL solution for this.
C. If you use such a NoSQL DB but still want to keep the logged actions in the relational DB, have an offline process (one that runs once a day, for example) that queries your "action logging DB" and inserts the results into the relational DB.
D. If it's OK for your system to lose some action logs, consider using the producer/consumer pattern (for example, a queue between producer and consumer threads): the threads that need to log actions will log them asynchronously rather than synchronously.
E. In addition, don't forget that such a logging table can grow very large over time, causing queries on it to take a long time. For these issues consider the following:
E.1. Every day, remove really old logs (say, older than a month), or move them to some "backup" table.
E.2. Index the fields you mostly use in action logging queries (for example, an action_type field).
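Point D can be sketched as a bounded queue between the request threads and a background writer. This is only an illustration under assumptions: ActionLogQueue and Entry are hypothetical names, the fields mirror the proposed History table, and a full queue drops the entry, which is acceptable only if losing some logs is acceptable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical hand-off point between producers (request threads) and a consumer (log writer).
final class ActionLogQueue {

    // One history row: who did what to which object.
    static final class Entry {
        final long userId;
        final long objectId;
        final String actionName;

        Entry(long userId, long objectId, String actionName) {
            this.userId = userId;
            this.objectId = objectId;
            this.actionName = actionName;
        }
    }

    private final BlockingQueue<Entry> queue;

    ActionLogQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Producer side: never blocks; returns false (entry dropped) when the queue is full.
    boolean submit(long userId, long objectId, String actionName) {
        return queue.offer(new Entry(userId, objectId, actionName));
    }

    // Consumer side: removes up to maxEntries queued entries so they can be written in one batch.
    List<Entry> drainBatch(int maxEntries) {
        List<Entry> batch = new ArrayList<>();
        queue.drainTo(batch, maxEntries);
        return batch;
    }
}
```

A consumer thread would call drainBatch in a loop and insert each batch into the logging store, keeping the write cost off the request path.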
If only changes to specific fields, e.g., something like status in a users table, should be tracked, I would use a user_status_histories table being referenced from the users table via foreign key. The user_status_histories table would contain fields such as current_status, date and something like admin_who_modified_the_status.
Whenever a status change is made, a new record would be inserted into the user_status_histories table. This would allow easy querying of all status changes.
Of course, querying a user would then require a (LEFT or INNER) JOIN with the user_status_histories table in order to get the last record (= the current status).
Depending on your needs, you might think of a current_status field in the users table (besides the status serving as foreign key) for fast access, which would be maintained parallel to the user_status_histories table.
Yes you are. Another very similar framework is one which supports undo and redo. These frameworks track user actions and have the additional ability to restore state to the way it was before the user action.

Hibernate loading all entities utilizing 1st or 2nd level cache

We have an entire table of entities that we need to load during a hibernate session and the only way I know to load all entities is through an HQL query:
public <T> List<T> getAllEntities(final Class<T> entityClass) {
    if (null == entityClass)
        throw new IllegalArgumentException("entityClass can't be null");
    List<T> list = castResultList(createQuery(
        "select e from " + entityClass.getSimpleName() + " e ").list());
    return list;
}
We use EHcache for 2nd level caching.
The problem is that this gets called hundreds of times in a given transaction session and takes up a considerable portion of the total time. Is there any way to load all entities of a given type (load an entire table) and still benefit from the 1st level session cache or the 2nd level Ehcache?
We've been told to stay away from query caching because of their potential performance penalties relative to their gains.
* Hibernate Query Cache considered harmful
Although we're doing performance profiling right now so it might be time to try turning on query cache.
L1 and L2 cache can't help you much with the problem of "get an entire table."
The L1 cache is ill-equipped because if someone else inserted something, it's not there. (You may "know" that no one else would ever do so within the business rules of the system, but the Hibernate Session doesn't.) Hence you have to go look in the DB to be sure.
With the L2 cache, things may have been expired or flushed since the last time anybody put the table in there. This can be at the mercy of the cache provider or even done totally externally, maybe through a MBean. So Hibernate can't really know at any given time if what's in the cache for that type represents the entire contents of the table. Again, you have to look in the DB to be sure.
Since you have special knowledge about this Entity (new ones are never created) that there isn't a practical way to impart on the L1 or L2 caches, you need to either use the tool provided by Hibernate for when you have special business-rules-level knowledge about a result set, query cache, or cache the info yourself.
--
If you really really want it in the L2 cache, you could in theory make all entities in the table members of a collection on some other bogus entity, then enable caching the collection and manage it secretly in the DAO. I don't think it could possibly be worth having that kind of bizarreness in your code though :)
Query cache is considered harmful if and only if the underlying table changes often. In your case the table is changed once a day. So the query would stay in cache for 24 hours. Trust me: use the query cache for it. It is a perfect use case for a query cache.
Example of a harmful query cache: if you have a user table and you use the query cache for "from User where username = ..." then this query will be evicted from the cache each time the user table is modified (for example, when another user changes or deletes his account). So ANY modification of this table triggers cache eviction. The only way to improve this situation is querying by natural-id, but this is another story.
If you know your table will be modified only once a day as in your case, the query cache will only evict once a day!
But pay attention to your logic when modifying the table. If you do it via Hibernate, everything is fine. If you use a direct query, you have to tell Hibernate that you have modified the table (something like query.addSynchronizedEntityClass(..)). If you do it via a shell script, you need to adjust the time-to-live of the underlying cache region.
By the way, your answer reimplements the query cache, as the query cache just caches the list of ids. The actual objects are looked up in the L1/L2 cache, so you still need to cache the entities when you use the query cache.
Please mark this as the correct answer for further reference.
We ended up solving this by storing in memory the primary keys of all the entities in the table we needed to load (because they're template data and no templates are added or removed).
Then we could use this list of primary keys to look up each entity and utilize Hibernate's 1st and 2nd level caches.
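The shape of that workaround can be sketched as follows. This is an illustration, not the poster's code: ByIdLoader is a hypothetical helper, and the byId function is assumed to wrap a primary-key lookup such as id -> session.get(Template.class, id), which is the kind of call the 1st and 2nd level caches can serve.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical helper: resolves a fixed list of primary keys one by one.
final class ByIdLoader {

    // Looks up each id through byId and collects the entities that were found.
    static <I, T> List<T> loadAll(List<I> ids, Function<I, T> byId) {
        List<T> result = new ArrayList<>(ids.size());
        for (I id : ids) {
            T entity = byId.apply(id);   // expected to be answered from cache when possible
            if (entity != null) {        // skip ids that no longer resolve
                result.add(entity);
            }
        }
        return result;
    }
}
```

This only stays correct while the set of ids is fixed, which is the special knowledge the application has and Hibernate does not.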

Hibernate 2nd level caching doesn't seem to be working

I'm currently trying to get Hibernate working using the caching provider that comes with Hibernate.
net.sf.ehcache.hibernate.SingletonEhCacheProvider
I have a default cache and a class-specific cache enabled in the ecache.xml, which is referenced in my hibernate.cfg.xml file. The class/mapping-file-specific cache is defined to handle up to 20000 objects.
However, I'm seeing no performance gains since I turned on the cache mapping on one of the mapping files I'm testing this with.
My test is as follows.
Load 10000 objects of the particular mapping file I'm testing (this should hit the DB and be a bottleneck).
Next I load the same 10000 objects; at this point I would expect the cache to be hit and to see significant performance gains. I have tried using both "read-only" and "read-write" cache mapping in the Hibernate mapping XML file I'm testing with.
I'm wondering, is there anything I need to be doing to ensure the cache is hit before the DB when loading objects?
Note that as part of the test I'm paging through these 10000 records using something similar to the code below (paging 1000 records at a time).
Criteria crit = HibernateUtil.getSession() .createCriteria( persistentClass );
crit.setFirstResult(startIndex);
crit.setFetchSize(fetchSize);
return crit.list();
I have seen that Criteria has a cache mode setter (setCacheMode()), so is there something I should be doing with that?
Using the stats code below, I notice that there are 10000 objects in memory (well, Hibernate dehydrated objects, I imagine?), but for some reason I'm getting 0 hits and, more worryingly, 0 misses. So it looks like it's not going to the cache at all when doing a lookup, even though the stats code seems to be telling me that there are 10000 objects in memory.
Any ideas on what I'm doing wrong? I take it that getting misses is good, as it means the cache is being used, but I can't figure out why I'm not getting any cache hits. Is it down to the fact that I'm using setFirstResult() and setFetchSize() with Criteria?
System.out.println("Cache Misses = " + stats.getSecondLevelCacheMissCount());
System.out.println("Cache Hits Count = " + stats.getSecondLevelCacheHitCount());
System.out.println("2nd level elements in mem "+ stats.getSecondLevelCacheStatistics("com.SomeTestEntity").getElementCountInMemory());
The second level cache works for "find by primary key". For other queries, you need to cache the query (provided the query cache is enabled), in your case using Criteria#setCacheable(boolean):
Criteria crit = HibernateUtil.getSession().createCriteria(persistentClass);
crit.setFirstResult(startIndex);
crit.setFetchSize(fetchSize);
crit.setCacheable(true); // Enable caching of this query result
return crit.list();
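Marking a query as cacheable only has an effect when the query cache itself is switched on. A minimal sketch of the two settings involved, built as a java.util.Properties object; CacheSettings and queryCacheProperties are hypothetical names, and the cache provider or region factory setting, which depends on the Hibernate and Ehcache versions in use, is deliberately left out.

```java
import java.util.Properties;

// Hypothetical holder for the Hibernate settings that enable entity and query caching.
final class CacheSettings {

    static Properties queryCacheProperties() {
        Properties props = new Properties();
        // Entity (second-level) cache: holds the state of cached entities.
        props.setProperty("hibernate.cache.use_second_level_cache", "true");
        // Query cache: holds identifiers of query results, resolved through the entity cache.
        props.setProperty("hibernate.cache.use_query_cache", "true");
        return props;
    }
}
```

The same two keys can be written as property entries in hibernate.cfg.xml instead of being set in code.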
I suggest reading:
Hibernate: Truly Understanding the Second-Level and Query Caches
If I cache the query, are all the Hibernate entities from the query then available in the second level cache?
Yes, they will be. This is explained in black and white in the link I mentioned: "Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. So the query cache should always be used in conjunction with the second-level cache". Did you read it?
I was under the impression that using the query cache was entirely different from using the Hibernate 2nd level cache.
It is different (the "key" used for the cache entries is different). But the query cache relies on the L2 cache.
From your answer you seem to be suggesting that the query cache and second level cache are both the same, and to generate cache hits I need to be using the "find by primary key".
I'm just saying you need to cache the query since you're not "finding by primary key". I don't get what is not clear. Did you try to call setCacheable(true) on your query or criteria object? Sorry for insisting but, did you read the link I posted?
