Hibernate 2nd Level caching doesn't seem to be working - java

I'm currently trying to get Hibernate caching working using the cache provider that comes with Hibernate:
net.sf.ehcache.hibernate.SingletonEhCacheProvider
I have a default cache and a class-specific cache enabled in the ehcache.xml that is referenced in my hibernate.cfg.xml file. The class/mapping-file-specific cache is defined to handle up to 20000 objects.
However, I'm seeing no performance gains since I turned on the cache mapping in the mapping file I'm testing this with.
My test is as follows:
Load 10000 objects of the particular mapped class I'm testing (this should hit the DB and be the bottleneck).
Then load the same 10000 objects again; at this point I would expect the cache to be hit and to see significant performance gains. I have tried both "read-only" and "read-write" cache usage in the Hibernate mapping XML file I'm testing with.
Is there anything I need to do to ensure the cache is checked before the DB when loading objects?
Note that as part of the test I'm paging through these 10000 records using something similar to the code below (paging 1000 records at a time).
Criteria crit = HibernateUtil.getSession().createCriteria( persistentClass );
crit.setFirstResult(startIndex);
crit.setFetchSize(fetchSize);
return crit.list();
I have seen that Criteria has a cache mode setter (setCacheMode()), so is there something I should be doing with that?
Using the stats code below I can see that there are 10000 objects (well, dehydrated Hibernate objects, I imagine?) in memory, but
for some reason I'm getting 0 hits and, more worryingly, 0 misses, so it looks like lookups aren't going to the cache at all, even though the stats say there are 10000 objects in memory.
Any ideas on what I'm doing wrong? I take it that getting misses would at least mean the cache is being used, but I can't figure out why I'm not getting any cache hits. Is it down to the fact that I'm using setFirstResult() and setFetchSize() with Criteria?
System.out.println("Cache Misses = " + stats.getSecondLevelCacheMissCount());
System.out.println("Cache Hits Count = " + stats.getSecondLevelCacheHitCount());
System.out.println("2nd level elements in mem "+ stats.getSecondLevelCacheStatistics("com.SomeTestEntity").getElementCountInMemory());

The second level cache works for "find by primary key". For other queries, you need to cache the query (provided the query cache is enabled), in your case using Criteria#setCacheable(boolean):
Criteria crit = HibernateUtil.getSession().createCriteria( persistentClass );
crit.setFirstResult(startIndex);
crit.setFetchSize(fetchSize);
crit.setCacheable(true); // Enable caching of this query result
return crit.list();
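Note also that Criteria#setCacheable(boolean) only has an effect if the query cache is enabled, and the hit/miss counters you print only count anything if statistics generation is switched on. A minimal sketch of the relevant settings (Hibernate 3.x style, matching the provider named in the question; the same property names can go straight into hibernate.cfg.xml):
Configuration cfg = new Configuration().configure();  // reads hibernate.cfg.xml
cfg.setProperty("hibernate.cache.provider_class",
        "net.sf.ehcache.hibernate.SingletonEhCacheProvider");
cfg.setProperty("hibernate.cache.use_second_level_cache", "true");
cfg.setProperty("hibernate.cache.use_query_cache", "true");  // needed for setCacheable(true)
cfg.setProperty("hibernate.generate_statistics", "true");    // otherwise the counters above may stay at 0
SessionFactory sessionFactory = cfg.buildSessionFactory();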
I suggest reading:
Hibernate: Truly Understanding the Second-Level and Query Caches
If I cache the query, are all the Hibernate entities from the query then available in the second level cache?
Yes, they will. This is explained in black and white in the link I mentioned: "Note that the query cache does not cache the state of the actual entities in the result set; it caches only identifier values and results of value type. So the query cache should always be used in conjunction with the second-level cache". Did you read it?
I was under the impression that using the query cache was entirely different from using the Hibernate 2nd level cache.
It is different (the "key" used for the cache entries is different), but the query cache relies on the L2 cache.
From your answer you seem to be suggesting that the query cache and the second level cache are the same, and that to generate cache hits I need to be using "find by primary key".
I'm just saying you need to cache the query since you're not "finding by primary key". I don't get what is not clear. Did you try to call setCacheable(true) on your query or criteria object? Sorry for insisting but, did you read the link I posted?

Related

Load entire tables including relationships into memory with JPA

I have to process a huge amount of data distributed over 20 tables (~5 million records in total) and I need to load it efficiently.
I'm using WildFly 14 and JPA/Hibernate.
Since, in the end, every single record will be used by the business logic (in the same transaction), I decided to pre-load the entire content of the required tables into memory, simply via:
em.createQuery("SELECT e FROM Entity e").getResultList().size();
After that, every object should be available in the transaction and thus be accessible via:
em.find(Entity.class, id);
But somehow this doesn't work and there are still a lot of calls to the DB, especially for the relationships.
How can I efficiently load the whole content of the required tables, including
the relationships, and make sure I got everything / there will be no further DB calls?
What I already tried:
FetchMode.EAGER: Still too many single selects / object graph too complex
EntityGraphs: Same as FetchMode.EAGER
Join fetch statements: Best results so far, since it simultaneously populates the relationships to the referred entities
2nd Level / Query Cache: Not working, probably the same problem as em.find
One thing to note is that the data is immutable (at least for a specific time) and could also be used in other transactions.
Edit:
My plan is to load and manage the entire data set in a @Singleton bean. But I want to make sure I'm loading it in the most efficient way and that the entire data set is loaded. There should be no further queries necessary when the business logic uses the data. After a specific time (an EJB timer), I'm going to discard the entire data set and reload the current state from the DB (always whole tables).
Keep in mind that you'll likely need a 64-bit JVM and a large amount of memory. Take a look at the Hibernate 2nd level cache. Some things to check for, since we don't have your code (a minimal sketch follows this list):
The @Cacheable annotation will clue Hibernate in so that the entity is cacheable.
Configure 2nd level caching to use something like Ehcache, and set the maximum in-memory elements to something big enough to fit your working set.
Make sure you're not accidentally using multiple sessions in your code.
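A minimal sketch of such an entity (the class name is illustrative, not from the question; READ_ONLY is assumed because the data is described as immutable for a given period):
import javax.persistence.Cacheable;
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.Cache;
import org.hibernate.annotations.CacheConcurrencyStrategy;

@Entity
@Cacheable                                          // JPA: this entity may be cached
@Cache(usage = CacheConcurrencyStrategy.READ_ONLY)  // Hibernate: READ_ONLY fits data that is immutable for a while
public class ReferenceData {                        // illustrative name

    @Id
    private Long id;

    // ... other fields; collections need their own @Cache annotation to be cached
}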
If you need to process things this way, you may want to consider changing your design so it doesn't rely on having everything in memory, not using Hibernate/JPA, or not using an app server. That will give you more control over how things are executed. This may even be a better fit for something like Hadoop. Without more information it's hard to say which direction would be best for you.
I understand what you're asking, but JPA/Hibernate isn't going to want to cache that much data for you, or at least I wouldn't expect a guarantee from it. Consider that you described 5 million records. What is the average length per record? 100 bytes gives 500 megabytes of memory, which will just crash your untweaked JVM. Probably more like 5000 bytes on average, and that's 25 GB of memory. You need to think about what you're asking for.
If you want it cached you should do that yourself, or better yet just use the results as you get them. If you want memory-based data access you should look at a technology built specifically for that. http://www.ehcache.org/ seems popular, but it's up to you, and you should be sure you understand your use case first.
If you are trying to be database-efficient then you should just understand what you're doing and design and test carefully.
Basically it should be a pretty easy task to load entire tables with one query per table and link the objects, but JPA works differently, as shown in this example.
The biggest problem is @OneToMany/@ManyToMany relations:
@Entity
public class Employee {
    @Id
    @Column(name="EMP_ID")
    private long id;
    ...
    @OneToMany(mappedBy="owner")
    private List<Phone> phones;
    ...
}

@Entity
public class Phone {
    @Id
    private long id;
    ...
    @ManyToOne
    @JoinColumn(name="OWNER_ID")
    private Employee owner;
    ...
}
FetchType.EAGER
With FetchType.EAGER and the query SELECT e FROM Employee e, Hibernate generates the SQL statement SELECT * FROM EMPLOYEE and, right after it, SELECT * FROM PHONE WHERE OWNER_ID=? for every single Employee loaded, commonly known as the 1+n problem.
I could avoid the 1+n problem by using the JPQL query SELECT e FROM Employee e JOIN FETCH e.phones, which results in something like SELECT * FROM EMPLOYEE LEFT OUTER JOIN PHONE ON EMP_ID = OWNER_ID.
The problem is, this won't work for a complex data model with ~20 tables involved.
FetchType.LAZY
With FetchType.LAZY, the query SELECT e FROM Employee e will just load all Employees as proxies, loading the related Phones only when phones is accessed, which in the end leads to the 1+n problem as well.
To avoid this, the obvious move is to load all the Phones into the same session with SELECT p FROM Phone p. But when phones is accessed, Hibernate will still execute SELECT * FROM PHONE WHERE OWNER_ID=?, because Hibernate doesn't know that all the Phones are already in its current session.
Even when using the 2nd level cache, the statement will still be executed against the DB, because Phone is indexed by its primary key in the 2nd level cache, not by OWNER_ID.
Conclusion
There is no mechanism like "just load all data" in Hibernate.
It seems there is no other way than to keep the relationships transient and connect them manually, or even just to use plain old JDBC.
EDIT:
I just found a solution which works very well. I defined all relevant @ManyToMany and @OneToMany relations as FetchType.EAGER combined with @Fetch(FetchMode.SUBSELECT), and all @ManyToOne relations with @Fetch(FetchMode.JOIN), which results in an acceptable loading time. Besides adding @Cacheable(true) (javax.persistence.Cacheable) to all entities, I added @Cache (org.hibernate.annotations.Cache) to every relevant collection, which enables collection caching in the 2nd level cache. I disabled 2nd level cache timeout eviction and "warm up" the 2nd level cache via a @Singleton EJB combined with @Startup on server start / deploy. Now I have 100% control over the cache: there are no further DB calls until I manually clear it.
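A rough sketch of that warm-up bean, assuming the Employee/Phone model from the example above and a container-managed persistence context:
import javax.annotation.PostConstruct;
import javax.ejb.Singleton;
import javax.ejb.Startup;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;

@Singleton
@Startup
public class CacheWarmup {

    @PersistenceContext
    private EntityManager em;

    @PostConstruct
    public void warmUp() {
        // Touching every row pulls the entities (and, with collection caching enabled,
        // their @OneToMany/@ManyToMany collections) into the 2nd level cache.
        em.createQuery("SELECT e FROM Employee e", Employee.class).getResultList();
        em.createQuery("SELECT p FROM Phone p", Phone.class).getResultList();
    }
}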

Hibernate L2 query cache: do not hit database on cache miss

I have a table, Users, with primary key id :: int4 and natural key password :: varchar(32). I'd like to check for the existence of a row by the compound id and password in the DB as fast as possible using Hibernate.
So I load all users to L2 cache and do
User u = (User) session.get(User.class, uId);
if (!u.getPassword().equals(pass)) {
    // fail when passwords are not equal
}
This works well when the cache is hit, but on a cache miss (which means invalid input data) it triggers a select query. How can I tell Hibernate not to hit the database if the value is not found in the cache?
I see an option to load the User directly from the cache and then use something like session.merge() on it. But maybe there is a better way?
PS. I have one more complaint: if the passwords are not equal, I get a small performance degradation from the dehydration of my User object (I haven't profiled yet). Can this also be eliminated?
You are using the L2 cache against its intentions: it is always legal for a cache to experience a miss. By its nature the cache does not guarantee to be a 100% replica of an entire table.
If you want a reliable replica of the complete User table, then construct your own HashMap<String,String>.
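The in-memory replica suggested above could look roughly like this (a minimal sketch; the class and method names are illustrative, and the map would be filled once from a plain "select id, password from Users" query):
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class CredentialStore {

    private final Map<Integer, String> passwordById = new ConcurrentHashMap<Integer, String>();

    // Populate once (e.g. at startup) from "select id, password from Users".
    public void load(Map<Integer, String> rows) {
        passwordById.putAll(rows);
    }

    // Pure in-memory check: never touches the database; a miss simply means "invalid".
    public boolean matches(int id, String password) {
        return password != null && password.equals(passwordById.get(id));
    }
}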

Efficient cache-aware fetching of multiple entities given their ids

This is JPA2 running on Hibernate.
I want to retrieve multiple instances of the same entity type, given their ids. Many of them will already be in the persistence context and/or second-level cache.
I tried several approaches, but all seem to have their drawbacks:
When I iterate over the ids with entityManager.find(id), I get one query for each non-cached item, that is, too many queries.
With a query of the form SELECT e FROM MyEntity e WHERE e.id in (:ids), the cached entries will be reloaded from the db.
I can manually check the cache beforehand for each of the ids using entityManager.getEntityManagerFactory().getCache().contains(id). This works for the second-level cache, but will return false on entries that are in the persistence context, but not in the second-level cache.
What is the best way of doing this without choosing between loading inefficiently and loading too much?
You should* be able to speculatively pull an entity out of the session cache like this:
T obj = entityManager.getReference(entityClass, id);
boolean inSessionCache = entityManager.getEntityManagerFactory().getPersistenceUnitUtil().isLoaded(obj);
This still leaves you with a pretty gross solution, I admit.
(* might)
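A usage sketch of that speculative check (MyEntity and the class name are illustrative; this only covers the persistence context, so for second-level-cache entries you could still combine it with the getCache().contains() check from the question):
import java.util.ArrayList;
import java.util.List;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceUnitUtil;

public class BatchLoader {

    // MyEntity is an illustrative entity type with a Long id.
    public List<MyEntity> load(EntityManager em, List<Long> ids) {
        PersistenceUnitUtil util = em.getEntityManagerFactory().getPersistenceUnitUtil();
        List<MyEntity> result = new ArrayList<MyEntity>();
        List<Long> missing = new ArrayList<Long>();

        for (Long id : ids) {
            MyEntity proxy = em.getReference(MyEntity.class, id);
            if (util.isLoaded(proxy)) {
                result.add(proxy);   // already loaded in the persistence context
            } else {
                missing.add(id);     // will be fetched in one go below
            }
        }
        if (!missing.isEmpty()) {
            result.addAll(em.createQuery(
                    "SELECT e FROM MyEntity e WHERE e.id IN :ids", MyEntity.class)
                    .setParameter("ids", missing)
                    .getResultList());
        }
        return result;
    }
}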

Hibernate loading all entities utilizing 1st or 2nd level cache

We have an entire table of entities that we need to load during a Hibernate session, and the only way I know to load all entities is with an HQL query:
public <T> List<T> getAllEntities(final Class<T> entityClass) {
    if (null == entityClass)
        throw new IllegalArgumentException("entityClass can't be null");
    List<T> list = castResultList(createQuery(
            "select e from " + entityClass.getSimpleName() + " e").list());
    return list;
}
We use Ehcache for 2nd level caching.
The problem is that this gets called hundreds of times in a given transaction/session and takes up a considerable portion of the total time. Is there any way to load all entities of a given type (load an entire table) and still benefit from the 1st level session cache or the 2nd level Ehcache?
We've been told to stay away from query caching because of its potential performance penalties relative to its gains.
* Hibernate Query Cache considered harmful
Although we're doing performance profiling right now, so it might be time to try turning on the query cache.
L1 and L2 cache can't help you much with the problem of "get an entire table."
The L1 cache is ill-equipped because if someone else inserted something, it's not there. (You may "know" that no one else would ever do so within the business rules of the system, but the Hibernate Session doesn't.) Hence you have to go look in the DB to be sure.
With the L2 cache, things may have been expired or flushed since the last time anybody put the table in there. This can be at the mercy of the cache provider or even done totally externally, maybe through a MBean. So Hibernate can't really know at any given time if what's in the cache for that type represents the entire contents of the table. Again, you have to look in the DB to be sure.
Since you have special knowledge about this entity (new ones are never created) that there isn't a practical way to impart to the L1 or L2 caches, you need to either use the tool Hibernate provides for when you have business-rules-level knowledge about a result set (the query cache), or cache the info yourself.
--
If you really really want it in the L2 cache, you could in theory make all entities in the table members of a collection on some other bogus entity, then enable caching the collection and manage it secretly in the DAO. I don't think it could possibly be worth having that kind of bizarreness in your code though :)
Query cache is considered harmful if and only if the underlying table changes often. In your case the table is changed once a day. So the query would stay in cache for 24 hours. Trust me: use the query cache for it. It is a perfect use case for a query cache.
Example of harmful query cache: if you have a user table and you use the query cache for "from User where username = ..." then this query will evict from cache each time the user table is modified (another user changes/deletes his account). So ANY modification of this table triggers cache eviction. The only way to improve this situation is querying by natural-id, but this is another story.
If you know your table will be modified only once a day as in your case, the query cache will only evict once a day!
But pay attention to your logic when modifying the table. If you do it via Hibernate, everything is fine. If you use a direct query, you have to tell Hibernate that you have modified the table (something like query.addSynchronizedEntity(..)). If you do it via a shell script, you need to adjust the time-to-live of the underlying cache region.
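As a rough illustration of those two points (the Template entity, table and columns are made up for the example; addSynchronizedEntityClass is the Hibernate native-query counterpart of the addSynchronizedEntity call mentioned above):
// Cache the "load whole table" query; the entities go to the L2 cache,
// the query cache itself only stores the list of ids.
List<Template> templates = session.createQuery("from Template")
        .setCacheable(true)
        .list();

// If the table is modified with a native query, tell Hibernate which entity's
// cache regions to invalidate, otherwise the cached query result goes stale.
session.createSQLQuery("update TEMPLATE set ACTIVE = 0")
        .addSynchronizedEntityClass(Template.class)
        .executeUpdate();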
Your answer is, by the way, reimplementing the query cache: the query cache just caches the list of ids, and the actual objects are looked up in the L1/L2 cache. So you still need to cache the entities when you use the query cache.
Please mark this as the correct answer for further reference.
We ended up solving this by storing in memory the primary keys of all the entities in the table we needed to load (they're template data, and no new templates are added or removed).
We could then use this list of primary keys to look up each entity and make use of Hibernate's 1st and 2nd level caches.
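Roughly, that lookup helper might look like this (a sketch; it assumes the id list is maintained elsewhere and that the entities use Long ids):
// cachedIds is the in-memory list of primary keys described above.
public <T> List<T> getAllByIds(Session session, Class<T> entityClass, List<Long> cachedIds) {
    List<T> result = new ArrayList<T>(cachedIds.size());
    for (Long id : cachedIds) {
        // session.get() checks the 1st level cache, then the 2nd level cache,
        // and only goes to the database for entries missing from both.
        result.add(entityClass.cast(session.get(entityClass, id)));
    }
    return result;
}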

Using Hibernate's ScrollableResults to slowly read 90 million records

I simply need to read each row in a table in my MySQL database using Hibernate and write a file based on it. But there are 90 million rows and they are pretty big. So it seemed like the following would be appropriate:
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
.setReadOnly(true).setCacheable(false).scroll(ScrollMode.FORWARD_ONLY);
while (results.next())
storeInFile(results.get()[0]);
The problem is the above will try and load all 90 million rows into RAM before moving on to the while loop... and that will kill my memory with OutOfMemoryError: Java heap space exceptions :(.
So I guess ScrollableResults isn't what I was looking for? What is the proper way to handle this? I don't mind if this while loop takes days (well, I'd rather it didn't).
I guess the only other way to handle this is to use setFirstResult and setMaxResults to iterate through the results and just use regular Hibernate results instead of ScrollableResults. That feels like it will be inefficient though and will start taking a ridiculously long time when I'm calling setFirstResult on the 89 millionth row...
UPDATE: setFirstResult/setMaxResults doesn't work; it turns out to take an unusably long time to get to the offsets, as I feared. There must be a solution here! Isn't this a pretty standard procedure? I'm willing to forgo Hibernate and use JDBC or whatever it takes.
UPDATE 2: the solution I've come up with, which works OK but not great, is basically of the form:
select * from person where id > <offset> and <other_conditions> limit 1
Since I have other conditions, even with everything in an index, it's still not as fast as I'd like... so I'm still open to other suggestions.
Using setFirstResult and setMaxResults is your only option that I'm aware of.
Traditionally a scrollable result set would only transfer rows to the client on an as-required basis. Unfortunately MySQL Connector/J actually fakes it: it executes the entire query and transports it to the client, so the driver actually has the entire result set loaded in RAM and will drip-feed it to you (evidenced by your out-of-memory problems). You had the right idea; it's just a shortcoming of the MySQL Java driver.
I found no way to get around this, so I went with loading large chunks using the regular setFirstResult/setMaxResults methods. Sorry to be the bringer of bad news.
Just make sure to use a stateless session so there's no session-level cache or dirty tracking, etc.
EDIT:
Your UPDATE 2 is the best you're going to get unless you break out of MySQL Connector/J. There's no reason you can't raise the limit on the query, though. Provided you have enough RAM to hold the index, this should be a fairly cheap operation. I'd modify it slightly to grab a batch at a time, and use the highest id of that batch to grab the next batch.
Note: this will only work if other_conditions use equality (no range conditions allowed) and the index has id as its last column.
select *
from person
where id > <max_id_of_last_batch> and <other_conditions>
order by id asc
limit <batch_size>
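Translated into Hibernate, that batch loop might look roughly like this (a sketch that ignores the other_conditions for brevity and assumes Person exposes getId() and that ids are positive and ascending):
long lastId = 0;          // assumes ids start above 0
int batchSize = 10000;
List<Person> batch;
do {
    batch = session.createQuery(
            "from Person p where p.id > :lastId order by p.id asc")
            .setParameter("lastId", lastId)
            .setMaxResults(batchSize)
            .list();
    for (Person p : batch) {
        storeInFile(p);
        lastId = p.getId();
    }
    session.clear();      // drop the processed batch from the 1st level cache
} while (!batch.isEmpty());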
You should be able to use a ScrollableResults, though it requires a few magic incantations to get working with MySQL. I wrote up my findings in a blog post (http://www.numerati.com/2012/06/26/reading-large-result-sets-with-hibernate-and-mysql/) but I'll summarize here:
"The [JDBC] documentation says:
To enable this functionality, create a Statement instance in the following manner:
stmt = conn.createStatement(java.sql.ResultSet.TYPE_FORWARD_ONLY,
java.sql.ResultSet.CONCUR_READ_ONLY);
stmt.setFetchSize(Integer.MIN_VALUE);
This can be done using the Query interface (this should work for Criteria as well) in version 3.2+ of the Hibernate API:
Query query = session.createQuery(queryString); // queryString is your HQL string
query.setReadOnly(true);
// MIN_VALUE gives hint to JDBC driver to stream results
query.setFetchSize(Integer.MIN_VALUE);
ScrollableResults results = query.scroll(ScrollMode.FORWARD_ONLY);
// iterate over results
while (results.next()) {
    Object row = results.get();
    // process row then release reference
    // you may need to evict() as well
}
results.close();
This allows you to stream over the result set, however Hibernate will still cache results in the Session, so you’ll need to call session.evict() or session.clear() every so often. If you are only reading data, you might consider using a StatelessSession, though you should read its documentation beforehand."
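For the read-only case, a StatelessSession variant might look roughly like this (a sketch; it still relies on the Integer.MIN_VALUE fetch size for Connector/J streaming, and there is no persistence context to clear):
StatelessSession session = sessionFactory.openStatelessSession();
try {
    ScrollableResults results = session.createQuery("SELECT person FROM Person person")
            .setReadOnly(true)
            .setFetchSize(Integer.MIN_VALUE)   // hint to Connector/J to stream results
            .scroll(ScrollMode.FORWARD_ONLY);
    while (results.next()) {
        storeInFile((Person) results.get(0));  // nothing accumulates in a session cache
    }
    results.close();
} finally {
    session.close();
}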
Set the fetch size in the query to an optimal value, as shown below.
Also, when caching is not required, it may be better to use a StatelessSession.
ScrollableResults results = session.createQuery("SELECT person FROM Person person")
        .setReadOnly(true)
        .setFetchSize(1000) // <<--- !!!!
        .setCacheable(false)
        .scroll(ScrollMode.FORWARD_ONLY);
The fetch size must be Integer.MIN_VALUE, otherwise it won't work.
That is to be taken literally from the official reference: https://dev.mysql.com/doc/connector-j/5.1/en/connector-j-reference-implementation-notes.html
Actually you could have gotten what you wanted -- low-memory scrollable results with MySQL -- if you had used the answer mentioned here:
Streaming large result sets with MySQL
Note that you will have problems with Hibernate lazy-loading because it will throw an exception on any queries performed before the scroll is finished.
With 90 million records, it sounds like you should be batching your SELECTs. I've done this with Oracle when doing the initial load into a distributed cache. Looking at the MySQL documentation, the equivalent seems to be the LIMIT clause: http://dev.mysql.com/doc/refman/5.0/en/select.html
Here's an example:
SELECT * from Person
LIMIT 200, 100
This would return rows 201 through 300 of the Person table.
You'd need to get the record count from your table first and then divide it by your batch size and work out your looping and LIMIT parameters from there.
The other benefit of this would be parallelism - you can execute multiple threads in parallel on this for faster processing.
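In Hibernate terms, working out the loop and LIMIT parameters could look roughly like this (a sketch; the batch size is arbitrary and each page could be handed to a worker thread for parallel processing):
long total = ((Number) session.createQuery("select count(p) from Person p")
        .uniqueResult()).longValue();
int batchSize = 100;
long batches = (total + batchSize - 1) / batchSize;   // ceiling division

for (long i = 0; i < batches; i++) {
    List<Person> page = session.createQuery("from Person p order by p.id")
            .setFirstResult((int) (i * batchSize))    // maps to "LIMIT offset, batchSize"
            .setMaxResults(batchSize)
            .list();
    // hand `page` off to a worker thread here if processing in parallel
}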
Processing 90 million records also doesn't sound like the sweet spot for using Hibernate.
The problem could be that Hibernate keeps references to all objects in the session until you close the session. That has nothing to do with query caching. Maybe it would help to evict() the objects from the session after you are done writing each one to the file. If they are no longer referenced by the session, the garbage collector can free the memory and you won't run out of memory anymore.
Rather than sample code, I propose a query template based on Hibernate that does this workaround for you (pagination, scrolling and clearing of the Hibernate session).
It can also easily be adapted to use an EntityManager.
I've used the Hibernate scroll functionality successfully before without it reading the entire result set in. Someone said that MySQL does not do true scroll cursors, but it claims to based on the JDBC dmd.supportsResultSetType(ResultSet.TYPE_SCROLL_INSENSITIVE), and searching around it seems other people have used it. Make sure it's not caching the Person objects in the session - I've used it on SQL queries where there was no entity to cache. You can call evict at the end of the loop to be sure, or test with a SQL query. Also play around with setFetchSize to optimize the number of trips to the server.
I recently worked on a problem like this and wrote a blog post about how to approach it; I hope it's helpful to someone.
I use a lazy-list approach with partial acquisition: I replaced the limit/offset pagination of the query with manual pagination.
In my example, the select returns 10 million records; I fetch them and insert them into a "temporal table":
create or replace function load_records()
returns VOID as $$
BEGIN
    drop sequence if exists temp_seq;
    create temp sequence temp_seq;
    insert into tmp_table
    SELECT linea.*
    FROM (
        select nextval('temp_seq') as ROWNUM, * from table1 t1
        join table2 t2 on (t2.fieldpk = t1.fieldpk)
        join table3 t3 on (t3.fieldpk = t2.fieldpk)
    ) linea;
END;
$$ language plpgsql;
After that, I can paginate without counting each row, using the assigned sequence instead:
select * from tmp_table where counterrow >= 9000000 and counterrow <= 9025000
From the Java perspective, I implemented this pagination through partial acquisition with a lazy list, that is, a list that extends AbstractList and implements the get() method. The get method can use a data access interface to fetch the next chunk of data and release the memory heap:
@Override
public E get(int index) {
    if (bufferParcial.size() <= (index - lastIndexRoulette)) {
        lastIndexRoulette = index;
        bufferParcial.removeAll(bufferParcial);
        bufferParcial = new ArrayList<E>();
        bufferParcial.addAll(daoInterface.getBufferParcial());
        if (bufferParcial.isEmpty()) {
            return null;
        }
    }
    return bufferParcial.get(index - lastIndexRoulette);
}
On the other hand, the data access interface uses the query to paginate and implements a method to iterate progressively, 25000 records at a time, until all of them have been read.
Results for this approach can be seen here:
http://www.arquitecturaysoftware.co/2013/10/laboratorio-1-iterar-millones-de.html
Another option, if you're "running out of RAM", is to request just one column instead of the entire object: How to use hibernate criteria to return only one element of an object instead the entire object? (it saves a lot of CPU processing time to boot).
For me it worked properly when setting useCursors=true; otherwise the scrollable result set ignores all fetch size settings. In my case the fetch size was 5000, but the scrollable result set fetched millions of records at once, causing excessive memory usage. The underlying DB is MSSQLServer.
jdbc:jtds:sqlserver://localhost:1433/ACS;TDS=8.0;useCursors=true
