I'm a newbie to JPA/Hibernate first-level caching.
I have the following repository class.
Each time I call the findByState method (within the same transaction), I see the Hibernate SQL query being output to the console:
public interface PersonRepository extends JpaRepository<PersonEntity, Long> {
    @Query("select p from PersonEntity p where p.state = ?1")
    List<PersonEntity> findByState(String state);
    ...
}
I expected the results to be cached by the first level cache and the database not be repeatedly queried.
What am I doing wrong?
There is often a misunderstanding about caching.
Hibernate does not cache queries or query results by default. The only place the first-level cache comes into play is when you call EntityManager.find(): then you will not see a SQL query executing, because the cache is used to avoid recreating an entity object that has already been loaded.
What you are looking for is called "query cache".
This can be enabled by setting hibernate.cache.use_query_cache=true
Please read more about this topic in the official documentation:
https://docs.jboss.org/hibernate/orm/5.4/userguide/html_single/Hibernate_User_Guide.html#caching-query
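For illustration, here is a minimal sketch of what enabling the query cache might look like in a Spring Boot setup; the Hibernate property names are standard, but the cache provider and the entity/repository details are assumptions based on the question.
// Sketch only: assumes a second-level cache provider (e.g. Ehcache) is on the
// classpath and that application.properties contains:
//   spring.jpa.properties.hibernate.cache.use_second_level_cache=true
//   spring.jpa.properties.hibernate.cache.use_query_cache=true
import java.util.List;
import javax.persistence.QueryHint;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;

public interface PersonRepository extends JpaRepository<PersonEntity, Long> {

    // "org.hibernate.cacheable" marks this particular query as cacheable.
    // The query cache stores entity ids, so PersonEntity itself should also be
    // cacheable (e.g. annotated with @javax.persistence.Cacheable).
    @Query("select p from PersonEntity p where p.state = ?1")
    @QueryHints(@QueryHint(name = "org.hibernate.cacheable", value = "true"))
    List<PersonEntity> findByState(String state);
}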
The query will always go to the database. The first-level cache will only contain the constructed entities. Its purpose is to ensure that the same database id is mapped to the same entity object (within a session).
It's also possible to use a query cache, but you have to enable it per query. Check the docs: https://docs.jboss.org/hibernate/core/4.0/devguide/en-US/html/ch06.html
I would like to stream results from PostgreSQL 11.2 and not read all results into memory at once. I use the newest stable Spring Boot 2.1.4.RELEASE.
I read an article about how to do it in MySQL:
http://knes1.github.io/blog/2015/2015-10-19-streaming-mysql-results-using-java8-streams-and-spring-data.html
I also read an article about how to do it in PostgreSQL:
Java 8 JPA Repository Stream row-by-row in Postgresql
I have a repository like this:
public interface ProductRepository extends JpaRepository<Product, UUID> {
    @Query("SELECT p from Product p")
    @QueryHints(value = @QueryHint(name = HINT_FETCH_SIZE, value = "50"))
    Stream<Product> streamAll();
}
Then I use the stream this way:
productRepository.streamAll().forEach(product -> export(product));
To make the example easier, the 'export' method is completely empty.
When I call the method I see the Hibernate query:
Hibernate: select product0_.id as id1_0_, product0_.created as created2_0_, product0_.description as descript3_0_, product0_.name as name4_0_, product0_.product_type_id as product_5_0_ from products product0_ order by product0_.id
and after some time I get an OutOfMemoryError.
The query hint didn't help.
How can I read data using a Spring Boot repository (or even EntityManager) so that rows are loaded from the DB in an optimal way?
I know that I could use pagination, but as the articles point out, it is not the most optimal way.
You must detach the entity after your work finishes.
import javax.persistence.EntityManager;
...
@Autowired
private EntityManager entityManager;
...
// Your business logic
productRepository.streamAll().forEach(product -> {
    export(product);
    // Detach so that the garbage collector can reclaim the memory.
    entityManager.detach(product);
});
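One caveat worth adding (my own note, not from the original answer): a repository stream needs an open transaction while it is consumed, and should be closed so the underlying resources are released. A minimal sketch under those assumptions:
import java.util.stream.Stream;
import javax.persistence.EntityManager;
import javax.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class ProductExportService {

    private final ProductRepository productRepository;

    @PersistenceContext
    private EntityManager entityManager;

    public ProductExportService(ProductRepository productRepository) {
        this.productRepository = productRepository;
    }

    @Transactional(readOnly = true) // keeps the connection open while streaming
    public void exportAll() {
        try (Stream<Product> products = productRepository.streamAll()) { // close releases resources
            products.forEach(product -> {
                export(product);
                entityManager.detach(product); // keep the persistence context small
            });
        }
    }

    private void export(Product product) {
        // intentionally empty, as in the question
    }
}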
At the moment, using Spring, all the data is retrieved and the Stream is applied only to data already in memory.
If you look at the source of org.springframework.data.jpa.provider.PersistenceProvider, it seems that it uses a ScrollableResults to stream over the data.
Generally, a ScrollableResults retrieves all data into memory.
You can find an interesting complete analysis using a MySQL database here, but the same probably applies to a Postgres database.
So even if you think you are using a solution that doesn't need a lot of memory, in reality it does, because the underlying implementation is not optimal.
I faced exactly the same problem, and after long debugging of the internals of Spring Data and Hibernate I found a solution that worked for me.
So to fetch data using a cursor in PostgreSQL, you need a method with a Stream result plus this annotation (Kotlin syntax):
@QueryHints(QueryHint(name = org.hibernate.annotations.QueryHints.FETCH_SIZE, value = "50"))
Whether the value is 50 or something else is not that important.
You probably used the wrong hint name.
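For reference, here is the Java equivalent of that hint applied to the repository from the question (a sketch; FETCH_SIZE resolves to the string "org.hibernate.fetchSize"):
import java.util.UUID;
import java.util.stream.Stream;
import javax.persistence.QueryHint;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Query;
import org.springframework.data.jpa.repository.QueryHints;

public interface ProductRepository extends JpaRepository<Product, UUID> {

    // The constant is fully qualified to avoid a name clash with
    // Spring's @QueryHints annotation imported above.
    @Query("SELECT p from Product p")
    @QueryHints(@QueryHint(name = org.hibernate.annotations.QueryHints.FETCH_SIZE, value = "50"))
    Stream<Product> streamAll();
}
Note that PostgreSQL only honors the fetch size when the statement runs inside a transaction (autocommit off), so the stream should be consumed under @Transactional.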
It is clear that, by default, when we execute entityManager.find(Post.class, 1L), the Post instance with id = 1 is retrieved and placed in the first-level cache. If we then execute the same entityManager.find(Post.class, 1L) in the same transaction, the Post instance is returned directly from the first-level cache without querying the database.
My question is the following:
Is the first-level cache checked only when we try to get the entity by id by executing the entityManager.find(...) method? I mean, what if we fetch the same Post instance (having id = 1) by a property other than the ID, e.g. using a Criteria query fetching the post by name? Is it still going to check the first-level cache?
What about querying the same row by native query, JPQL, or a Spring Data query method? Does JPA/Hibernate parse native and JPQL queries in order to find out whether there is a corresponding entity in the first-level cache?
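To make the scenario concrete, here is a small sketch consistent with the answers elsewhere in this thread (the Post entity and its name property are placeholders):
import javax.persistence.EntityManager;

public class FirstLevelCacheDemo {

    void demo(EntityManager em) {
        Post first = em.find(Post.class, 1L);   // SQL executed, entity becomes managed
        Post second = em.find(Post.class, 1L);  // no SQL: served from the first-level cache

        Post byName = em
                .createQuery("select p from Post p where p.name = :name", Post.class)
                .setParameter("name", "some-name")  // placeholder value
                .getSingleResult();                 // SQL always runs for a query

        // If the returned row has id = 1, Hibernate resolves it to the already
        // managed instance, so all three references point to the same object.
        assert first == second && second == byName;
    }
}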
For example, I have a method in my CRUD interface which deletes a user from the database:
public interface CrudUserRepository extends JpaRepository<User, Integer> {
    @Transactional
    @Modifying
    @Query("DELETE FROM User u WHERE u.id=:id")
    int delete(@Param("id") int id, @Param("userId") int userId);
}
This method will work only with the @Modifying annotation. But what is the need for the annotation here? Why can't Spring analyze the query and understand that it is a modifying query?
CAUTION!
Using @Modifying(clearAutomatically=true) will drop any pending updates on the managed entities in the persistence context. Spring states the following:
Doing so triggers the query annotated to the method as an updating query instead of a selecting one. As the EntityManager might contain outdated entities after the execution of the modifying query, we do not automatically clear it (see the JavaDoc of EntityManager.clear() for details), since this effectively drops all non-flushed changes still pending in the EntityManager. If you wish the EntityManager to be cleared automatically, you can set the @Modifying annotation's clearAutomatically attribute to true.
Fortunately, starting from Spring Data JPA 2.0.4.RELEASE, Spring Data added the flushAutomatically flag (https://jira.spring.io/browse/DATAJPA-806) to automatically flush any managed entities in the persistence context before executing the modifying query. For reference, see https://docs.spring.io/spring-data/jpa/docs/2.0.4.RELEASE/api/org/springframework/data/jpa/repository/Modifying.html#flushAutomatically
So the safest way to use @Modifying is:
@Modifying(clearAutomatically=true, flushAutomatically=true)
What happens if we don't use those two flags?
Consider the following code:
repo {
    @Modifying
    @Query("delete from User u where u.active = false")
    public void deleteInActiveUsers();
}
Scenario 1: why flushAutomatically
service {
    User johnUser = userRepo.findById(1); // stored in the first-level cache
    johnUser.setActive(false);
    userRepo.save(johnUser);
    userRepo.deleteInActiveUsers(); // BAM, it won't delete JOHN right away
    // JOHN still exists, since John with active = false was not
    // flushed to the database when @Modifying kicked in.
    // So imagine that after the `deleteInActiveUsers` line you called a native
    // query or started a new transaction: in both cases John
    // was not deleted, which can lead to faulty business logic.
}
Scenario 2: why clearAutomatically
In the following, assume johnUser.active is already false:
service {
    User johnUser = userRepo.findById(1); // stored in the first-level cache
    userRepo.deleteInActiveUsers(); // you think John is deleted now
    System.out.println(userRepo.findById(1).isPresent()); // TRUE!!!
    System.out.println(userRepo.count()); // 1 !!!
    // JOHN still exists: in this transaction's persistence context,
    // John's object was not cleared upon the @Modifying query execution,
    // so John's object is still fetched from the first-level cache.
    // `clearAutomatically` takes care of clearing the modified objects
    // from the current transaction's persistence context.
}
So if, in the same transaction, you are playing with modified objects before or after the line which does the @Modifying query, then use clearAutomatically and flushAutomatically. Otherwise you can skip these flags.
BTW, this is another reason why you should always put the @Transactional annotation on the service layer: that way you have a single persistence context for all the managed entities in the same transaction.
Since the persistence context is bound to the Hibernate session, you need to know that a session can contain multiple transactions. See this answer for more info: https://stackoverflow.com/a/5409180/1460591
The way Spring Data works is that it joins the transactions together (known as transaction propagation) into one transaction (default propagation: REQUIRED). See this answer for more info: https://stackoverflow.com/a/25710391/1460591
To connect things together: if you have multiple isolated transactions (e.g. from not having a @Transactional annotation on the service), you will have multiple sessions and, following the way Spring Data works, multiple persistence contexts (aka first-level caches). That means that even with flushAutomatically, an entity deleted or modified in one persistence context might already be fetched and cached in another transaction's persistence context, which would cause wrong business decisions due to wrong or unsynced data.
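Putting that advice together, a minimal sketch (entity, repository, and service names are illustrative, not from the question):
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.data.jpa.repository.Modifying;
import org.springframework.data.jpa.repository.Query;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

interface UserRepository extends JpaRepository<User, Integer> {

    // flushAutomatically pushes pending changes to the database before this
    // query runs; clearAutomatically evicts the persistence context afterwards
    // so later lookups don't return stale first-level-cache copies.
    @Modifying(clearAutomatically = true, flushAutomatically = true)
    @Query("delete from User u where u.active = false")
    int deleteInActiveUsers();
}

@Service
class UserService {

    private final UserRepository userRepo;

    UserService(UserRepository userRepo) {
        this.userRepo = userRepo;
    }

    // One transaction, hence one session and one persistence context
    // for both the modification and the bulk delete.
    @Transactional
    public void deactivateAndPurge(int id) {
        userRepo.findById(id).ifPresent(user -> {
            user.setActive(false);          // pending change, flushed by flushAutomatically
            userRepo.deleteInActiveUsers(); // now sees active = false and deletes the row
        });
    }
}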
This will trigger the query annotated to the method as an updating query instead of a selecting one. As the EntityManager might contain outdated entities after the execution of the modifying query, we automatically clear it (see the JavaDoc of EntityManager.clear() for details). This will effectively drop all non-flushed changes still pending in the EntityManager. If you don't wish the EntityManager to be cleared automatically, you can set the @Modifying annotation's clearAutomatically attribute to false.
For further detail you can follow this link:
http://docs.spring.io/spring-data/jpa/docs/1.3.4.RELEASE/reference/html/jpa.repositories.html
Queries that require a @Modifying annotation include INSERT, UPDATE, DELETE, and DDL statements.
Adding the @Modifying annotation indicates that the query is not a SELECT query.
When you use only the @Query annotation, you should use SELECT queries.
With the @Modifying annotation, you can put INSERT, DELETE, and UPDATE queries above the method.
Let's suppose we have this situation:
We have Spring Data configured in the standard way: there is a Repository object, an Entity object, and all works well.
Now, for some complex reasons, I have to use the EntityManager (or JdbcTemplate, whatever is at a lower level than Spring Data) directly to update the table associated with my Entity, using a native SQL query. So I'm not using the Entity object, but simply doing a database update manually on the table I use as the entity (it's more correct to say the table from which I get values; see the next rows).
The reason is that I had to bind my Spring Data Entity to a MySQL view that makes a UNION of multiple tables, not directly to the table I need to update.
What happens is:
In a functional test, I call the "manual" update method (on the table from which the MySQL view is created) as previously described (through the EntityManager), and if I then make a simple Repository.findOne(objectId), I get the old object (not the updated one). I have to call EntityManager.refresh(object) to get the updated object.
Why?
Is there a way to "synchronize" (out of the box) objects (or force some refresh) in Spring Data? Or am I asking for a miracle?
I'm not being ironic; maybe I'm just not expert enough, and maybe (or probably) it's my ignorance. If so, please explain why, and (if you want) share some advanced knowledge about this amazing framework.
If I make a simple Repository.findOne(objectId), I get the old object (not the updated one). I have to call EntityManager.refresh(object) to get the updated object.
Why?
The first-level cache is active for the duration of a session. Any entity previously retrieved in the context of a session will be returned from the first-level cache unless there is a reason to go back to the database.
Is there a reason to go back to the database after your SQL update? Well, as the book Pro JPA 2 notes (p. 199) regarding bulk update statements (whether via JPQL or SQL):
The first issue for developers to consider when using these [bulk update] statements is that the persistence context is not updated to reflect the results of the operation. Bulk operations are issued as SQL against the database, bypassing the in-memory structures of the persistence context.
which is what you are seeing. That is why you need to call refresh to force the entity to be reloaded from the database, as the persistence context is not aware of any potential modifications.
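A small sketch of that effect (the Post entity and its title are illustrative):
import javax.persistence.EntityManager;

public class BulkUpdateDemo {

    void demo(EntityManager em) {
        // Inside one transaction:
        Post post = em.find(Post.class, 1L);  // managed instance, say title = "old"

        em.createQuery("update Post p set p.title = 'new' where p.id = 1")
          .executeUpdate();                   // bulk update: bypasses the persistence context

        System.out.println(post.getTitle()); // still prints "old" from memory

        em.refresh(post);                    // reload state from the database
        System.out.println(post.getTitle()); // now prints "new"
    }
}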
The book also notes the following about using Native SQL statements (rather than JPQL bulk update):
CAUTION: Native SQL update and delete operations should not be executed on tables mapped by an entity. The JP QL operations tell the provider what cached entity state must be invalidated in order to remain consistent with the database. Native SQL operations bypass such checks and can quickly lead to situations where the in-memory cache is out of date with respect to the database.
Essentially then, should you have a second-level cache configured, updating any entity currently in that cache via a native SQL statement is likely to result in stale data in the cache.
In Spring Boot's JpaRepository:
If our modifying query changes entities contained in the persistence context, then this context becomes outdated.
In order to fetch the entities from the database with their latest state, use @Modifying(clearAutomatically = true).
The @Modifying annotation has a clearAutomatically attribute which defines whether it should clear the underlying persistence context after executing the modifying query.
Example:
@Modifying(clearAutomatically = true)
@Query("UPDATE NetworkEntity n SET n.network_status = :network_status WHERE n.network_id = :network_id")
int expireNetwork(@Param("network_id") Integer network_id, @Param("network_status") String network_status);
Based on the way you described your usage, fetching from the repo should retrieve the updated object without the need to refresh it, as long as the method which used the entity manager to merge has @Transactional.
Here's a sample test:
@DirtiesContext(classMode = ClassMode.AFTER_CLASS)
@RunWith(SpringJUnit4ClassRunner.class)
@ContextConfiguration(classes = ApplicationConfig.class)
@EnableJpaRepositories(basePackages = "com.foo")
public class SampleSegmentTest {

    @Resource
    SampleJpaRepository segmentJpaRepository;

    @PersistenceContext
    private EntityManager entityManager;

    @Transactional
    @Test
    public void test() {
        Segment segment = new Segment();
        ReflectionTestUtils.setField(segment, "value", "foo");
        ReflectionTestUtils.setField(segment, "description", "bar");
        segmentJpaRepository.save(segment);
        assertNotNull(segment.getId());
        assertEquals("foo", segment.getValue());
        assertEquals("bar", segment.getDescription());

        ReflectionTestUtils.setField(segment, "value", "foo2");
        entityManager.merge(segment);
        Segment updatedSegment = segmentJpaRepository.findOne(segment.getId());
        assertEquals("foo2", updatedSegment.getValue());
    }
}
I have a PostgreSQL 8.4 database with some tables and views which are essentially joins on some of the tables. I used NetBeans 7.2 (as described here) to create REST-based services derived from those views and tables and deployed them to a GlassFish 3.1.2.2 server.
There is another process which asynchronously updates contents in some of the tables used to build the views. I can directly query the views and tables and see that these changes have occurred correctly. However, when pulled from the REST-based services, the values are not the same as those in the database. I am assuming this is because JPA has cached local copies of the database contents on the GlassFish server and JPA needs to refresh the associated entities.
I have tried adding a couple of methods to the AbstractFacade class that NetBeans generates:
public abstract class AbstractFacade<T> {

    private Class<T> entityClass;
    private String entityName;
    private static boolean _refresh = true;

    public static void refresh() { _refresh = true; }

    public AbstractFacade(Class<T> entityClass) {
        this.entityClass = entityClass;
        this.entityName = entityClass.getSimpleName();
    }

    private void doRefresh() {
        if (_refresh) {
            EntityManager em = getEntityManager();
            em.flush();
            for (EntityType<?> entity : em.getMetamodel().getEntities()) {
                if (entity.getName().contains(entityName)) {
                    try {
                        em.refresh(entity);
                        // log success
                    } catch (IllegalArgumentException e) {
                        // log failure ... typically complains entity is not managed
                    }
                }
            }
            _refresh = false;
        }
    }

    ...
}
I then call doRefresh() from each of the find methods NetBeans generates. What normally happens is that an IllegalArgumentException is thrown, stating something like Can not refresh not managed object: EntityTypeImpl#28524907:MyView [ javaType: class org.my.rest.MyView descriptor: RelationalDescriptor(org.my.rest.MyView --> [DatabaseTable(my_view)]), mappings: 12].
So I'm looking for some suggestions on how to correctly refresh the entities associated with the views so they are up to date.
UPDATE: It turns out my understanding of the underlying problem was not correct. It is somewhat related to another question I posted earlier; namely, the view had no single field which could be used as a unique identifier. NetBeans required that I select an ID field, so I just chose one part of what should have been a multi-part key. This exhibited the behavior that all records with a particular ID field appeared identical, even though the database had records with the same ID field whose remaining fields differed. JPA didn't look any further than what I told it was the unique identifier and simply pulled the first record it found.
I resolved this by adding a unique identifier field (I never was able to get the multi-part key to work properly).
I recommend adding an @Startup @Singleton class that establishes a JDBC connection to the PostgreSQL database and uses LISTEN and NOTIFY to handle cache invalidation.
Update: Here's another interesting approach, using pgq and a collection of workers for invalidation.
Invalidation signalling
Add a trigger on the table that's being updated that sends a NOTIFY whenever an entity is updated. On PostgreSQL 9.0 and above this NOTIFY can contain a payload, usually a row ID, so you don't have to invalidate your entire cache, just the entity that has changed. On older versions where a payload isn't supported you can either add the invalidated entries to a timestamped log table that your helper class queries when it gets a NOTIFY, or just invalidate the whole cache.
Your helper class now LISTENs on the NOTIFY events the trigger sends. When it gets a NOTIFY event, it can invalidate individual cache entries (see below), or flush the entire cache. You can listen for notifications from the database with PgJDBC's listen/notify support. You will need to unwrap any connection pooler managed java.sql.Connection to get to the underlying PostgreSQL implementation so you can cast it to org.postgresql.PGConnection and call getNotifications() on it.
As an alternative to LISTEN and NOTIFY, you could poll a change log table on a timer, and have a trigger on the problem table append changed row IDs and change timestamps to the change log table. This approach will be portable except for the need for a different trigger for each DB type, but it's inefficient and less timely. It'll require frequent inefficient polling, and still have a time delay that the listen/notify approach does not. In PostgreSQL you can use an UNLOGGED table to reduce the costs of this approach a little bit.
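As a rough illustration of the listening side, here is a sketch using PgJDBC directly; the channel name, connection details, and the invalidation callback are all assumptions:
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.postgresql.PGConnection;
import org.postgresql.PGNotification;

public class CacheInvalidationListener {

    public static void main(String[] args) throws Exception {
        // With a connection pool you would unwrap the pooled connection instead.
        Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://localhost/mydb", "user", "secret");

        try (Statement st = conn.createStatement()) {
            st.execute("LISTEN entity_changed"); // channel the trigger NOTIFYs on
        }

        PGConnection pgConn = conn.unwrap(PGConnection.class);
        while (true) {
            // getNotifications() returns pending NOTIFY events, or null if none.
            PGNotification[] notifications = pgConn.getNotifications();
            if (notifications != null) {
                for (PGNotification n : notifications) {
                    // On 9.0+ the payload can carry a row id for targeted eviction.
                    System.out.println("Invalidate cached entity with id: " + n.getParameter());
                }
            }
            Thread.sleep(500); // simple polling loop; fine for a sketch
        }
    }
}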
Cache levels
EclipseLink/JPA has a couple of levels of caching.
The 1st level cache is at the EntityManager level. If an entity is attached to an EntityManager by persist(...), merge(...), find(...), etc, then the EntityManager is required to return the same instance of that entity when it is accessed again within the same session, whether or not your application still has references to it. This attached instance won't be up-to-date if your database contents have since changed.
The 2nd level cache, which is optional, is at the EntityManagerFactory level and is a more traditional cache. It isn't clear whether you have the 2nd level cache enabled. Check your EclipseLink logs and your persistence.xml. You can get access to the 2nd level cache with EntityManagerFactory.getCache(); see Cache.
@thedayofcondor showed how to flush the 2nd level cache with:
em.getEntityManagerFactory().getCache().evictAll();
but you can also evict individual objects with the evict(java.lang.Class cls, java.lang.Object primaryKey) call:
em.getEntityManagerFactory().getCache().evict(theClass, thePrimaryKey);
which you can use from your @Startup @Singleton NOTIFY listener to invalidate only those entries that have changed.
The 1st level cache isn't so easy, because it's part of your application logic. You'll want to learn about how the EntityManager, attached and detached entities, etc work. One option is to always use detached entities for the table in question, where you use a new EntityManager whenever you fetch the entity. This question:
Invalidating JPA EntityManager session
has a useful discussion of handling invalidation of the entity manager's cache. However, it's unlikely that an EntityManager cache is your problem, because a RESTful web service is usually implemented using short EntityManager sessions. This is only likely to be an issue if you're using extended persistence contexts, or if you're creating and managing your own EntityManager sessions rather than using container-managed persistence.
You can either disable caching entirely (see: http://wiki.eclipse.org/EclipseLink/FAQ/How_to_disable_the_shared_cache%3F ), but be prepared for a fairly large performance loss.
Otherwise, you can clear the cache programmatically with:
em.getEntityManagerFactory().getCache().evictAll();
You can map it to a servlet so you can call it externally. This is better if your database is modified externally very seldom and you just want to be sure JPA will pick up the new version.
Just a thought, but how do you obtain your EntityManager/Session/whatever?
If you queried the entity in one session, it will be detached in the next one and you will have to merge it back into the persistence context to get it managed again.
Trying to work with detached entities may result in those not-managed exceptions; you should re-query the entity, or you could try merge() (or similar methods).
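For illustration, a minimal sketch of re-attaching a detached entity (the Product entity is borrowed from earlier in the thread):
import javax.persistence.EntityManager;

public class ReattachExample {

    public Product reattach(EntityManager entityManager, Product detached) {
        // merge() copies the detached state onto a managed instance and returns it;
        // the original "detached" object stays unmanaged, so use the return value.
        Product managed = entityManager.merge(detached);
        managed.setName("new name"); // now tracked by the persistence context
        return managed;
    }
}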
JPA doesn't do any caching by default. You have to explicitly configure it. I believe it's a side effect of the architectural style you have chosen: REST. I think caching is happening at the web servers, proxy servers, etc. I suggest you read this and debug more.