I am currently working on a project that uses JPA (Toplink, currently) for its persistence. Currently, we are running a single application server, but, for redundancy, we would like to add a load balancer and another application sever (and possibly more as it grows).
First, I'm running into the issue of JPA caching. Since two processes will be updating the same database, the JPA cache returns the cached value rather than going to the database. I see how to turn that off, and the database itself implements a level of caching. Is turning off the cache completely the way to go here? I see the ways to tell JPA to always get from the database at a query level, but in a multi-server environment, it seems that you'll always want that to happen.
Along with this specific question, I'm interested in anyone out there who has implemented a JPA solution with multiple application servers and what problems arose during the implementation (and any suggestions you have).
Thanks much.
As you have found, you can disable the shared cache, see http://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching or http://wiki.eclipse.org/EclipseLink/FAQ/How_to_disable_the_shared_cache%3F
There are also other options available in EclipseLink depending on your data and requirements.
A list of option include:
Disable shared cache
Enable cache coordination (see, http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#COORDINATION_PROTOCOL)
Set a cache invalidation timeout (see, http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/annotations/Cache.html#expiry%28%29)
Enable optimistic locking, this will ensure that any stale object cannot be updated, when an update on stale data occurs it will fail, and EclipseLink will automatically invalidate the object in the cache.
Investigate the Oracle TopLink integration of EclipseLink and Oracle Coherence to provide a distributed cache.
See also, http://en.wikibooks.org/wiki/Java_Persistence/Caching#Caching_in_a_Cluster
There is no perfect solution, the solution used normally depend on the data/class, normally an application has a set of read-only classes, read-mostly classes and write mostly classes. Personally I would enable the cache for the read-only with a 1 day timeout, enable the cache with cache coordination for the read-mostly, and disable the cache for the write mostly.
Related
I have a usual SpringBoot application, which executes tons of DB calls and for those I want to implement some Spring caching with normal #Cacheable / #CacheEvict and other annotations (by default I use CaffeineCache). There are several AKS nodes, on each of them one instance of my application is running. What I want to receive:
Local (in-memory) Spring cache. A distributed solution aka Redis-based or so is not suitable.
A cache should be invalidated for all running instances of the app after the update on one of them.
I have a global Kafka service, which registers every write/update request to my Cassandra DB
Now my question - is that possible to have a local, usual Spring cache with such an invalidation through Kafka resulting of course in synchronized cache version on all instances?
I would say it is possible in principle. You could build a naive solution, where
Read operations use #Cacheable
Write operations put a message to the Kafka bus, and each node has a listener that uses #CachePut to write it into the local cache.
But such a naive solution will not have any strict synchronisation guarantees, it is only eventually consistent. It takes time to propagate updates to the other nodes and in between other nodes could still read the old value. Also you would have to think about error conditions where an update could get lost.
If you want to have stricter guarantees, you need a multi-phase commit or a consensus protocol. Unless it is a research project I would highly discourage you from writing one yourself. These are not trivial because the problem is not trivial. Instead you should use existing implementations.
So in summary: If you don't need consistency, you could do it like you suggest. If you need any level of consistency guarantee, you should use an existing distributed cache, which can still be integrated with #Cacheable.
Suppose that we have a businessLogic() method that does 2 things: write some information in a local cache and save the same information in the DB using JDBC so that the contents of the cache and the DB are always the same.
I know we can use Spring's JDBC Datasource Transaction Manager to automatically rollback the DB in case of exception. However, how can we define a custom transaction manager that also rollbacks the content of the cache in this case, so that the contents of the cache and the DB are always in sync?
Thanks all.
Gab's answer is right, except for the parts that aren't.
XA is indeed the standard way to coordinate update of multiple resources... except that where the cache is local i.e. in-process, it's not necessarily a resource.
A cache doesn't exactly 'implement JTA', it acts in one of two roles in the XA protocol, according to how it's deployed. It can be an XAResource, but that's usually only required where its lifecycle is distinct from that of the client process. For in-process use, it's more likely to be a Synchronization.
The key difference between these roles is: XAResource is fault-tolerant, but Synchronization is not. For a volatile cache that's in-memory with the client process, it's sufficient to rebuild the cache after a crash by querying the db. For a cache that's out of process, a client crash after the db tx commit but before the cache update would leave the cache out of sync, at least until it expired or was manually refreshed.
Depending on the cache implementation, there is no guarantee it will pick the right mode automatically. See the configuration reference for your chosen implementation e.g. https://infinispan.org/docs/stable/user_guide/user_guide.html#tx_sync_enlist
Spring isn't actually a JTA XA transaction manager either, though it does provide an abstraction layer over them. It's possible to use Spring to drive a database in non-XA mode, but then you have no standard hook for the cache Synchronizations and you need a proprietary interface instead. Or you can have the database do pseudo-XA via a one-phase resource adapter. Full-on 2PC is probably overkill for your use case.
First of all I believe that the task of transaction management for cache is redundant. I advice you to only update the cache if database level transaction is successfully committed.
Most scenarios with cache using are completely acceptable if you have small window between updates of entity in database and its cached state.
If your case rejects any possibility of outdated cache then you probably have to avoid using cache or use something special for caching, probably the same database as your original data supporting transactions. Otherwise you will have problems trying to maintain consistency between two different systems: db level and cache level. Most of the time the best you can achieve is eventual consistency - it means that anyway you will have windows of inconsistent state and only then (eventually) the data will become consistent.
Standard way to deal with transaction distributed among multiple resources is to use XA
You must then access your database using an xa-datasource and use a cache implementation implementing JTA, eg. ehcache.
I'am not very familiar with spring boot, but the transaction manager should manage the transaction synchronization across both resources out of the box with the appropriate configuration (no need to override anything)
I use technologies like spring boot, jpa and java 8. I have a question, how can I check if the cache is empty and I should send a query to the database to reload it (how to check that I need to reload the cache)?
As your question is not clear about regarding what type of cache you are using ??
JPA uses the first level of caching is the persistence context.
The Entity Manager guarantees that within a single Persistence Context, for any particular database row, there will be only one object instance. However the same entity could be managed in another User's transaction, so you should use either optimistic or pessimistic locking.
If you mean 2nd level cache ,This level of cache came due to performance reasons.this 2nd level cache sits between Entity Manager and the database. Persistence context shares the cache, making the second level cache available throughout the application. Database traffic is reduced considerably because entities are loaded in to the shared cache and made available from there. So actually laying you dont need to worry about reloading of data from database if cache miss happens.
Now if you are implementing your own logic to implement the cache , then you need to do more research on how actually caching works and different algorithms for caching like LRU,MRU etc. (which I would personally not recommend as you can use existing available providers like ehcache, redis,hazelcast just few names for 2nd level caching )
We are looking at implementing a caching framework for our application to help improve the overall performance of our site.
We are running on Websphere 6.1, Spring, Hibernate and Oracle. The focus currently is on the static data or data that changes very little through out the day but is used a lot.
So any info would be great. I have started my google search and reading but wanted to get a feel for what has worked for members of the community. One of the things that we are interested in is the ability to have the system invalidate the cache when a change does happen in the underlining data.
Thanks
Update:
I came across an article on the IBM web site where it says that Hibernate's Cluster aware caches in conjunction with WebSphere Application Server has not been determined yet; therefore, is is not yet determined whether or not their use is supported.
Thoughts on that? We are running in a clustered environment.
Well the hibernate cache system does just that, I used ehCache effectively and easily with Hibernate (and the second level cache system).
Lucene could be an option too depending on the situation. Hibernate Search or Compass could help with that (although it might take some major work).
Replication using Terracotta could also be an option although I've never done it.
Hibernate has first and second level caching built in. You can configure it to use EhCache, OSCache, SwarmCache, or C3P0 - your choice.
You can also leverage WebSphere's default cache i.e. DynaCache to implement the second level cache. This will allow you to administer, monitor and configure your cache leveraging WebSphere caching infrastructure
I've used ehCache and OSCache and found OSCache to be easier to configure and use.
One of the things that we are
interested in is the ability to have
the system invalidate the cache when a
change does happen in the underlining
data.
From what I can see, Hibernate doesn't actually do the above - from Hibernate's docs:
Be careful. Caches are never aware of
changes made to the persistent store
by another application (though they
may be configured to regularly expire
cached data).
Obviously what it means is that a cache doesn't have ESP, and can't tell if an app not in the cluster has called straight DML on the database - but I am guessing that what you want is an ability to expose a service for legacy apps to hook in and invalidate the cache when they do update that data. And there isn't, to my knowledge, a suggestion about how this might be done.
Our design has one jvm that is a jboss/webapp (read/write) that is used to maintain the data via hibernate (using jpa) to the db. The model has 10-15 persistent classes with 3-5 levels of depth in the relationships.
We then have a separate jvm that is the server using this data. As it is running continuously we just have one long db session (read only).
There is currently no intra-jvm cache involved - so we manually signal one jvm from the other.
Now when the webapp changes some data, it signals the server to reload the changed data. What we have found is that we need to tell hibernate to purge the data and then reload it. Just doing a fetch/merge with the db does not do the job - mainly in respect of the objects several layers down the hierarchy.
Any thoughts on whether there is anything fundamentally wrong with this design or if anyone is doing this and has had better luck with working with hibernate on the reloads.
Thanks,
Chris
A Hibernate session loads all data it reads from the DB into what they call the first-level cache. Once a row is loaded from the DB, any subsequent fetches for a row with the same PK will return the data from this cache. Furthermore, Hibernate gaurentees reference equality for objects with the same PK in a single Session.
From what I understand, your read-only server application never closes its Hibernate session. So when the DB gets updated by the read-write application, the Session on read-only server is unaware of the change. Effectively, your read-only application is loading an in-memory copy of the database and using that copy, which gets stale in due course.
The simplest and best course of action I can suggest is to close and open Sessions as needed. This sidesteps the whole problem. Hibernate Sessions are intended to be a window for a short-lived interaction with the DB. I agree that there is a performance gain by not reloading the object-graph again and again; but you need to measure it and convince yourself that it is worth the pains.
Another option is to close and reopen the Session periodically. This ensures that the read-only application works with data not older than a given time interval. But there definitely is a window where the read-only application works with stale data (although the design guarantees that it gets the up-to-date data eventually). This might be permissible in many applications - you need to evaluate your situation.
The third option is to use a second level cache implementation, and use short-lived Sessions. There are various caching packages that work with Hibernate with relative merits and demerits.
Chris, I'm a little confused about your circumstances. If I understand correctly, you have a both a web app (read/write) a standalone application (read-only?) using Hibernate to access a shared database. The changes you make with the web app aren't visible to the standalone app. Is that right?
If so, have you considered using a different second-level cache implementation? I'm wondering if you might be able to use a clustered cache that is shared by both the web application and the standalone application. I believe that SwarmCache, which is integrated with Hibernate, will allow this, but I haven't tried it myself.
In general, though, you should know that the contents of a given cache will never be aware of activity by another application (that's why I suggest having both apps share a cache). Good luck!
From my point of view, you should change your underline Hibernate cache to that one, which supports clustered mode. It could be a JBoss Cache or a Swarm Cache. The first one has a better support of data synchronization (replication and invalidation) and also supports JTA.
Then you will able to configure cache synchronization between webapp and server. Also look at isolation level if you will use JBoss Cache. I believe you should use READ_COMMITTED mode if you want to get new data on a server from the same session.
The most used practice is to have a Container-Managed Entity Manager so that two or more applications in the same container (ie Glassfish, Tomcat, Websphere) can share the same caches.
But if you don't use an Application container, because you use Play! for instance, then I would build some webservices in the primary Application to read/write consistently in the cache.
I think using stale data is an open door for disaster. Just like Singletons become Multitons, read-only applications are often a write sometimes.
Belt and braces :)