What I want is two different cache implementations (let's say Redis and EhCache) on one method, meaning a @Cacheable method should cache in both Redis and EhCache.
Is it even possible?
Option 1:
Stack the caches. Configure the local cache in Spring. Then wire in the distributed cache via a CacheLoader/CacheWriter. Consistency needs to be carefully evaluated. E.g. if an update goes to the distributed cache, how do you invalidate the local caches? That is not so trivial. Maybe it is easy and not needed for your data, maybe it is near to impossible.
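To make the stacking idea concrete, here is a minimal sketch in plain Java, without Spring or a real Redis client. The class `TwoLevelCache`, the `remote` map standing in for the distributed cache, and the `loader` function are all made up for illustration. It only shows the lookup order and why the consistency question comes up.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical two-level cache: a local map in front of a shared (remote) store. */
class TwoLevelCache<K, V> {
    private final Map<K, V> local = new ConcurrentHashMap<>();
    private final Map<K, V> remote;       // assumption: stand-in for Redis or another distributed cache
    private final Function<K, V> loader;  // the expensive call that is being cached

    TwoLevelCache(Map<K, V> remote, Function<K, V> loader) {
        this.remote = remote;
        this.loader = loader;
    }

    /** Looks in the local cache, then the remote cache, then calls the loader; fills both on a miss. */
    V get(K key) {
        return local.computeIfAbsent(key, k -> remote.computeIfAbsent(k, loader));
    }

    /** Writes through to the remote cache and updates this node's local copy only. */
    void put(K key, V value) {
        remote.put(key, value);
        local.put(key, value);
    }

    /** Must somehow be triggered on every other node after a put, which is the hard part. */
    void invalidateLocal(K key) {
        local.remove(key);
    }
}
```

If two instances share the same `remote` map and one of them calls `put`, the other keeps serving its old local value until `invalidateLocal` is called on it. That is exactly the invalidation problem described above.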
Option 2:
Go with a distributed cache which provides a so-called near cache. That is actually the combination you want to build yourself, but packaged in a product. I know that Hazelcast and Infinispan offer a near cache. However, your mileage may vary regarding consistency and resilience. I know that Hazelcast just recently enhanced the near cache so that it is consistent.
Interesting use case and actually a common problem. Further thoughts and discussion highly appreciated.
Related
I have a typical Spring Boot application that executes tons of DB calls, and for those I want to implement Spring caching with the normal @Cacheable / @CacheEvict and other annotations (by default I use CaffeineCache). There are several AKS nodes, each running one instance of my application. What I want to achieve:
Local (in-memory) Spring cache. A distributed solution aka Redis-based or so is not suitable.
A cache should be invalidated for all running instances of the app after the update on one of them.
I have a global Kafka service, which registers every write/update request to my Cassandra DB
Now my question: is it possible to have a local, ordinary Spring cache with such invalidation through Kafka, resulting of course in a synchronized cache version on all instances?
I would say it is possible in principle. You could build a naive solution, where
Read operations use @Cacheable
Write operations put a message on the Kafka bus, and each node has a listener that uses @CachePut to write it into the local cache.
But such a naive solution will not have any strict synchronisation guarantees, it is only eventually consistent. It takes time to propagate updates to the other nodes and in between other nodes could still read the old value. Also you would have to think about error conditions where an update could get lost.
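A small simulation shows the window in which other nodes still read the old value. This is plain Java, not Spring or Kafka: `UpdateBus` is a hypothetical in-process stand-in for the Kafka topic that delivers messages only when `deliverPending()` is called, and `Node` stands in for one application instance with its local cache.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical update message (key plus new value). */
class Update {
    final String key;
    final String value;
    Update(String key, String value) { this.key = key; this.value = value; }
}

/** In-process stand-in for a Kafka topic: messages reach subscribers only when drained. */
class UpdateBus {
    private final List<Node> subscribers = new ArrayList<>();
    private final Queue<Update> pending = new ArrayDeque<>();

    void subscribe(Node node) { subscribers.add(node); }
    void publish(Update update) { pending.add(update); }

    /** Simulates the propagation delay ending: every pending update reaches every node. */
    void deliverPending() {
        Update u;
        while ((u = pending.poll()) != null) {
            for (Node n : subscribers) n.onUpdate(u);
        }
    }
}

/** One application instance with its own local cache. */
class Node {
    private final Map<String, String> localCache = new ConcurrentHashMap<>();
    private final Map<String, String> database; // stand-in for the shared DB
    private final UpdateBus bus;

    Node(Map<String, String> database, UpdateBus bus) {
        this.database = database;
        this.bus = bus;
    }

    /** Plays the role of a @Cacheable read. */
    String read(String key) { return localCache.computeIfAbsent(key, database::get); }

    /** Writes to the DB and announces the change; local caches change only on delivery. */
    void write(String key, String value) {
        database.put(key, value);
        bus.publish(new Update(key, value));
    }

    /** Plays the role of the listener that does a @CachePut. */
    void onUpdate(Update u) { localCache.put(u.key, u.value); }
}
```

With two nodes subscribed, a `write` on node A followed by a `read` on node B returns the old value until `deliverPending()` runs. If a message is dropped before delivery, B stays stale indefinitely, which is the lost-update error condition mentioned above.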
If you want to have stricter guarantees, you need a multi-phase commit or a consensus protocol. Unless it is a research project I would highly discourage you from writing one yourself. These are not trivial because the problem is not trivial. Instead you should use existing implementations.
So in summary: If you don't need consistency, you could do it like you suggest. If you need any level of consistency guarantee, you should use an existing distributed cache, which can still be integrated with @Cacheable.
I am evaluating Apache Ignite to check if it fits our company's need. So far so good. Now I am trying to understand how the near cache feature works in terms of consistency.
We currently have several micro-services with one Ignite configured in client mode in each. All these instances are connected to several Ignite servers in a cluster. For some use cases (reads>>>writes) it seems reasonable to use a near cache in front of the cache servers. I have checked and it seems to automatically invalidate "stale data" in all instances in case of the write, which is good.
My question: is there any documentation besides this one that explains how it works? In particular, I would like to understand if any subsequent read requests (after the write one) to any other instances will get the updated data (no eventual consistency).
Thanks!
With FULL_SYNC mode all copies are always consistent, no eventual consistency. Near cache functions as a sort of additional backup copy.
I don't think there is a design document on how it works though.
[Background]
- There are two java applications (A and B), and they can only communicate via Oracle DB
- A and B share the same database table
- A and B store the data in cache
[Problem]
If A performs simple transaction (insert/update/delete), the cache in A is updated. Also, the cache in B should be updated automatically!
[Current Status]
Two solutions I found and tried
- Solution1) Using DatabaseChangeListener
- Solution2) Using Socket Programming
[Question]
The solution will be used in a company, and I would like to know if there is anything I can improve in my solutions.
1) What could be disadvantages if I use DatabaseChangeListener?
2) What could be disadvantages if I use socket programming? (Maybe it's too low-level that developer cannot maintain due to company policy?)
3) I heard there are 3rd party cache that also supports synchronization. Am I correct?
Please let me know if you need more information!
Thank you very much in advance!
[EDIT]
It would be much appreciated if you could leave a comment when you down-vote this. I would like to know how I can improve this question with your feedback! Thank you
Your question appears every now and then with slightly different aspects. One useful answer to that is here: Guava Cache, how to block access while doing removal
About using the DatabaseChangeListener:
Although you are fine with Oracle, I would discourage the use of vendor-specific interfaces. For me, it would be okay to use one if it is a performance optimization, but I would never use vendor-specific interfaces for basic functionality.
Second, the usage of the change listener may still lead to dirty reads.
About "distributed caches" as veritas suggested:
There is a difference between distributed caches and clustered caches. Distributed caches spread (aka distribute) the cached data across different nodes; clustered caches are caches for clustered applications that keep track of data consistency within the cluster. A distributed cache usually is a clustered cache, but not the other way around. For a general idea on the topic I recommend the Infinispan documentation on clustering as an intro: http://infinispan.org/docs/7.0.x/user_guide/user_guide.html#_clustering
Wrap up:
A clustered cache implementation is the thing you need. However, if you want data consistency, you still need to carefully design your transaction handling.
You can, of course, also do socket communication yourself and send simple object invalidate messages to the other applications. The challenging part is the error handling. When was the invalidate successful? Is there a timeout for the other nodes to acknowledge? When to drop a node and maintain a cluster state at all?
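To illustrate where the error handling gets awkward, here is a rough sketch of "send invalidate, wait for acknowledgements with a timeout". It is plain Java with no real sockets. The `Peer` interface, `FakePeer` and `InvalidationBroadcaster` are invented names, and `invalidate` returning normally is assumed to mean "the peer acknowledged".

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

/** Hypothetical view of a remote node; a real one would wrap a socket connection. */
interface Peer {
    String name();
    void invalidate(String key) throws Exception; // assumption: returns once the peer has acknowledged
}

/** Test double that acknowledges after a fixed delay. */
class FakePeer implements Peer {
    private final String name;
    private final long delayMillis;
    FakePeer(String name, long delayMillis) { this.name = name; this.delayMillis = delayMillis; }
    public String name() { return name; }
    public void invalidate(String key) throws Exception { Thread.sleep(delayMillis); }
}

class InvalidationBroadcaster {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    /** Sends the invalidation to all peers in parallel; returns the names that did not acknowledge in time. */
    Set<String> broadcast(List<Peer> peers, String key, long timeoutMillis) {
        Set<String> unacked = ConcurrentHashMap.newKeySet();
        for (Peer p : peers) unacked.add(p.name());
        CountDownLatch finished = new CountDownLatch(peers.size());
        for (Peer p : peers) {
            pool.execute(() -> {
                try {
                    p.invalidate(key);
                    unacked.remove(p.name());
                } catch (Exception e) {
                    // failed or interrupted: the peer stays in the unacked set
                } finally {
                    finished.countDown();
                }
            });
        }
        try {
            finished.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new HashSet<>(unacked); // snapshot taken at the timeout
    }

    void shutdown() { pool.shutdownNow(); }
}
```

The sketch deliberately stops at returning the set of peers that did not answer. What to do with them (retry, drop the node from the cluster, block the write) is the part that a clustered cache product has already worked out.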
I would suggest a third-party cache if you have many similar use cases or many tables that need to be updated.
Please read about the Terracotta distributed cache.
It gives exactly what you want.
You can also look at Hazelcast or memcached.
I am currently working on a project that uses JPA (Toplink, currently) for its persistence. Currently, we are running a single application server, but, for redundancy, we would like to add a load balancer and another application server (and possibly more as it grows).
First, I'm running into the issue of JPA caching. Since two processes will be updating the same database, the JPA cache returns the cached value rather than going to the database. I see how to turn that off, and the database itself implements a level of caching. Is turning off the cache completely the way to go here? I see the ways to tell JPA to always get from the database at a query level, but in a multi-server environment, it seems that you'll always want that to happen.
Along with this specific question, I'm interested in anyone out there who has implemented a JPA solution with multiple application servers and what problems arose during the implementation (and any suggestions you have).
Thanks much.
As you have found, you can disable the shared cache, see http://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching or http://wiki.eclipse.org/EclipseLink/FAQ/How_to_disable_the_shared_cache%3F
There are also other options available in EclipseLink depending on your data and requirements.
The options include:
Disable shared cache
Enable cache coordination (see, http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#COORDINATION_PROTOCOL)
Set a cache invalidation timeout (see, http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/annotations/Cache.html#expiry%28%29)
Enable optimistic locking, this will ensure that any stale object cannot be updated, when an update on stale data occurs it will fail, and EclipseLink will automatically invalidate the object in the cache.
Investigate the Oracle TopLink integration of EclipseLink and Oracle Coherence to provide a distributed cache.
See also, http://en.wikibooks.org/wiki/Java_Persistence/Caching#Caching_in_a_Cluster
There is no perfect solution; the solution used normally depends on the data/class. Normally an application has a set of read-only classes, read-mostly classes and write-mostly classes. Personally I would enable the cache for the read-only classes with a 1 day timeout, enable the cache with cache coordination for the read-mostly classes, and disable the cache for the write-mostly classes.
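The optimistic locking option from the list can be sketched without JPA to show why a stale cached object cannot overwrite newer data. In JPA this is normally done with a `@Version` field. The classes below (`VersionedRow`, `VersionedStore`, `CachingClient`) are hypothetical stand-ins that only mimic the version check and the invalidate-on-failure behaviour, not EclipseLink's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Immutable row with a version counter, like an entity with a version column. */
class VersionedRow {
    final String value;
    final long version;
    VersionedRow(String value, long version) { this.value = value; this.version = version; }
}

/** Stand-in for the shared database table. */
class VersionedStore {
    private final Map<String, VersionedRow> rows = new ConcurrentHashMap<>();

    void insert(String id, String value) { rows.put(id, new VersionedRow(value, 1)); }
    VersionedRow read(String id) { return rows.get(id); }

    /** Succeeds only if the caller saw the current version, like UPDATE ... WHERE version = ?. */
    boolean update(String id, String newValue, long expectedVersion) {
        VersionedRow current = rows.get(id);
        if (current == null || current.version != expectedVersion) return false;
        return rows.replace(id, current, new VersionedRow(newValue, expectedVersion + 1));
    }
}

/** Stand-in for one application server with its own object cache. */
class CachingClient {
    private final Map<String, VersionedRow> cache = new ConcurrentHashMap<>();
    private final VersionedStore store;
    CachingClient(VersionedStore store) { this.store = store; }

    VersionedRow read(String id) { return cache.computeIfAbsent(id, store::read); }

    /** Updates using the cached version; the cached copy is dropped either way so the next read reloads. */
    boolean update(String id, String newValue) {
        VersionedRow seen = read(id);
        boolean ok = seen != null && store.update(id, newValue, seen.version);
        cache.remove(id);
        return ok;
    }
}
```

If two clients have both cached version 1 and the first one updates, the second client's update fails, its stale copy is dropped, and a retry after re-reading succeeds against version 2. Note that stale reads are still possible in between; only stale writes are prevented.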
We are looking at implementing a caching framework for our application to help improve the overall performance of our site.
We are running on Websphere 6.1, Spring, Hibernate and Oracle. The focus currently is on the static data or data that changes very little through out the day but is used a lot.
So any info would be great. I have started my Google search and reading but wanted to get a feel for what has worked for members of the community. One of the things that we are interested in is the ability to have the system invalidate the cache when a change does happen in the underlying data.
Thanks
Update:
I came across an article on the IBM web site where it says that the behavior of Hibernate's cluster-aware caches in conjunction with WebSphere Application Server has not been determined yet; therefore, it is not yet determined whether or not their use is supported.
Thoughts on that? We are running in a clustered environment.
Well, the Hibernate cache system does just that. I used EhCache effectively and easily with Hibernate (and the second level cache system).
Lucene could be an option too depending on the situation. Hibernate Search or Compass could help with that (although it might take some major work).
Replication using Terracotta could also be an option although I've never done it.
Hibernate has first and second level caching built in. You can configure the second level cache to use EhCache, OSCache, SwarmCache, or JBoss TreeCache - your choice.
You can also leverage WebSphere's default cache, i.e. DynaCache, to implement the second level cache. This will allow you to administer, monitor and configure your cache using the WebSphere caching infrastructure.
I've used ehCache and OSCache and found OSCache to be easier to configure and use.
"One of the things that we are interested in is the ability to have the system invalidate the cache when a change does happen in the underlying data."
From what I can see, Hibernate doesn't actually do the above - from Hibernate's docs:
"Be careful. Caches are never aware of changes made to the persistent store by another application (though they may be configured to regularly expire cached data)."
Obviously what it means is that a cache doesn't have ESP, and can't tell if an app not in the cluster has called straight DML on the database - but I am guessing that what you want is an ability to expose a service for legacy apps to hook in and invalidate the cache when they do update that data. And there isn't, to my knowledge, a suggestion about how this might be done.
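For what it's worth, the two ideas in play here (regular expiry, plus a hook that an outside app could call) can be sketched in a few lines of plain Java. This is not Hibernate's second level cache. `ExpiringCache`, its `evict` method and the injected clock are hypothetical, and how a legacy app would actually reach `evict` (a web service, JMS, whatever) is left open.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/** Hypothetical cache whose entries expire after a fixed time and can also be evicted explicitly. */
class ExpiringCache<K, V> {
    private static final class Entry<V> {
        final V value;
        final long loadedAt;
        Entry(V value, long loadedAt) { this.value = value; this.loadedAt = loadedAt; }
    }

    private final Map<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final Function<K, V> loader;  // reads from the persistent store
    private final long ttlMillis;
    private final LongSupplier clock;     // injectable so the expiry can be tested

    ExpiringCache(Function<K, V> loader, long ttlMillis, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached value, reloading it if it is missing or older than the TTL. */
    V get(K key) {
        long now = clock.getAsLong();
        Entry<V> e = entries.compute(key, (k, old) ->
                (old == null || now - old.loadedAt >= ttlMillis)
                        ? new Entry<>(loader.apply(k), now)
                        : old);
        return e.value;
    }

    /** The hook an external application would need to trigger after changing the data. */
    void evict(K key) { entries.remove(key); }
}
```

In production the clock would be `System::currentTimeMillis`. With expiry alone, a change made by another application stays invisible for up to the TTL; the `evict` hook closes that window only if every writer remembers to call it.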