I have a typical Spring Boot application that executes tons of DB calls, and for those I want to implement Spring caching with the normal @Cacheable / @CacheEvict and other annotations (by default I use Caffeine). There are several AKS nodes, and one instance of my application runs on each of them. What I want to achieve:
A local (in-memory) Spring cache. A distributed solution (e.g., Redis-based) is not suitable.
The cache should be invalidated on all running instances of the app after an update on one of them.
I have a global Kafka service that registers every write/update request to my Cassandra DB.
Now my question: is it possible to have a local, ordinary Spring cache with such invalidation through Kafka, resulting, of course, in a synchronized cache state across all instances?
I would say it is possible in principle. You could build a naive solution where:
Read operations use @Cacheable.
Write operations put a message onto the Kafka bus, and each node has a listener that uses @CachePut to write the new value into its local cache (a sketch of the listener side follows below).
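A minimal sketch of that listener, assuming spring-kafka; the topic name and message format are illustrative, and for simplicity the listener evicts the key rather than using @CachePut, which avoids serializing the new value onto the topic:

    import org.springframework.cache.Cache;
    import org.springframework.cache.CacheManager;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    @Component
    public class CacheInvalidationListener {

        private final CacheManager cacheManager;

        public CacheInvalidationListener(CacheManager cacheManager) {
            this.cacheManager = cacheManager;
        }

        // Every instance needs its own consumer group; with a shared group id,
        // Kafka would deliver each message to only one instance instead of all.
        @KafkaListener(topics = "cache-invalidation",
                       groupId = "#{T(java.util.UUID).randomUUID().toString()}")
        public void onInvalidation(String message) {
            // Assumed message format: "<cacheName>:<key>"
            String[] parts = message.split(":", 2);
            Cache cache = cacheManager.getCache(parts[0]);
            if (cache != null) {
                cache.evict(parts[1]);
            }
        }
    }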
But such a naive solution will not have any strict synchronization guarantees; it is only eventually consistent. It takes time to propagate updates to the other nodes, and in between, other nodes could still read the old value. You would also have to think about error conditions in which an update could get lost.
If you want stricter guarantees, you need a multi-phase commit or a consensus protocol. Unless this is a research project, I would strongly discourage you from writing one yourself: these are not trivial, because the problem is not trivial. Instead, use an existing implementation.
So in summary: if you don't need consistency, you can do it as you suggest. If you need any level of consistency guarantee, you should use an existing distributed cache, which can still be integrated with @Cacheable.
I have used cache2k in my Java project, and it was so simple (key-value pairs) and easy to use. Now I want to know whether cache2k is a persistent or non-persistent cache.
I found this answer from 2014: https://stackoverflow.com/a/23709996/12605243, which stated that the cache was going to be updated to support persistence.
So my question is: am I using a persistent or non-persistent cache? I have read their docs but was unable to find the answer.
Basically, it's possible to add persistence via CacheLoader and CacheWriter. We use that in several ways to use the file system or a database as storage. When adding persistence this way, the cache operates in so-called "cache-through" mode: some operations of the cache, especially get and put, operate transparently and read or write the data via the loader and writer to the storage. Other operations, like CAS operations, just interact with the in-memory cache.
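For illustration, a minimal cache-through sketch against the cache2k 2.x API (org.cache2k.io); the FileStore abstraction here is hypothetical and stands in for any file-system or database access code:

    import org.cache2k.Cache;
    import org.cache2k.Cache2kBuilder;
    import org.cache2k.io.CacheWriter;

    // Hypothetical storage abstraction, not part of cache2k.
    interface FileStore {
        String read(String key);
        void write(String key, String value);
        void delete(String key);
    }

    Cache<String, String> buildCacheThrough(FileStore store) {
        return Cache2kBuilder.of(String.class, String.class)
            .loader(store::read)                  // read-through on a miss
            .writer(new CacheWriter<String, String>() {
                @Override public void write(String key, String value) {
                    store.write(key, value);      // write-through on put
                }
                @Override public void delete(String key) {
                    store.delete(key);            // propagate removals
                }
            })
            .build();
    }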
The persistence feature as originally planned was meant to be transparent for all cache operations. Although it's feasible and the basic work is done in the internal infrastructure, we don't have a big need for it; other features and tasks seem more important. However, I am happy to hear about potential use cases.
In Java, I have a HashMap containing objects (which can be made serializable, if it helps). Elsewhere on the network, I have another HashMap in another copy of the application that I would like to keep in sync with the first.
For example, if on computer A someone runs myMap.put("Hello", "World"); and on computer B someone runs myMap.put("foo", "bar");, then after some delay for changes to propagate, both computers would have myMap.get("Hello") returning "World" and myMap.get("foo") returning "bar".
Is this requirement met by an existing facility in the Java language, a library, or some other program? If this is already a "solved problem" it would be great not to have to write my own code for this.
If there are multiple ways of achieving this, I would prefer, in priority order:
1. Changes are guaranteed to propagate 100% of the time (it doesn't matter how long it takes).
2. Changes propagate rapidly.
3. Changes propagate with minimal bandwidth use between computers.
(Note: I have had trouble searching for solutions as results are dominated by questions about synchronizing access to a Map from multiple threads in the same application. This is not what my question is about.)
You could look at the Hazelcast in-memory data grid.
It's an open-source solution designed for distributed architectures.
It maps really well to your problem, since the Hazelcast IMap extends java.util.Map.
Link: Hazelcast IMap
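A minimal sketch of that, assuming Hazelcast 4.x (com.hazelcast.map.IMap); every JVM that starts an instance with the same cluster configuration sees the same map:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class SharedMapDemo {
        public static void main(String[] args) {
            // Members discover each other via the configured join mechanism
            // (multicast by default) and form a cluster.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> myMap = hz.getMap("myMap");
            myMap.put("Hello", "World"); // visible on every member
            System.out.println(myMap.get("Hello"));
        }
    }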
What you are trying to do is clustering between two nodes. Here are some options:
1) You can achieve your requirement with serialization: make your map serializable, read and write the map's state at regular intervals, and sync it. This is the most basic way to achieve the functionality, but with serialization you have to manage the synchronization of the map manually (i.e., you have to write code for that).
2) Hazelcast, an open-source distributed caching mechanism. Hazelcast is a solid API with a rich library for achieving a clustered environment and sharing data between different nodes.
3) Coherence*Web, from Oracle, also provides a mechanism to achieve clustering.
4) Ehcache, a cache library introduced in 2003 to improve performance by reducing the load on underlying resources. Ehcache is used both for general-purpose caching and for caching with Hibernate (as a second-level cache), data access objects, security credentials, and web pages. It can also be used for SOAP and RESTful server caching, application persistence, and distributed caching.
Among all of the above, Hazelcast is the best API; going through it will surely help you.
I am currently working on a project that uses JPA (TopLink, currently) for its persistence. Currently we are running a single application server, but for redundancy we would like to add a load balancer and another application server (and possibly more as the system grows).
First, I'm running into an issue with JPA caching. Since two processes will be updating the same database, the JPA cache can return a cached value rather than going to the database. I see how to turn that off, and the database itself implements a level of caching. Is turning off the cache completely the way to go here? I see ways to tell JPA to always fetch from the database at the query level, but in a multi-server environment it seems you would always want that to happen.
Along with this specific question, I'm interested in hearing from anyone out there who has implemented a JPA solution with multiple application servers: what problems arose during the implementation (and any suggestions you have)?
Thanks much.
As you have found, you can disable the shared cache, see http://wiki.eclipse.org/EclipseLink/Examples/JPA/Caching or http://wiki.eclipse.org/EclipseLink/FAQ/How_to_disable_the_shared_cache%3F
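For reference, a minimal sketch of disabling the shared cache programmatically; the persistence-unit name "myUnit" is illustrative, and the property key is the EclipseLink one described in the links above:

    import java.util.HashMap;
    import java.util.Map;
    import javax.persistence.EntityManagerFactory;
    import javax.persistence.Persistence;

    EntityManagerFactory createFactoryWithoutSharedCache() {
        Map<String, String> props = new HashMap<>();
        // Turn the shared (second-level) cache off for all entities;
        // equivalent to <shared-cache-mode>NONE</shared-cache-mode> in persistence.xml.
        props.put("eclipselink.cache.shared.default", "false");
        return Persistence.createEntityManagerFactory("myUnit", props);
    }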
There are also other options available in EclipseLink depending on your data and requirements.
A list of options includes:
Disable shared cache
Enable cache coordination (see http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/config/PersistenceUnitProperties.html#COORDINATION_PROTOCOL)
Set a cache invalidation timeout (see http://www.eclipse.org/eclipselink/api/2.1/org/eclipse/persistence/annotations/Cache.html#expiry%28%29)
Enable optimistic locking; this will ensure that stale objects cannot be updated. When an update on stale data occurs it will fail, and EclipseLink will automatically invalidate the object in the cache.
Investigate the Oracle TopLink integration of EclipseLink and Oracle Coherence to provide a distributed cache.
See also, http://en.wikibooks.org/wiki/Java_Persistence/Caching#Caching_in_a_Cluster
There is no perfect solution; which one to use normally depends on the data/class. An application typically has a set of read-only classes, read-mostly classes, and write-mostly classes. Personally, I would enable the cache with a 1-day timeout for the read-only classes, enable the cache with cache coordination for the read-mostly classes, and disable the cache for the write-mostly ones.
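To illustrate that split, a sketch using EclipseLink's @Cache annotation; the entities and values are my own examples, not from the question:

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import javax.persistence.Version;
    import org.eclipse.persistence.annotations.Cache;
    import org.eclipse.persistence.annotations.CacheCoordinationType;
    import org.eclipse.persistence.config.CacheIsolationType;

    // Read-only reference data: shared cache with a 1-day invalidation timeout.
    @Entity
    @Cache(expiry = 86400000)
    class Country {
        @Id Long id;
        String name;
    }

    // Read-mostly data: shared cache kept in sync via cache coordination,
    // plus optimistic locking so stale updates fail instead of clobbering.
    @Entity
    @Cache(coordinationType = CacheCoordinationType.INVALIDATE_CHANGED_OBJECTS)
    class Product {
        @Id Long id;
        @Version long version;
        String sku;
    }

    // Write-mostly data: keep it out of the shared cache entirely.
    @Entity
    @Cache(isolation = CacheIsolationType.ISOLATED)
    class OrderLine {
        @Id Long id;
        int quantity;
    }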
A couple of relational DB tables are managed by a single object cache that resides in a process. When the cache is committed, the tables are updated. The DB tables are updated by regular SQL queries, not by anything fancier like Hibernate.
Eventually, other processes got into the business of modifying this object without communicating with one another; i.e., each process would initialize the object (read from the DB) and update it (commit to the DB), and the other processes would not know about it, holding on to a stale cache.
I have to fix this workflow. I have thought of a couple of approaches.
One is to make this object an MBean. The object would then reside in one process, and every other process would modify it via MBean method invocations.
However, this approach has a couple of problems.
1) Every object returned by this cache has to be an MBean, which could make the method invocations quite chatty.
2) There is also a requirement that every process should see a consistent data model (cache) of the DB, and each should merge its contents into the DB if possible (like a transaction). If the DB has been updated significantly by some other process, it is OK for the merge to fail.
What technologies in Java will help to solve this problem?
You should have a look at Terracotta. They have technology that makes multiple JVMs (which can be on different servers) appear unified: if you update an object on one JVM, Terracotta transparently updates the instance on all JVMs in the cluster in a safe way.
If you wanted to keep the object model, you could use a Java object cache for centralized storage before committing, or you could hold a shared lock using ZooKeeper.
But it sounds like you should really abandon the self-managed cache. Use Hibernate or another JPA implementation, which you mentioned. JPA addresses these cache issues and maintains a shared L2 cache, so this has already been thought through for you.
I agree with John: use a second-level cache in Hibernate with support for clustering. It is a much more straightforward way to manage data; work with a simplified data-access model and let Hibernate manage the details.
Terracotta Ehcache is one such cache; so are JBoss Cache, Coherence, etc.
More info on the Hibernate second-level cache can be found in the official Hibernate docs, Chapter 19, "Improving Performance" (note that while the Hibernate docs do list second-level cache providers, the list is woefully out of date; for example, who uses SwarmCache? Its last release was in 2003).
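For orientation, a rough sketch of what the entity side of a second-level cache looks like, assuming Hibernate 4.x-era configuration keys and Ehcache as the provider (one of several mentioned in this thread); the Customer entity is illustrative:

    // In hibernate.cfg.xml / persistence.xml (Hibernate 4.x-era keys):
    //   hibernate.cache.use_second_level_cache = true
    //   hibernate.cache.region.factory_class = org.hibernate.cache.ehcache.EhCacheRegionFactory

    import javax.persistence.Entity;
    import javax.persistence.Id;
    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;

    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class Customer {
        @Id
        private Long id;
        private String name;
    }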
Our design has one JVM that is a JBoss web app (read/write), which maintains the data via Hibernate (using JPA) in the DB. The model has 10-15 persistent classes with 3-5 levels of depth in the relationships.
We then have a separate JVM that is the server using this data. Since it runs continuously, we just have one long DB session (read-only).
There is currently no inter-JVM cache involved, so we manually signal one JVM from the other.
Now, when the webapp changes some data, it signals the server to reload the changed data. What we have found is that we need to tell Hibernate to purge the data and then reload it; just doing a fetch/merge with the DB does not do the job, mainly with respect to objects several layers down the hierarchy.
Any thoughts on whether there is anything fundamentally wrong with this design, or whether anyone doing this has had better luck getting Hibernate to handle the reloads?
Thanks,
Chris
A Hibernate Session loads all data it reads from the DB into what is called the first-level cache. Once a row is loaded from the DB, any subsequent fetch of a row with the same PK will return the data from this cache. Furthermore, Hibernate guarantees reference equality for objects with the same PK within a single Session.
From what I understand, your read-only server application never closes its Hibernate Session. So when the DB gets updated by the read-write application, the Session on the read-only server is unaware of the change. Effectively, your read-only application is loading an in-memory copy of the database and using that copy, which goes stale in due course.
The simplest and best course of action I can suggest is to close and open Sessions as needed. This sidesteps the whole problem. Hibernate Sessions are intended to be a window for a short-lived interaction with the DB. I agree that there is a performance gain from not reloading the object graph again and again, but you need to measure it and convince yourself that it is worth the pain.
Another option is to close and reopen the Session periodically. This ensures that the read-only application works with data not older than a given time interval. But there definitely is a window where the read-only application works with stale data (although the design guarantees that it gets the up-to-date data eventually). This might be permissible in many applications - you need to evaluate your situation.
The third option is to use a second-level cache implementation together with short-lived Sessions. There are various caching packages that work with Hibernate, each with relative merits and demerits.
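For the first two options, the mechanical change is simply not holding one Session forever. A sketch, assuming Hibernate 5+ (where Session is AutoCloseable) and an illustrative Order entity (not shown):

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;

    public Order loadFreshOrder(SessionFactory sessionFactory, Long id) {
        // A brand-new Session starts with an empty first-level cache,
        // so the read always reflects the current state of the row.
        try (Session session = sessionFactory.openSession()) {
            return session.get(Order.class, id);
        }
    }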
Chris, I'm a little confused about your circumstances. If I understand correctly, you have both a web app (read/write) and a standalone application (read-only?) using Hibernate to access a shared database. The changes you make with the web app aren't visible to the standalone app. Is that right?
If so, have you considered using a different second-level cache implementation? I'm wondering if you might be able to use a clustered cache that is shared by both the web application and the standalone application. I believe that SwarmCache, which is integrated with Hibernate, will allow this, but I haven't tried it myself.
In general, though, you should know that the contents of a given cache will never be aware of activity by another application (that's why I suggest having both apps share a cache). Good luck!
From my point of view, you should change your underlying Hibernate cache to one that supports a clustered mode. It could be JBoss Cache or SwarmCache; the first has better support for data synchronization (replication and invalidation) and also supports JTA.
Then you will be able to configure cache synchronization between the webapp and the server. Also look at the isolation level if you use JBoss Cache; I believe you should use READ_COMMITTED mode if you want the server to see new data from the same session.
The most common practice is to have a container-managed EntityManager so that two or more applications in the same container (e.g., GlassFish, Tomcat, WebSphere) can share the same caches.
But if you don't use an application container (because you use Play!, for instance), then I would build some web services in the primary application to read from and write to the cache consistently.
I think using stale data is an open door for disaster. Just as Singletons become Multitons, read-only applications often end up writing sometimes.
Belt and braces :)