I have used cache2k in my Java project, and it was simple (key-value pairs) and easy to use. Now I want to know whether cache2k is a persistent or a non-persistent cache.
I found an answer here:
https://stackoverflow.com/a/23709996/12605243 It was written in 2014 and stated that cache2k was going to be updated to become a persistent cache.
So my question is: am I using a persistent or a non-persistent cache? I have read the docs but was unable to find the answer.
Basically, it's possible to add persistence via CacheLoader and CacheWriter. We use that in several ways, with the file system or a database as storage. When adding persistence this way, the cache operates in the so-called "cache through" mode. Some operations of the cache, especially get and put, operate transparently and read or write the data via the loader and writer to the storage. Other operations, like CAS operations, just interact with the in-memory cache.
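To illustrate the "cache through" idea, here is a minimal sketch in plain Java. It is not the cache2k API: the class CacheThrough and its method names are hypothetical, and the loader and writer are ordinary functional interfaces standing in for CacheLoader and CacheWriter. It only shows which operations touch the storage and which stay in memory.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;
import java.util.function.Function;

/** Hypothetical read-through / write-through wrapper; NOT the cache2k API. */
class CacheThrough<K, V> {
    private final Map<K, V> memory = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // reads a value from the storage
    private final BiConsumer<K, V> writer; // writes a value to the storage

    CacheThrough(Function<K, V> loader, BiConsumer<K, V> writer) {
        this.loader = loader;
        this.writer = writer;
    }

    /** get() calls the loader on a miss and keeps the result in memory. */
    V get(K key) {
        return memory.computeIfAbsent(key, loader);
    }

    /** put() writes to the storage first, then to memory. */
    void put(K key, V value) {
        writer.accept(key, value);
        memory.put(key, value);
    }

    /** A CAS-style operation that only looks at the in-memory map. */
    boolean replaceIfEquals(K key, V expected, V newValue) {
        return memory.replace(key, expected, newValue);
    }

    /** Only checks memory, never the loader or the storage. */
    boolean containsKey(K key) {
        return memory.containsKey(key);
    }
}
```

With a map as a stand-in storage, `new CacheThrough<>(storage::get, storage::put)` wires it up. Note that in this sketch replaceIfEquals changes the in-memory value without touching the storage, which is exactly the gap that remains when persistence is added only through a loader and writer.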
The persistence feature, as it was planned, was meant to be transparent for all cache operations. Although it's feasible and the basic work is done in the internal infrastructure, we don't have a big need for it. Other features and tasks seem more important. However, I am happy to hear about potential use cases.
I have a typical Spring Boot application that executes tons of DB calls, and for those I want to implement Spring caching with the usual @Cacheable / @CacheEvict and other annotations (by default I use CaffeineCache). There are several AKS nodes, each running one instance of my application. What I want to achieve:
Local (in-memory) Spring cache. A distributed solution (Redis-based or similar) is not suitable.
A cache should be invalidated for all running instances of the app after an update on one of them.
I have a global Kafka service, which registers every write/update request to my Cassandra DB.
Now my question: is it possible to have a local, ordinary Spring cache with such invalidation through Kafka, resulting of course in a synchronized cache version on all instances?
I would say it is possible in principle. You could build a naive solution, where
Read operations use @Cacheable
Write operations put a message on the Kafka bus, and each node has a listener that uses @CachePut to write it into the local cache.
But such a naive solution will not have any strict synchronisation guarantees, it is only eventually consistent. It takes time to propagate updates to the other nodes and in between other nodes could still read the old value. Also you would have to think about error conditions where an update could get lost.
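The naive scheme and its staleness window can be sketched without Spring or Kafka. Everything below is hypothetical: UpdateBus is an in-process stand-in for a Kafka topic, NodeCache stands for one application instance, and the shared map stands for the database. The read path plays the role of a cacheable read and onUpdate plays the role of the listener that puts the new value into the local cache.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for a Kafka topic: messages are queued and delivered later. */
class UpdateBus {
    private final List<NodeCache> nodes = new ArrayList<>();
    private final Queue<String[]> pending = new ArrayDeque<>();

    void subscribe(NodeCache node) { nodes.add(node); }

    void publish(String key, String value) { pending.add(new String[] {key, value}); }

    /** Simulates the propagation delay: nothing reaches the nodes until this runs. */
    void deliverPending() {
        String[] message;
        while ((message = pending.poll()) != null) {
            for (NodeCache node : nodes) {
                node.onUpdate(message[0], message[1]);
            }
        }
    }
}

/** One application instance with its own local cache. */
class NodeCache {
    private final Map<String, String> local = new ConcurrentHashMap<>();
    private final Map<String, String> database;
    private final UpdateBus bus;

    NodeCache(Map<String, String> database, UpdateBus bus) {
        this.database = database;
        this.bus = bus;
        bus.subscribe(this);
    }

    /** Read path: serve from the local cache, load from the database on a miss. */
    String read(String key) {
        return local.computeIfAbsent(key, database::get);
    }

    /** Write path: update the database, then announce the change on the bus. */
    void write(String key, String value) {
        database.put(key, value);
        bus.publish(key, value);
    }

    /** Listener side: every node overwrites its local copy. */
    void onUpdate(String key, String value) {
        local.put(key, value);
    }
}
```

If node A writes a key that node B has already cached, B keeps returning the old value until deliverPending() runs. That interval is the eventual-consistency window, and a message lost before delivery would leave B stale indefinitely.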
If you want to have stricter guarantees, you need a multi-phase commit or a consensus protocol. Unless it is a research project I would highly discourage you from writing one yourself. These are not trivial because the problem is not trivial. Instead you should use existing implementations.
So in summary: If you don't need consistency, you could do it like you suggest. If you need any level of consistency guarantee, you should use an existing distributed cache, which can still be integrated with @Cacheable.
I want to implement some sort of lightweight caching in Java that is easy to integrate and easy to deploy with a Java application.
The cache layer will be between the application and the database layer: no database caching, no Spring, no Hibernate, no EHcache, no HTTP caching.
We can use a file system or a nano database so that the cache can be restored after the process restarts.
I tried LRU Cache:
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
But I am not sure what to do after overflow: should I save the data into a database (and which database would be better for fast insert and seek of data), or should I use the file system?
Does anyone have better input on implementing a caching mechanism in Java?
But I am not sure what to do after overflow: should I save the data into a database (and which database would be better for fast insert and seek of data), or should I use the file system?
It depends on the use case. If your cached values are very big, you can store each of them in a file and use the hash of the cache key as the file name.
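A minimal sketch of the file-per-entry approach, assuming String keys and byte[] values. The class name FilePerEntryStore and the choice of SHA-256 for the file name are illustrative assumptions; the sketch ignores concurrency, partial writes and hash collisions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical store: one file per entry, named by the SHA-256 hash of the key. */
class FilePerEntryStore {
    private final Path directory;

    FilePerEntryStore(Path directory) throws IOException {
        this.directory = Files.createDirectories(directory);
    }

    /** Maps a key to a file name that is safe on any file system. */
    private Path fileFor(String key) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(key.getBytes(StandardCharsets.UTF_8));
            StringBuilder name = new StringBuilder();
            for (byte b : hash) {
                name.append(String.format("%02x", b));
            }
            return directory.resolve(name.toString());
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    void put(String key, byte[] value) throws IOException {
        Files.write(fileFor(key), value);
    }

    /** Returns null when no file exists for the key. */
    byte[] get(String key) throws IOException {
        Path file = fileFor(key);
        return Files.exists(file) ? Files.readAllBytes(file) : null;
    }

    void remove(String key) throws IOException {
        Files.deleteIfExists(fileFor(key));
    }
}
```

Hashing the key avoids problems with characters that are illegal in file names and keeps names at a fixed length, at the price of not being able to list the original keys from the directory contents.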
If your values are small, storing them as separate files would be a lot of overhead, so it is better to store the cached entries in one or a couple of files. To implement this you need to learn about "external indexes" and "memory management" or "free space management" (e.g. best fit, next fit and compaction strategies). This actually leads to the implementation of a tiny database, so maybe just use one :) Some stuff that comes to my mind: LevelDB, MapDB, LMDB, RocksDB
Keep in mind that caching operations come in concurrently from the application, so the cache may evict a value while a request for the same key comes in at the same time. Will you implement just the basic operations like Cache.get and Cache.put, or also CAS operations like Cache.putIfAbsent? Do you want to make efficient use of multi-core systems, as they are common today?
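To make the point about CAS operations concrete, the sketch below contrasts an atomic putIfAbsent with a check-then-put sequence. The class AtomicOps is hypothetical; it relies only on java.util.concurrent.ConcurrentHashMap, whose putIfAbsent is atomic.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory cache core showing an atomic and a racy variant. */
class AtomicOps {
    private final ConcurrentMap<String, String> entries = new ConcurrentHashMap<>();

    /** Atomic: returns true only for the single caller that created the entry. */
    boolean putIfAbsent(String key, String value) {
        return entries.putIfAbsent(key, value) == null;
    }

    /** Not atomic: two threads can both see "absent" and both put. */
    boolean racyPutIfAbsent(String key, String value) {
        if (!entries.containsKey(key)) {
            entries.put(key, value);
            return true;
        }
        return false;
    }

    String get(String key) {
        return entries.get(key);
    }
}
```

Once eviction and a disk overflow are added, every such operation also has to stay atomic with respect to entries moving between memory and storage, which is where much of the engineering effort goes.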
Still, when using a tiny database, you will need to prepare for some months of engineering work.
Does anyone have better input on implementing a caching mechanism in Java?
You can read my blog at cruftex.net for some more input to implement lightweight and fast caching in Java.
For a cache implementation with overflow you can take a look at imcache. But imcache is not a fully-fledged generic cache, because, for example, CAS operations are missing; see the Cache interface.
My own high-performance Java cache implementation, cache2k, features CAS operations, events, loaders and writers, expiry, etc., and it will eventually get some overflow to disk, too. However, I am not sure about the time frame... If you are interested in working in this area: contributions are welcome!
The Cache API doc says that several methods, like purge() or flush(), operate depending on the persistent storage configured.
Unfortunately, I can't find out how to configure one.
Is it really possible?
Older versions of cache2k had persistence support baked in. It was working, but it never made it to a level that I would fully trust for production.
The actual issue was the clear() operation, which had a quite complex implementation. The clear should be fast, regardless of the storage implementation needing some time to remove the data. So, my idea was to switch to a write-back scheme, where operations get queued and executed when the storage is available again. Implementing a partial write-back scheme just for the clear is quite some over-engineering...
For the moment I dropped persistence from the feature set, since I want a 1.0 version that has a stabilized API and already provides a lot of useful features.
As you can see from the roadmap on the cache2k homepage the current plan is first to add bulk and async features and then get back to storage. Probably the storage interface needs to look totally different after the async capabilities are done.
Inside the current cache2k implementation there are still the interfaces where the storage will be hooked in, so that I do not completely abandon what has already been achieved. flush() and purge() are remnants of this. So, to be consistent, I had better remove those two methods for the 1.0 version to avoid confusion.
BTW: Since I saw your question on Guava, cache2k has support for a CacheWriter, which is the counterpart of the CacheLoader. With the cache loader and writer you can read and write to a storage by yourself, but it is not identical to storage support inside the cache itself. For example, with built-in storage cache.contains(...) would check the storage, but it does not check the cache loader, at least according to JSR107 and in every cache implementation I know of.
A couple of relational DB tables are managed by a single object cache that resides in a process. When the cache is committed, the tables are updated. The relational tables are updated by regular SQL queries and not by anything fancier like Hibernate.
Eventually, other processes got into the business of modifying this object without communicating with one another, i.e., each process would initialize this object (read from DB) and update it (commit to DB), and the other processes would not know about it and would hold on to a stale cache.
I have to fix this workflow. I have thought of a couple of methods.
One is to make this object an mBean. So, the object would reside on one process and every process would eventually modify the object in that process by mBean method invocations.
However, this approach has a couple of problems.
1) Every object returned by this cache has to be an mBean, which could make the method invocations quite chatty.
2) Also there is a requirement that every process should see a consistent data model(cache) of the DB, and it should merge its contents to the DB if possible. (like a transaction). If the DB was updated by some other process significantly, it is OK for the merge to fail.
What technologies in Java will help to solve this problem?
You should have a look at Terracotta. They have technology that makes multiple JVMs (can be on different servers) appear unified. If you update an object on one JVM, Terracotta will update the instance transparently on all JVMs in the cluster in a safe way.
If you wanted to keep the object model, you could use Java Object Cache for centralized storage before committing. Or you could keep a shared lock using ZooKeeper.
But it sounds like you should really abandon the self-managed cache. Use Hibernate, which you mentioned, or another JPA implementation. JPA addresses the cache issues and maintains an L2 shared cache, so they've thought about this for you.
I agree with John - use a second level cache in hibernate with support for clustering. Much more straightforward way to manage data by using a simplified data access model and let Hibernate manage the details.
Terracotta Ehcache is one such cache; so are JBoss, Coherence, etc.
More info on the Hibernate second-level cache can be had here and in the official Hibernate docs, Chapter 19, Improving Performance. (Note that while the Hibernate docs do list second-level cache providers, the list is woefully out of date. For example, who uses Swarm Cache? Its last release was in 2003.)
Our design has one jvm that is a jboss/webapp (read/write) that is used to maintain the data via hibernate (using jpa) to the db. The model has 10-15 persistent classes with 3-5 levels of depth in the relationships.
We then have a separate jvm that is the server using this data. As it is running continuously we just have one long db session (read only).
There is currently no intra-jvm cache involved - so we manually signal one jvm from the other.
Now when the webapp changes some data, it signals the server to reload the changed data. What we have found is that we need to tell hibernate to purge the data and then reload it. Just doing a fetch/merge with the db does not do the job - mainly in respect of the objects several layers down the hierarchy.
Any thoughts on whether there is anything fundamentally wrong with this design? Or is anyone doing this who has had better luck working with Hibernate on the reloads?
Thanks,
Chris
A Hibernate session loads all data it reads from the DB into what is called the first-level cache. Once a row is loaded from the DB, any subsequent fetches for a row with the same PK will return the data from this cache. Furthermore, Hibernate guarantees reference equality for objects with the same PK in a single Session.
From what I understand, your read-only server application never closes its Hibernate session. So when the DB gets updated by the read-write application, the Session on read-only server is unaware of the change. Effectively, your read-only application is loading an in-memory copy of the database and using that copy, which gets stale in due course.
The simplest and best course of action I can suggest is to close and open Sessions as needed. This sidesteps the whole problem. Hibernate Sessions are intended to be a window for a short-lived interaction with the DB. I agree that there is a performance gain by not reloading the object-graph again and again; but you need to measure it and convince yourself that it is worth the pains.
Another option is to close and reopen the Session periodically. This ensures that the read-only application works with data not older than a given time interval. But there definitely is a window where the read-only application works with stale data (although the design guarantees that it gets the up-to-date data eventually). This might be permissible in many applications - you need to evaluate your situation.
The third option is to use a second level cache implementation, and use short-lived Sessions. There are various caching packages that work with Hibernate with relative merits and demerits.
Chris, I'm a little confused about your circumstances. If I understand correctly, you have both a web app (read/write) and a standalone application (read-only?) using Hibernate to access a shared database. The changes you make with the web app aren't visible to the standalone app. Is that right?
If so, have you considered using a different second-level cache implementation? I'm wondering if you might be able to use a clustered cache that is shared by both the web application and the standalone application. I believe that SwarmCache, which is integrated with Hibernate, will allow this, but I haven't tried it myself.
In general, though, you should know that the contents of a given cache will never be aware of activity by another application (that's why I suggest having both apps share a cache). Good luck!
From my point of view, you should change your underlying Hibernate cache to one that supports clustered mode. It could be JBoss Cache or Swarm Cache. The first one has better support for data synchronization (replication and invalidation) and also supports JTA.
Then you will be able to configure cache synchronization between the webapp and the server. Also look at the isolation level if you use JBoss Cache. I believe you should use READ_COMMITTED mode if you want to get new data on the server from the same session.
The most common practice is to have a container-managed Entity Manager so that two or more applications in the same container (e.g. Glassfish, Tomcat, Websphere) can share the same caches.
But if you don't use an application container, because you use Play! for instance, then I would build some web services in the primary application to read and write consistently in the cache.
I think using stale data is an open door for disaster. Just like Singletons become Multitons, read-only applications often end up writing sometimes.
Belt and braces :)