How to define persistent storage in cache2k?

The Cache API docs say that several methods, like purge() or flush(), operate depending on the configured persistent storage.
Unfortunately, I can't find out how to configure one.
Is it really possible?

Older versions of cache2k had persistence support baked in. It was working, but it never reached a level that I would fully trust for production.
The actual issue was the clear() operation, which had a quite complex implementation. clear() should be fast, regardless of the storage implementation needing some time to remove the data. So my idea was to switch to a write-back scheme, where operations get queued and executed once the storage is available again. But implementing a partial write-back scheme just for clear() is quite some over-engineering...
For the moment I have dropped persistence from the feature set, since I want the 1.0 version to have a stabilized API, and it already provides a lot of useful features.
As you can see from the roadmap on the cache2k homepage, the current plan is to first add bulk and async features and then get back to storage. The storage interface will probably need to look totally different once the async capabilities are done.
Inside the current cache2k implementation there are still the interfaces where the storage will be hooked in, so what has already been achieved is not completely abandoned. flush() and purge() are remnants of this, so I will consequently remove those two methods for the 1.0 version to avoid confusion.
BTW: since I saw your question on Guava: cache2k has support for a CacheWriter, which is the counterpart of CacheLoader. With the cache loader and writer you can read from and write to a storage yourself, but that is not identical to storage support inside the cache itself. For example, with real storage support cache.contains(...) would check the storage, but it does not trigger the cache loader, at least according to JSR 107 and in every cache implementation I know of.
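To illustrate that difference, here is a minimal sketch, assuming the cache2k 2.x API (the loader lambda stands in for a real storage read):

    import org.cache2k.Cache;
    import org.cache2k.Cache2kBuilder;

    public class ContainsVsLoader {
      public static void main(String[] args) {
        // read-through cache; the lambda stands in for a database read
        Cache<String, String> cache =
          Cache2kBuilder.of(String.class, String.class)
            .loader(key -> "value-for-" + key)
            .build();
        System.out.println(cache.containsKey("a")); // false: no loader call
        System.out.println(cache.get("a"));         // loader runs, value cached
        System.out.println(cache.containsKey("a")); // true now
      }
    }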

Is cache2k a persistent cache?

I have used cache2k in my Java project and it was simple (key-value pairs) and easy to use. Now I want to know whether cache2k is a persistent or non-persistent cache.
I found an answer here:
https://stackoverflow.com/a/23709996/12605243, written in 2014, which said that the cache was going to be updated to a persistent cache.
So my question is: 'Am I using a persistent or non-persistent cache?' I have read their docs but was unable to find the answer.
Basically it's possible to add persistence via CacheLoader and CacheWriter. We use that in several ways with a file system or a database as storage. When persistence is added this way, the cache operates in so-called "cache through" mode. Some operations of the cache, especially get and put, operate transparently and read or write the data to the storage via the loader and writer. Other operations, like CAS operations, just interact with the in-memory cache.
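A minimal sketch of that cache-through wiring, assuming the cache2k 2.x API; readFromDb, writeToDb and deleteFromDb are hypothetical placeholders for your own storage code:

    import org.cache2k.Cache;
    import org.cache2k.Cache2kBuilder;
    import org.cache2k.io.CacheWriter;

    public class CacheThrough {
      Cache<String, String> cache =
        Cache2kBuilder.of(String.class, String.class)
          .loader(this::readFromDb)               // read-through on get()
          .writer(new CacheWriter<String, String>() {
            @Override public void write(String key, String value) {
              writeToDb(key, value);              // write-through on put()
            }
            @Override public void delete(String key) {
              deleteFromDb(key);                  // write-through on remove()
            }
          })
          .build();

      // hypothetical storage access, e.g. JDBC or file I/O
      String readFromDb(String key) { return null; }
      void writeToDb(String key, String value) { }
      void deleteFromDb(String key) { }
    }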
The persistence feature as it was planned was meant to be transparent for all cache operations. Although it's feasible, and the basic work is done in the internal infrastructure, we don't have a big need for it. Other features and tasks seem more important. However, I am happy to hear about potential use cases.

Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique

Within our company it's kind of a standard to create repositories for data which is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
Our web infrastructure consists of a few independent web applications within Tomcat 7 for printing, product description, product order (this one is not persisted in the database!), category description, etc.
They are all built on the Servlet 2 API.
So each repository instance/implementation holds a specialised kind of data represented by serializable classes, and the instances of these serializable classes are set up/filled by a periodically executed database query (for every result row the setters of the fields are called; this reminds me of domain-oriented entity beans with CMP).
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances).
Each context has its own connection to the Oracle database (set up by a resource description file on deployment).
All the data is read-only; we never need to write back to the database.
Because we need some of these data types for more than one web application (context), and some even for more than one servlet within the same web context, repositories with an identical data type are instantiated more than once, e.g. four times, twice within the same application.
In the end some of the data is duplicated and I'm not sure whether this is as clever and efficient as it should be. It should be possible to share the same repository object with more than one application (JNDI?), but at the very least it must be possible to share it between several servlets within the same application context.
Besides, I'm irritated by the idea of using a "self-built" repository instead of something like a well-tested, openly developed cache (Ehcache, JCS, ...), especially since some of these caches also provide options for distributed caching (so it should also work within the same container).
If certain entries are searched for, the search algorithm iterates over all entries in the repository (see the link above). For every search pattern there are specialised functions which are called directly from within the business logic classes using the "entity beans"; there's no specification object or interface.
In the end the application server as a whole does not perform that well, and it uses a hell of a lot of RAM (at least for approximately 10,000 DB entries); in my opinion this is most probably correlated with the use of serializable XSD-to-JAXB-generated classes.
Additionally, every time an application is deployed for tests you have to wait at least two minutes until all entries from the database have been loaded into the repositories; when deploying to live there is a well recognizable out-of-service phase on context/servlet start-up.
I tend to think all of this is closely related to the solutions I described above.
Because I haven't got any experience in this field and I'm new in the company, I don't want to be too obtrusive.
Maybe you can help me to evaluate ideas for a better setup:
Is it better for performance and memory to unify all the repositories into one "repository servlet" and request objects from there via HTTP (I don't think so, though it seems quite modular/distributed-system friendly), or should I try to go with JNDI (I've never done that before) and connect to the repository similarly to a JDBC database?
Wouldn't it be even more sensible, faster and more efficient to at least use only one single connection pool for the whole Tomcat instance (and reference this connection pool from within the web apps' deployment descriptors)? Or might that slow down connections or limit them in any other way?
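For reference, sharing a container-managed pool boils down to a standard JNDI lookup; the resource name jdbc/SharedPool is an assumption and has to match the <Resource>/<ResourceLink> entries in the Tomcat configuration:

    import javax.naming.InitialContext;
    import javax.sql.DataSource;
    import java.sql.Connection;

    public class PoolLookup {
      Connection open() throws Exception {
        // resolves the pool configured in server.xml / context.xml
        DataSource ds = (DataSource) new InitialContext()
            .lookup("java:comp/env/jdbc/SharedPool"); // hypothetical name
        return ds.getConnection();
      }
    }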
I was told that the cache system (Ehcache) didn't work well (at least not with the performance of the self-written solution, though I can't believe that). I imagine that using repositories backed by a distributed (across all contexts) cache in all web applications should not only reduce the memory footprint significantly but also not be significantly slower; I actually believe it would be faster, have shorter start-up times, and need to be redeployed less often.
I'm very grateful for every tip or hint and your thoughts. It would be marvellous to get a peer review of my ideas based on practical experience.
So thank you very much in advance!
Is it better to hold a repository for every web application (context) or is it better to share a common instance by JNDI or a similar technique
Unless someone proves me otherwise, I would say there is no way to do it in a standard way, meaning as defined in the Servlet Spec or in the rest of the Java EE spec canon.
There are technical ways to do it which probably depend on a specific application server implementation, but this cannot be "better" in the universal sense.
If you have two applications that operate on the same data, I wonder whether the partitioning of the applications is useful. Maybe all functionality operating on some kind of data needs to be in the same application?
Within our company it's kind of a standard to create repositories for data which is originally stored in the database, as described for example in https://thinkinginobjects.com/2012/08/26/dont-use-dao-use-repository/.
I looked up Evans on our bookshelf. The blog post is quite weird. A repository and a DAO are basically the same thing: both provide CRUD operations for an object or for a tree of objects (Evans says only for the aggregate roots).
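To make the overlap concrete, both patterns boil down to an interface like this sketch (the names are illustrative, not taken from the blog post):

    import java.util.List;

    // Whether you call it a DAO or a repository, the surface is CRUD
    // on an aggregate root.
    public interface Repository<T, ID> {
      T findById(ID id);
      List<T> findAll();
      void save(T entity);   // create or update
      void delete(ID id);
    }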
The repositories are initialized in the servlets' init sequences (so every servlet keeps its own set of instances). Each context has its own connection to the Oracle database (set up by a resource description file on deployment). [ ... ]
In the end the application server as a whole does not perform that well and it uses a hell of a lot of RAM
When something performs badly, it's best to do profiling, e.g. with YourKit, or with perf and FlameGraphs if you are on Linux. If your applications need a lot of RAM, analyze the heap, e.g. with Eclipse MAT. There is no way somebody can give you a recommendation or a best-practice hint without seeing a single line of code.
A general answer would have to cover everything about performance tuning for Oracle DBs, JDBC, Java collections and concurrent programming, networking and operating systems.
I was told that the cache system (Ehcache) didn't work well (at least not with the performance of the self-written solution, though I can't believe that)
I can. Ehcache is between 10 and 20 times slower than a simple HashMap; see the cache benchmarks. You only need a map when you do a complete preload and don't have any mutations.
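A sketch of that complete-preload case; Product and the row loading are hypothetical stand-ins for the existing entity classes and query code:

    import java.util.Collections;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ProductRepository {
      // hypothetical entity
      public static class Product {
        private final long id;
        public Product(long id) { this.id = id; }
        public long getId() { return id; }
      }

      private final Map<Long, Product> byId;

      public ProductRepository(List<Product> rows) { // rows from the DB query
        Map<Long, Product> m = new HashMap<>();
        for (Product p : rows) {
          m.put(p.getId(), p);
        }
        // no mutations after construction: safe for concurrent readers
        byId = Collections.unmodifiableMap(m);
      }

      public Product findById(long id) { return byId.get(id); }
    }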
I imagine that using repositories backed by a distributed (across all contexts) cache in all web applications should not only reduce the memory footprint significantly but also not be significantly slower
Distributed caches need to go over the network and add serialization/deserialization overhead. That's probably another factor of 30 slower. And when would the distributed cache be updated?
I'm very grateful for every tip or hint and your thoughts.
Wrap up:
Do the normal software engineering homework: profile, analyze, and spend the tuning effort in the right places.
Ask specific questions on one topic on Stack Overflow and share your code and performance data. Ask about one thing at a time, and read https://stackoverflow.com/help/on-topic.
You may also come to the conclusion that there is nothing to tune. There are applications out there that need a day to build up an in-memory data structure from persistent data. Maybe it's just a lot of data? If you do not like the downtime, use blue-green deployment. Also use smaller data sets for development and testing.

Custom caching implementation in Java

I want to implement some sort of lightweight caching in Java which is easy to integrate and deploy with a Java application.
The cache layer will sit between the application and the database layer: no database caching, no Spring, no Hibernate, no Ehcache, no HTTP caching.
We can use the file system or a nano database so that the cache can be restored after a process restart.
I tried LRU Cache:
http://stackoverflow.com/questions/224868/easy-simple-to-use-lru-cache-in-java
http://www.programcreek.com/2013/03/leetcode-lru-cache-java/
But I am not sure what to do on overflow: should I save the data into a database (and which database would be better for fast insertion and lookup of data), or should I use the file system?
Does anyone have better input on implementing a caching mechanism in Java?
But I am not sure what to do on overflow: should I save the data into a database (and which database would be better for fast insertion and lookup of data), or should I use the file system?
It depends on the use case. If your cached values are very big, you can store each of them in its own file and use a hash of the cache key as the file name.
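A sketch of that file-per-value scheme; the directory name is an assumption, and hashing the key ensures arbitrary keys map to legal file names:

    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.security.MessageDigest;

    public class FilePerValueStore {
      private final Path dir = Paths.get("cache-dir"); // hypothetical location

      Path fileForKey(String key) throws Exception {
        byte[] d = MessageDigest.getInstance("SHA-256")
            .digest(key.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder();
        for (byte b : d) hex.append(String.format("%02x", b));
        return dir.resolve(hex.toString());
      }

      void put(String key, byte[] value) throws Exception {
        Files.createDirectories(dir);
        Files.write(fileForKey(key), value);
      }

      byte[] get(String key) throws Exception {
        Path p = fileForKey(key);
        return Files.exists(p) ? Files.readAllBytes(p) : null;
      }
    }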
If your values are small, storing them as separate files would be a lot of overhead, so it is better to store the cached entries in one or a couple of files. To implement this you need to learn about "external indexes" and "memory management" or "free-space management" (e.g. best-fit and next-fit allocation and compaction strategies). This actually leads to the implementation of a tiny database, so maybe just use one :) Some options that come to mind: LevelDB, MapDB, LMDB, RocksDB.
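If you go the tiny-database route, a sketch with MapDB (API as of MapDB 3.x; the file name is an assumption):

    import org.mapdb.DB;
    import org.mapdb.DBMaker;
    import org.mapdb.Serializer;
    import java.util.concurrent.ConcurrentMap;

    public class MapDbCache {
      public static void main(String[] args) {
        DB db = DBMaker.fileDB("cache.db")   // single file on disk
            .transactionEnable()             // crash safety, slower writes
            .make();
        ConcurrentMap<String, byte[]> store = db
            .hashMap("entries", Serializer.STRING, Serializer.BYTE_ARRAY)
            .createOrOpen();
        store.put("key", new byte[] {1, 2, 3});
        db.commit();                         // flush to disk
        db.close();
      }
    }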
Keep in mind that cache operations come in concurrently from the application, so the cache may evict a value while a request for the same key comes in at the same time. Will you implement just the basic operations like Cache.get and Cache.put, or also CAS operations like Cache.putIfAbsent? Do you want to make efficient use of multi-core systems, as they are common today?
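For the in-memory part, java.util.concurrent already gives you most of this; a minimal skeleton (the class name is illustrative):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // ConcurrentHashMap provides highly concurrent access and the
    // CAS-style primitives out of the box.
    public final class SimpleCache<K, V> {
      private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();

      public V get(K key) { return map.get(key); }
      public void put(K key, V value) { map.put(key, value); }
      // CAS operation: only one of several concurrent callers wins
      public V putIfAbsent(K key, V value) { return map.putIfAbsent(key, value); }
    }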
Still, when using a tiny database, you will need to prepare for some months of engineering work.
Does anyone have better input on implementing a caching mechanism in Java?
You can read my blog at cruftex.net for some more input on implementing lightweight and fast caching in Java.
For a cache implementation with overflow you can take a look at imcache. But imcache is not a fully fledged generic cache; for example, CAS operations are missing, see the Cache interface.
My own high-performance Java cache implementation, cache2k, features CAS operations, events, loaders & writers, expiry, etc., and it will eventually get overflow to disk, too. However, I am not sure about the time frame... If you are interested in working in this area: contributions are welcome!

Is it feasible to use Guava Cache as a "helper" for my own persistent cache?

I'm looking to relieve the pressure on a "lookup service" that hits a database each time by putting a caching layer between the service provider and the service client. I want this caching layer to be persistent and to fit more objects than RAM would allow, so a vanilla Guava Cache won't do. I've looked into things like Ehcache and Couchbase but have decided to roll my own for various reasons.
It's pretty easy to write the naive code for this persistent caching layer. However, I know enough about caching to realize that there are a lot of concurrency issues to handle, and I am pretty sure I won't get all of them right the first time. For example, there is the "thundering herd" problem, where a cache miss could cause a lot of simultaneous requests to the backing service for the exact same object. It struck me that this is exactly the type of thing that a LoadingCache already handles. Does it seem like a reasonable idea to get Guava to do the hard concurrency work and just plug in my own subclass(es) to do the actual object retrieval and storage? I'm not sure where the exact boundaries would be in terms of what I would subclass or override, but I can figure that out if this isn't just a totally misguided idea. I haven't seen examples of extending/customizing Guava caching, so if there are any examples and/or documents to look at, I'd be interested in those.
What I ended up doing was very simple. I make a normal LoadingCache and perform some extra operations in the load and reload methods. This gave me the hooks (i.e. the load and reload methods of CacheLoader) to look in my local database for an object, call the remote service if I don't find it, and persist the result, without worrying that many threads would be trying to load the same object at once, thanks to the concurrency guards that Guava provides.
I'm sure this is far from how the cache is intended to be used, since I'm actually setting the maximumSize of my cache to 0, so that my load function is always called. (For a variety of reasons I want to serve the objects from persistent storage each time, not from RAM.) I haven't tested it thoroughly, but it seems to behave as I want. The overall effect is a pull-through "object mirror", acting as a self-updating copy of the upstream service.
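A sketch of that setup; Value, lookupFromDb, callRemoteService and persist are hypothetical placeholders for the asker's own types, storage and service calls:

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import com.google.common.util.concurrent.Futures;
    import com.google.common.util.concurrent.ListenableFuture;
    import java.util.concurrent.ExecutionException;

    public class ObjectMirror {
      private final LoadingCache<String, Value> mirror =
        CacheBuilder.newBuilder()
          .maximumSize(0)   // evict immediately: load() runs on every get()
          .build(new CacheLoader<String, Value>() {
            @Override public Value load(String key) throws Exception {
              Value v = lookupFromDb(key);    // local persistent copy first
              if (v == null) {
                v = callRemoteService(key);   // fall back to the service
                persist(key, v);
              }
              return v;
            }
            @Override public ListenableFuture<Value> reload(String key, Value old)
                throws Exception {
              return Futures.immediateFuture(load(key)); // refresh path
            }
          });

      public Value get(String key) throws ExecutionException {
        return mirror.get(key); // Guava serializes concurrent loads per key
      }

      // hypothetical placeholders
      static class Value { }
      Value lookupFromDb(String key) { return null; }
      Value callRemoteService(String key) { return new Value(); }
      void persist(String key, Value v) { }
    }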

Transaction mode for file operations in Java

Perhaps what I'm trying to explain here doesn't make any sense, so I'd like to apologize in advance. Anyway, I will try.
I'm trying to read from a file, perform some database operations and move the content to another file. I was wondering whether it is possible to perform all these operations atomically in Java, so that if anything goes wrong during the sequence of actions, the complete sequence is rolled back and everything returns to the starting point.
Thanks in advance for your help.
Take a look at Apache Commons Transaction. It has the capability to manage files transactionally.
An archived article detailed its use with the file system.
update
Be aware that the status on the front page says:
We have decided to move the project to dormant as we are convinced that the main advertised feature transactional file access can not be implemented reliably. We are convinced that no such implementation can be possible on top of an ordinary file system. Although there are other useful parts (as multi level locking including deadlock detection) the transactional file system is the main reason people use this library for. As it simply can not be made fully transactional, it does not work as advertised.
There is no standard transactional file API; however, I believe there is an Apache project that implements what you want.
http://commons.apache.org/transaction/file/index.html
The transactional file package provides you with code that allows you to have atomic read and write operations on any file system. The file resource manager offers you the possibility to isolate a number of operations on a set of files in a transaction. Using the locks package it is able to offer you full ACID transactions including serializability. Of course to make this work all access to the managed files must be done by this manager. Direct access to the file system can not be monitored by the manager.
update
The same dormancy notice quoted above applies: the project maintainers concluded that transactional file access cannot be implemented reliably on top of an ordinary file system.
As XADisk supports XA transactions over file systems, it should solve your problem. It can participate in XA transactions along with databases and other XA resources.
In case your application is not in a JCA-capable environment, you can also use a standalone transaction manager like Atomikos and carry out XA transactions involving both files (using XADisk) and databases.
update
The project's home page does not exist anymore and the last release on Maven was in 2013.
No, at least not with a simple call. Filesystems in general (and Java filesystem operations in particular) do not support a "rollback".
You could however emulate this. A common way would be to first rename the file so that it is marked as being "in processing", for example by appending a suffix.
Then process it, then move the file. If anything goes wrong, just roll back the DB, rename the file(s) with suffixes back to their original names, and you're set.
As a bonus, on some filesystems a rename is even atomic, so you'd be safe even with concurrent updates (I don't know whether this is relevant for you). I do not know offhand whether file renaming is atomic in Java, though; you'd need to check.
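On the "is rename atomic in Java" point: java.nio.file lets you at least request atomicity; Files.move with ATOMIC_MOVE either renames atomically or throws AtomicMoveNotSupportedException. A sketch of the suffix scheme (the paths are illustrative):

    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class RenameBasedProcessing {
      public static void process() throws Exception {
        Path original   = Paths.get("data/input.csv");            // hypothetical
        Path processing = Paths.get("data/input.csv.processing");

        // claim the file: atomic, or an AtomicMoveNotSupportedException
        Files.move(original, processing, StandardCopyOption.ATOMIC_MOVE);
        try {
          // ... read the file, do the database work, write the output ...
          Files.move(processing, Paths.get("data/input.csv.done"),
                     StandardCopyOption.ATOMIC_MOVE);
        } catch (Exception e) {
          // "rollback": restore the name so the file is picked up again
          Files.move(processing, original, StandardCopyOption.ATOMIC_MOVE);
          throw e;
        }
      }
    }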
You can coordinate a distributed transaction using two-phase commit. However, this is fairly complex to implement, and an approach I've often seen taken instead is single-phase commit: build a stack of transactions and then commit them all in quick succession, generating an error if one of the commit attempts fails while the others succeed.
If you choose to implement two-phase commit, you'd require write-ahead logging for each participant in the transaction: you log actions before you've taken them, allowing you to roll back any changes if the transaction fails. For example, you'd need to do this in order to reverse any changes made to files (as sleske mentions).
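The core of write-ahead logging fits in a few lines: record the intent durably before acting and record the commit afterwards, so a recovery pass can decide what to undo or redo. The log format here is an illustrative assumption:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class WriteAheadLog {
      public static void main(String[] args) throws Exception {
        try (FileChannel log = FileChannel.open(Paths.get("tx.log"),
            StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
          append(log, "INTENT move input.csv -> done/");
          log.force(true);          // the intent is on disk before we act
          // ... perform the file move and the database update here ...
          append(log, "COMMIT");
          log.force(true);          // recovery sees the action completed
        }
      }

      static void append(FileChannel log, String record) throws Exception {
        log.write(ByteBuffer.wrap((record + "\n")
            .getBytes(StandardCharsets.UTF_8)));
      }
    }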
Narayana (formerly called JBossTS) provides its own implementation of transactional file I/O.
