How to populate entries into a Guava LoadingCache? - java

I have a use case where I want to populate entries into a data structure from multiple threads and, once a particular size is reached, start dropping old records. So I decided to use a Guava LoadingCache for this.
I want to populate entries into my LoadingCache from multiple threads, and I am using size-based eviction as the eviction policy.
private final ScheduledExecutorService executorService = Executors
        .newSingleThreadScheduledExecutor();

private final LoadingCache<String, DataBuilder> cache =
        CacheBuilder.newBuilder().maximumSize(10000000)
                .removalListener(RemovalListeners.asynchronous(new CustomListener(), executorService))
                .build(new CacheLoader<String, DataBuilder>() {
                    @Override
                    public DataBuilder load(String key) throws Exception {
                        // what I should do here?
                        // return
                    }
                });

// this will be called from multiple threads to populate the cache
public void addToCache(String key, DataBuilder dataBuilder) {
    // what I should do here?
    //cache.get(key).
}
My addToCache method will be called from multiple threads to populate the cache. I am confused about what I should do inside the addToCache method to fill the cache, and also what my load method should look like.
Here DataBuilder is my builder pattern.

Obviously your problem is that you don't get the main purpose of a CacheLoader.
A CacheLoader is used to automatically load the value of a given key (one that doesn't exist in the cache yet) when get(K key) or getUnchecked(K key) is called, in such a way that even if several threads ask for the same key at the same time, only one thread actually loads the value, and once it is done all calling threads get that same value.
This is typically useful when the value is expensive to load, for example when it is the result of a database access or a long computation: the longer the load takes, the higher the probability that several threads try to load the same value at the same time, which would waste resources without a mechanism ensuring that only one thread loads the data for all calling threads.
So let's say your DataBuilder instances take a long time to build, or you simply need to ensure that all threads see the same instance for a given key; then you do indeed need a CacheLoader, and it would look like this:
new CacheLoader<String, DataBuilder>() {
    @Override
    public DataBuilder load(String key) throws Exception {
        return callSomeMethodToBuildItFromTheKey(key); // could be new DataBuilder(key)
    }
}
Thanks to the CacheLoader, you no longer need to call put explicitly: the cache is populated behind the scenes by the threads calling cache.get(myKey) or cache.getUnchecked(myKey).
If you want to populate your cache manually, you can simply use the put(K key, V value) method, as with any Cache:
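For example, reads then double as population: the first thread that asks for a key runs load(key), and concurrent callers for the same key block and reuse that single result (getFromCache is just an illustrative name, not from the original code):
// called from any thread; populates the cache on demand via the CacheLoader
public DataBuilder getFromCache(String key) {
    return cache.getUnchecked(key);
}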
public void addToCache(String key, DataBuilder dataBuilder) {
    cache.put(key, dataBuilder);
}
If you intend to populate the cache yourself, you don't need a CacheLoader; you can simply call build() instead of build(CacheLoader<? super K1,V1> loader) to build your Cache instance (it won't be a LoadingCache anymore).
Your code would then be:
private final Cache<String, DataBuilder> cache =
        CacheBuilder.newBuilder().maximumSize(10000000)
                .removalListener(
                        RemovalListeners.asynchronous(new CustomListener(), executorService)
                ).build();
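With a plain Cache there is no loader behind get, so lookups go through getIfPresent, which returns null for a key that was never put or has already been evicted. For example (getFromCache is just an illustrative name):
public DataBuilder getFromCache(String key) {
    return cache.getIfPresent(key); // may be null; nothing is loaded automatically
}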

Related

Why is the following block not thread safe?

We have a Spring Boot application where we fetch data from a database and keep it in a ConcurrentHashMap tempMap, which we then use to look up details for incoming keys. The keys arrive as arrays. Once loaded, we don't refresh the map until the scheduler runs, yet we sometimes see an empty Optional being returned even though the map was never refreshed. The method is accessed by many threads in the application, and it looks like some kind of memory leak. Can anyone help point out what the issue might be?
This is the method body, accessed by many threads, that throws the "No value present" exception:
if (sender.length == 1) {
    return getSingle(sender[0]).get();
} else {
    return getMultiple(sender).get();
}
These are the implementations of the two methods:
private Optional<DummyClassEntity> getSingle(String value) {
    Optional<DummyClassEntity> dummyClass;
    dummyClass = Optional.of(tempMap.get(value));
    return dummyClass;
}

private Optional<DummyClassEntity> getMultiple(String[] values) {
    return Arrays.stream(values)
            .filter(value -> tempMap.containsKey(value))
            .findFirst()
            .map(s -> tempMap.get(s));
}
I tried using a synchronized block, but that impacts performance, and this code path is used in a high-performance application.
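For what it's worth, a null-safe sketch of the same two lookups (not from the original post): Optional.of(null) throws a NullPointerException and calling get() on an empty Optional throws NoSuchElementException, so both are avoided here, assuming tempMap is a ConcurrentHashMap<String, DummyClassEntity>:
private Optional<DummyClassEntity> getSingleSafe(String value) {
    // Optional.ofNullable tolerates a missing key instead of throwing
    return Optional.ofNullable(tempMap.get(value));
}

private Optional<DummyClassEntity> getMultipleSafe(String[] values) {
    return Arrays.stream(values)
            .map(tempMap::get)          // single read per key, no containsKey/get race
            .filter(v -> v != null)
            .findFirst();
}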

Multiple requests break the same class instance during execution

I have a class that acts as a cache and uses a Map (either HashMap or ConcurrentHashMap). I'd like to clear the Map before executing each new (HTTP) request, e.g.:
@Component
public class MyCache {

    Map<String, Object> cache = new ConcurrentHashMap<>();

    Object get(String key) {
        return cache.computeIfAbsent(key, k -> fetchFromDB());
    }

    void clearCache() {
        cache.clear();
    }
}

@Controller
public class MyController {

    @Autowired
    MyCache myCache;

    @Get
    Response getInfo(ids) {
        // give me a fresh cache at the beginning of every new request
        myCache.clearCache();
        // load and fetch from myCache for the current request
        ids.forEach(id -> myCache.get(id));
    }
}
The idea of the code above is to:
initially reset the cache when a new request comes in;
then, for every id in the input (which could be hundreds), fetch from the cache;
and if the same id is already stored in the cache, avoid re-calling fetchFromDB.
Everything works locally with a single thread, but with 2 or more threads there is a chance that, during the execution of thread1, thread2 starts and calls myCache.clearCache(), and thread1 suddenly finds nothing stored in myCache anymore for all the items it has already processed.
The reason is that my map lives in singleton beans (MyCache, MyController), so even though each request runs on its own thread, they all act on the same instance.
What is the best way to fix this if I still want a clean cache for each incoming request? Is there any way I can detect whether other threads are still executing before my current thread calls clearCache()?
I solved it by following how Google Guava's cache works, using a ConcurrentHashMap together with ReentrantLocks as segments.
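For reference, a very rough sketch of that segment idea (illustrative names, not Guava's actual implementation): keys hash onto a small array of ReentrantLocks, loads for different segments don't block each other, and a clear takes every lock so it cannot interleave with an in-flight load. It serializes clears against loads but does not, by itself, give each request an isolated cache.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

public class SegmentedCache<K, V> {

    private static final int SEGMENTS = 16;

    private final ReentrantLock[] locks = new ReentrantLock[SEGMENTS];
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();

    public SegmentedCache() {
        for (int i = 0; i < SEGMENTS; i++) {
            locks[i] = new ReentrantLock();
        }
    }

    private ReentrantLock lockFor(Object key) {
        // strip the sign bit so the index is never negative
        return locks[(key.hashCode() & 0x7fffffff) % SEGMENTS];
    }

    public V get(K key, Function<K, V> loader) {
        ReentrantLock lock = lockFor(key);
        lock.lock();
        try {
            return map.computeIfAbsent(key, loader);
        } finally {
            lock.unlock();
        }
    }

    public void clearCache() {
        // take every segment lock so a clear cannot run while a load is in flight
        for (ReentrantLock lock : locks) {
            lock.lock();
        }
        try {
            map.clear();
        } finally {
            for (ReentrantLock lock : locks) {
                lock.unlock();
            }
        }
    }
}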

How do I use the key in a condition in the Cacheable annotation

I'm caching the results of a function using the @Cacheable annotation.
I have 3 different caches, and the key for each one is the user id of the currently logged-in user concatenated with an argument of the method.
On a certain event I want to evict all cache entries whose key starts with that particular user id.
For example:
@Cacheable(value = "testCache1", key = "'abcdef'")
I want the cache evict annotation to be something like:
@CacheEvict(value = "getSimilarVendors", condition = "key.startsWith('abc')")
But when I try to implement this, it gives me an error:
Property or field 'key' cannot be found on object of type 'org.springframework.cache.interceptor.CacheExpressionRootObject' - maybe not public?
What is the correct way to implement this?
All of the Spring Cache annotations (i.e. @Cacheable, @CacheEvict, etc.) work on one cache entry per operation. @CacheEvict does support clearing the entire cache (via the allEntries attribute, which ignores the key in that case), but it is not capable of selectively clearing a partial set of entries based on a key pattern in a single operation, as you have described.
The main reason behind this is the Spring Cache interface abstraction itself, where the evict(key:Object) method takes a single key argument. Technically, it also depends on the underlying Cache implementation (e.g. GemfireCache), which would need to support eviction of all entries whose keys match a particular pattern; that is typically not the case for most caches (certainly not for GemFire, and not for Google Guava Cache either; see here and here).
That is not to say you absolutely cannot achieve your goal. It's just not something supported out-of-the-box.
The interesting thing, minus some technical issues with your approach, is that your condition achieves sort of what you want... a cache eviction only occurs if the key satisfies the condition. However, your @CacheEvict-annotated method is just missing the "key", hence the error. So, something like the following would satisfy the SpEL in your condition...
@CacheEvict(condition = "#key.startsWith('abc')")
public void someMethod(String key) {
    ...
}
However, you have to specify the key as an argument in this case. But, you don't want a specific key, you want a pattern matching several keys. So, forgo the condition and just use...
@CacheEvict
public void someMethod(String keyPattern) {
    ...
}
By way of example, using Guava as the caching provider, you would now need to provide a "custom" implementation extending GuavaCache.
public class CustomGuavaCache extends org.springframework.cache.guava.GuavaCache {

    protected boolean isMatch(String key, String pattern) {
        ...
    }

    protected boolean isPattern(String key) {
        ...
    }

    @Override
    public void evict(Object key) {
        if (key instanceof String && isPattern(key.toString())) {
            Map<String, Object> entries = this.cache.asMap();
            Set<String> matchingKeys = new HashSet<>(entries.size());
            for (String actualKey : entries.keySet()) {
                if (isMatch(actualKey, key.toString())) {
                    matchingKeys.add(actualKey);
                }
            }
            this.cache.invalidateAll(matchingKeys);
        }
        else {
            this.cache.invalidate(key);
        }
    }
}
Now just extend GuavaCacheManager to plug in your "custom" GuavaCache (CustomGuavaCache)...
public class CustomGuavaCacheManager extends org.springframework.cache.guava.GuavaCacheManager {

    @Override
    protected Cache createGuavaCache(String name) {
        return new CustomGuavaCache(name, createNativeGuavaCache(name), isAllowNullValues());
    }
}
This approach takes advantage of the Guava Cache's invalidateAll(keys:Iterable) method. And, of course, you could use Java's regex support to perform the "matching" of the keys to be evicted inside the isMatch(key, pattern) method.
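For instance, the two helpers left unimplemented above could be filled in with plain regex matching; the convention that a pattern contains a '*' is purely an assumption for this sketch:
protected boolean isPattern(String key) {
    // assumption: keys used for pattern-based eviction look like "abc.*"
    return key.contains("*");
}

protected boolean isMatch(String actualKey, String pattern) {
    // interpret the evicted "key" as a java.util.regex pattern
    return java.util.regex.Pattern.matches(pattern, actualKey);
}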
So, I have not tested this, but this (or something similar) should achieve (almost) what you want (fingers crossed ;-)
Hope this helps!
Cheers,
John

Spring cache, TTL unless service is down

I have an interesting task where I need to cache the results of my method, which is really simple with the Spring cache abstraction:
@Cacheable(...)
public String getValue(String key) {
    return restService.getValue(key);
}
The restService.getValue() call targets a REST service, which may or may not answer, depending on whether the endpoint is down.
I need to set a specific TTL for the cache value, let's say 5 minutes, but if the server is down I need to return the last value, even if it is older than 5 minutes.
I was thinking about having a second cacheable method with no TTL that always returns the last value; it would be called from getValue if restService returns nothing, but maybe there is a better way?
I've been interested in doing this for a while too. Sorry to say, I have not found any trivial way of doing it. Spring will not do this for you; it's more a question of whether the cache implementation Spring is wrapping can do it. I assume you are using the EhCache implementation. Unfortunately this functionality does not come out of the box, as far as I know.
There are various ways to achieve something similar, depending on your problem.
1) Use an eternal cache time and have a second Thread class that periodically loops over the cached data, refreshing it. I have not done this exactly, but the Thread class would need to look something like this:
@Autowired
EhCacheCacheManager ehCacheCacheManager;
...
// in the infinite loop
List keys = ((Ehcache) ehCacheCacheManager.getCache("test").getNativeCache()).getKeys();
for (int i = 0; i < keys.size(); i++) {
    Object o = keys.get(i);
    Ehcache ehcache = (Ehcache) ehCacheCacheManager.getCache("test").getNativeCache();
    Element item = ehcache.get(o);
    // get the data based on some info in the value, and if no exceptions
    ehcache.put(new Element(item.getKey(), newValue));
}
The benefit is that this is very fast for the @Cacheable caller; the downside is that your server might get more hits than necessary.
2) You could make a CacheListener that listens for the expiration event and stores the data temporarily. Should the server call fail, use that data and return it from the method.
The ehcache.xml:
<cacheEventListenerFactory class="caching.MyCacheEventListenerFactory"/>
</cache>
</ehcache>
The factory:
import net.sf.ehcache.event.CacheEventListener;
import net.sf.ehcache.event.CacheEventListenerFactory;
import java.util.Properties;

public class MyCacheEventListenerFactory extends CacheEventListenerFactory {

    @Override
    public CacheEventListener createCacheEventListener(Properties properties) {
        return new CacheListener();
    }
}
The pseudo-implementation:
import net.sf.ehcache.CacheException;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;
import net.sf.ehcache.event.CacheEventListener;
import java.util.concurrent.ConcurrentHashMap;

public class CacheListener implements CacheEventListener {

    // prob bad practice to use a global static here - but it's just for demo purposes
    public static ConcurrentHashMap myMap = new ConcurrentHashMap();

    @Override
    public void notifyElementPut(Ehcache ehcache, Element element) throws CacheException {
        // we can remove it since the put happens after a method return
        myMap.remove(element.getKey());
    }

    @Override
    public void notifyElementExpired(Ehcache ehcache, Element element) {
        // expired item, we should store this
        myMap.put(element.getKey(), element.getValue());
    }

    //....
}
A challenge here is that the key alone is not very useful; you might need to store something about the key in the returned value to be able to pick it up if the server call fails. This feels a bit hacky, and I have not determined whether it is entirely bulletproof. It might need some testing.
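To illustrate, a hedged sketch of how the cached method could fall back to the listener's stash when the REST call fails (restService and the String value type are assumptions, not part of the original answer):
@Cacheable("test")
public String getValue(String key) {
    try {
        return restService.getValue(key);
    } catch (Exception ex) {
        // the entry expired out of EhCache, but CacheListener kept a copy on expiry
        Object stale = CacheListener.myMap.get(key);
        if (stale != null) {
            return (String) stale;
        }
        throw ex;
    }
}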
3) A lot of effort but works:
#Cacheable("test")
public MyObject getValue(String data) {
try {
MyObject result = callServer(data);
storeResultSomewhereLikeADatabase(result);
} catch (Exception ex) {
return getStoredResult(data);
}
}
A pro here is that it will work between server restarts, and you can easily extend it to allow shared caches between clustered servers.
I had a version in a 12-node clustered environment where each node checked the database first to see if any other node had already fetched the "expensive" data, and then reused that rather than making the server call.
A slight variant would be to use a second @Cacheable method together with @CachePut, rather than a DB, to store the data. But this would mean doubling up on memory usage, which might be acceptable depending on your result sizes.
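A rough sketch of that variant, with illustrative names only; the backup methods live in a second bean so the cache annotations are not bypassed by self-invocation, and the no-TTL cache simply keeps whatever @CachePut last stored:
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.cache.annotation.CachePut;
import org.springframework.cache.annotation.Cacheable;
import org.springframework.stereotype.Service;

@Service
public class ValueService {

    @Autowired
    private RestService restService;         // assumption: the existing REST client

    @Autowired
    private BackupValueStore backup;

    @Cacheable("valuesWithTtl")              // the short-TTL cache
    public String getValue(String key) {
        try {
            String value = restService.getValue(key);
            backup.store(key, value);        // refresh the long-lived copy
            return value;
        } catch (Exception ex) {
            return backup.lastKnown(key);    // served while the endpoint is down
        }
    }
}

@Service
class BackupValueStore {

    @CachePut(value = "valuesNoTtl", key = "#key")
    public String store(String key, String value) {
        return value;                        // @CachePut caches the returned value
    }

    @Cacheable("valuesNoTtl")
    public String lastKnown(String key) {
        return null;                         // only reached if nothing was ever stored
    }
}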
Maybe you can use SpEL to switch the cache that is used (one with a TTL and a second without) depending on whether the condition (is the service up?) is true or false. I've never used SpEL this way (I've used it to change the key based on some request params), but I think it could work:
@Cacheable(value = "T(com.xxx.ServiceChecker).checkService()", ...)
where checkService() is a static method that returns the name of the cache that should be used
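Whether the value attribute is actually evaluated as SpEL depends on your setup, but assuming the approach described above works, the checker could be as simple as this (all names are illustrative):
public final class ServiceChecker {

    public static String checkService() {
        // pick the cache name based on whether the REST endpoint currently answers
        return isServiceUp() ? "shortTtlCache" : "noTtlCache";
    }

    private static boolean isServiceUp() {
        // placeholder: in practice, a cheap health-check call against the service
        return true;
    }
}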

Pre-load values for a Guava Cache

I have a requirement where we are loading static data from a database for use in a Java application. Any caching mechanism should have the following functionality:
Load all static data from the database (once loaded, this data will not change)
Load new data from the database (data present in the database at start-up will not change but it is possible to add new data)
Lazy loading all of the data isn't an option, as the application will be deployed to multiple geographical locations and will have to communicate with a single database. Lazy loading would make the first request for a specific element too slow where the application is in a different region from the database.
I have been using the MapMaker API in Guava with success, but we are now upgrading to the latest release and I can't find the same functionality in the CacheBuilder API; there doesn't seem to be a clean way of loading all the data at start-up.
One way would be to load all keys from the database and load those through the Cache individually. This would work but would result in N+1 calls to the database, which isn't quite the efficient solution I'm looking for.
public void loadData() {
    List<String> keys = getAllKeys();
    for (String s : keys)
        cache.get(s);
}
The other solution would be to use a ConcurrentHashMap implementation and handle all of the threading and missing entries myself. I'm not keen on doing this, as the MapMaker and CacheBuilder APIs provide key-based thread locking for free without extra testing on my part. I'm also pretty sure the MapMaker/CacheBuilder implementations have some efficiencies that I don't know about or haven't had time to investigate.
public Element get(String key) {
    Lock lock = getObjectLock(key);
    lock.lock();
    try {
        Element ret = map.get(key);
        if (ret == null) {
            ret = getElement(key); // database call
            map.put(key, ret);
        }
        return ret;
    } finally {
        lock.unlock();
    }
}
Can anyone think of a better solution to my two requirements?
Feature Request
I don't think pre-loading a cache is an uncommon requirement, so it would be nice if the CacheBuilder provided a configuration option to pre-load the cache. I think providing an interface (much like CacheLoader) that populates the cache at start-up would be an ideal solution, such as:
CacheBuilder.newBuilder().populate(new CachePopulator<String, Element>() {
    @Override
    public Map<String, Element> populate() throws Exception {
        return getAllElements();
    }
}).build(new CacheLoader<String, Element>() {
    @Override
    public Element load(String key) throws Exception {
        return getElement(key);
    }
});
This implementation would allow the Cache to be pre-populated with all relevant Element objects, whilst keeping the underlying CustomConcurrentHashMap non-visible to the outside world.
In the short-term I would just use Cache.asMap().putAll(Map<K, V>).
Once Guava 11.0 is released you can use Cache.getAll(Iterable<K>), which will issue a single bulk request for all absent elements.
I'd load all static data from the DB and store it in the Cache using cache.asMap().put(key, value) (Guava 10.0.1 allows write operations on the Cache.asMap() view).
Of course, this static data might get evicted, if your cache is configured to evict entries...
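A minimal sketch of that short-term approach, assuming getAllElements() runs one bulk database query and getElement(key) loads a single row:
private final LoadingCache<String, Element> cache = CacheBuilder.newBuilder()
        .build(new CacheLoader<String, Element>() {
            @Override
            public Element load(String key) throws Exception {
                return getElement(key);          // single-row lookup for keys added later
            }
        });

public void preloadCache() {
    // one bulk query instead of N+1 individual loads
    cache.asMap().putAll(getAllElements());
}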
The CachePopulator idea is interesting.
