This is my loading cache definition:
private class ProductValue {
private long regionAValue;
private long regionBValue;
// constructor and general stuff here
}
private final LoadingCache<ProductId, ProductValue> productCache = CacheBuilder.newBuilder()
.expireAfterAccess(4, TimeUnit.MINUTES)
.build(new CacheLoader<ProductId, ProductValue>() {
@Override
public ProductValue load(final ProductId productId) throws Exception {
return updateProductValues(productId);
}
});
private ProductValue updateProductValues(final ProductId productId) {
// Read from disk and return
}
Now I have a use case where I need to set the value of regionA or regionB in the cache until the next update happens. I'm utterly confused about the concurrency implications of the logic I have:
public void setProductValue(final ProductId productId, final boolean isTypeA, final long newValue) throws ExecutionException {
ProductValue existingValues = productCache.get(productId); // 1
if (isTypeA) {
existingValues.regionAValue = newValue;
} else {
existingValues.regionBValue = newValue;
}
productCache.put(productId, existingValues); // 2
}
In 1 I just read the reference to the value stored in the cache for the given key; this get is thread-safe because a loading cache acts like a concurrent map. But between 1 and 2 this reference can be overwritten by some other thread. Since I've changed the value through a reference that already existed in the cache, do I need to put the key-value pair back into the cache? Do I need line 2?
(Disclaimer: I am not a Guava Cache expert)
I think you have two concurrency issues in your code:
1. You have two operations that mutate the object in existingValues, that is existingValues.regionAValue = ... and existingValues.setRegionValue(...). Other threads can observe the state after only one of the operations has been applied. I assume that is not wanted (correct?).
2. Between the get() and the put() the value may be loaded into the cache again, and the put() then overwrites that new value.
Regarding 1:
If the object is read more often than it is written, a good option is to use an immutable object. You don't touch the instance; instead you copy the original object, apply the change to the copy, and put the new object into the cache. This way only the final state becomes visible.
Regarding 2:
Atomic CAS operations can help you here (e.g. JSR107 compatible caches). The useful method would be boolean replace(K key, V oldValue, V newValue);
In Google Guava the CAS methods are accessible via the ConcurrentMap interface, that you can retrieve via asMap().
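The two suggestions above can be combined. What follows is a minimal sketch, not a definitive implementation: ImmutableProductValue and ProductValueUpdater are hypothetical names, and the method works on any ConcurrentMap, on the assumption that you would pass in the view returned by productCache.asMap().

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical immutable stand-in for the mutable ProductValue from the question. */
final class ImmutableProductValue {
    final long regionAValue;
    final long regionBValue;

    ImmutableProductValue(long regionAValue, long regionBValue) {
        this.regionAValue = regionAValue;
        this.regionBValue = regionBValue;
    }

    ImmutableProductValue withRegionA(long v) { return new ImmutableProductValue(v, regionBValue); }
    ImmutableProductValue withRegionB(long v) { return new ImmutableProductValue(regionAValue, v); }
}

final class ProductValueUpdater {
    /**
     * Retries until the compare-and-swap succeeds. Returns false if the key
     * is absent (for example because the entry expired) and changes nothing.
     */
    static <K> boolean setRegion(ConcurrentMap<K, ImmutableProductValue> map,
                                 K key, boolean isTypeA, long newValue) {
        while (true) {
            ImmutableProductValue current = map.get(key);
            if (current == null) {
                return false;
            }
            ImmutableProductValue updated =
                    isTypeA ? current.withRegionA(newValue) : current.withRegionB(newValue);
            // replace(key, old, new) only succeeds if the key is still mapped to 'current'
            if (map.replace(key, current, updated)) {
                return true;
            }
        }
    }
}
```

If another thread reloads or replaces the entry between the get and the replace, the replace fails and the loop starts again from the fresh value, so no update is silently lost.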
Related
I'm trying to figure out how to write a thread-safe cache with expiring entries. This cache will be used as a no-hits cache: if an entry is not found in some storage, I will put it in this cache and avoid the subsequent calls for the next few minutes.
There will be multiple threads reading and writing this cache.
There will be just a single ThreadSafeCache instance in my application.
I'm not sure whether removing an entry in the contains method will cause synchronization issues.
How may I test this class for thread-safety?
Kind regards
public class ThreadSafeCache
{
private final Clock clock = Clock.systemUTC();
private final Duration expiration = Duration.ofMinutes(10);
private final ConcurrentHashMap<CacheKey, CacheValue> internalMap = new ConcurrentHashMap<>();
public boolean contains(String a, String b, byte[] c, String d)
{
CacheKey key = new CacheKey(a, b, c, d);
CacheValue value = internalMap.get(key);
if (value == null || value.isExpired())
{
internalMap.remove(key);
return false;
}
return true;
}
public void put(String a, String b, byte[] c, String d)
{
internalMap.computeIfAbsent(new CacheKey(a, b, c, d), key -> new CacheValue());
}
private class CacheValue
{
private final Instant insertionDate;
private CacheValue()
{
this.insertionDate = clock.instant();
}
boolean isExpired()
{
return Duration.between(insertionDate,
clock.instant()).compareTo(expiration) > 0;
}
}
}
You are calling 2 Map operations in the same function, meaning there is scope for interleaving (i.e. another operation happens in between the 2 operations in the function, changing its behaviour). To fix this, you can put the map operations in a synchronized (internalMap) {} block. Note, you must do this to any method that interacts with the map in 2 discrete method calls.
From a code-style point of view, it is bad practice to modify the map in the contains method. This will make your code less predictable. Another person coming to your code for the first time (or you in a few months time) may not remember that contains() actually modifies the cache. contains implies that it simply checks the cache, rather than modifying it.
My recommendation would be:
If the key has expired, simply return false.
In the get() method, check if the value has expired, and, if it has, compute a new one there.
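The recommendation above could be sketched roughly as follows. This is only an illustration under some assumptions: the key is simplified to a String, NoHitCache and evictExpired are hypothetical names, and the time source is injected as a Supplier so the behavior can be tested.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

final class NoHitCache {
    private final ConcurrentHashMap<String, Instant> insertedAt = new ConcurrentHashMap<>();
    private final Duration expiration;
    private final Supplier<Instant> now; // injectable time source, e.g. Instant::now

    NoHitCache(Duration expiration, Supplier<Instant> now) {
        this.expiration = expiration;
        this.now = now;
    }

    private boolean isExpired(Instant inserted) {
        return Duration.between(inserted, now.get()).compareTo(expiration) > 0;
    }

    /** Pure check: never modifies the map. */
    boolean contains(String key) {
        Instant inserted = insertedAt.get(key);
        return inserted != null && !isExpired(inserted);
    }

    /** Inserts the key, or restarts its timer if the existing entry has expired. */
    void put(String key) {
        // compute runs atomically for the key, so check and update cannot interleave
        insertedAt.compute(key, (k, old) -> (old == null || isExpired(old)) ? now.get() : old);
    }

    /** Optional cleanup, e.g. from a scheduled task. */
    void evictExpired() {
        for (Map.Entry<String, Instant> e : insertedAt.entrySet()) {
            if (isExpired(e.getValue())) {
                // two-argument remove only deletes if the value is still the expired one
                insertedAt.remove(e.getKey(), e.getValue());
            }
        }
    }
}
```

Because every compound step is a single atomic map operation (compute, or the conditional remove), no synchronized block is needed in this variant.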
Your question: "I'm not sure if removing an entry in the contains method will arise synchronization issues".
=> The remove call itself is not a problem, because ConcurrentHashMap is a thread-safe collection; it's the best choice here.
Extra: another way to get a thread-safe collection is a wrapper such as Collections.synchronizedMap(myMap), but that is not OK if one thread removes entries while another iterates (for example in a loop): the iteration throws ConcurrentModificationException unless you synchronize on the wrapper yourself.
So using a concurrent collection (e.g. ConcurrentHashMap) is always recommended.
I have a cache refresh logic and want to make sure that it's thread-safe and correct way to do it.
public class Test {
Set<Integer> cache = Sets.newConcurrentHashSet();
public boolean contain(int num) {
return cache.contains(num);
}
public void refresh() {
cache.clear();
cache.addAll(getNums());
}
}
So I have a background thread refreshing the cache by periodically calling refresh, and multiple threads calling contain at the same time. I was trying to avoid synchronized in the method signatures because refresh could take some time (imagine that getNums makes network calls and parses huge data) and contain would then be blocked.
I think this code is not good enough, because if contain is called between clear and addAll it always returns false.
What is the best way to refresh the cache without adding significant latency to contain calls?
The best way would be to use the functional programming paradigm, whereby you have immutable state (in this case a Set): instead of adding and removing elements in that set, you create an entirely new Set every time you want to add or remove elements. Java 9 supports this with immutable collection factories such as Set.of.
It can be a bit awkward or infeasible, however, to apply this approach to legacy code. So instead you could keep the set that contain reads from in a volatile field and assign a new instance to that field in the refresh method.
public class Test {
volatile Set<Integer> cache = new HashSet<>();
public boolean contain(int num) {
return cache.contains(num);
}
public void refresh() {
Set<Integer> privateCache = new HashSet<>();
privateCache.addAll(getNums());
cache = privateCache;
}
}
Edit: We don't want or need a ConcurrentHashSet; that is for when you want to add and remove elements in a collection at the same time, which in my opinion is a pretty useless thing to do. You want to switch the old Set for a new one, which is why you just need a volatile variable: readers then see either the old set or the new one, never a half-built one.
As I mentioned at the start of my answer, if you never modify collections but instead make a new one each time you want to update (with persistent collections this is cheap, because the new set internally reuses the old one), you never need to worry about concurrency, as there is no shared mutable state between threads.
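As a rough illustration of the immutable variant, here is a sketch that assumes Java 10 or later (Set.copyOf) and passes the fresh numbers in as a parameter instead of calling getNums(); SnapshotCache is a hypothetical name.

```java
import java.util.Collection;
import java.util.Set;

final class SnapshotCache {
    // Readers always see either the old or the new complete set, never a half-filled one
    private volatile Set<Integer> cache = Set.of();

    boolean contain(int num) {
        return cache.contains(num);
    }

    void refresh(Collection<Integer> freshNums) {
        // Build the immutable copy first, then publish it with a single volatile write
        cache = Set.copyOf(freshNums);
    }
}
```

The slow part (fetching and copying the numbers) happens before the assignment, so contain is never blocked and never observes an empty intermediate state.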
How would you make sure your cache doesn't contain invalid entries when contains is called? Furthermore, you'd need to call refresh every time the result of getNums() changes, which is pretty inefficient. It would be best to control the changes to getNums() yourself and update the cache accordingly. The cache might look like:
public class MyCache {
    // a ConcurrentHashMap is used so that putIfAbsent is available
    final ConcurrentHashMap<Integer, Boolean> cache = new ConcurrentHashMap<>();
    public boolean contains(Integer num) {
        // containsKey, not contains: ConcurrentHashMap.contains(Object) tests values
        return cache.containsKey(num);
    }
    public void add(Integer num) {
        cache.putIfAbsent(num, true);
    }
    public void clear() {
        cache.clear();
    }
    public void remove(Integer num) {
        cache.remove(num);
    }
}
Update
As @schmosel made me realize, mine was a wasted effort: it is in fact enough to initialize a completely new HashSet<> with your values in the refresh method, assuming of course that the cache field is marked volatile. In short, @Snickers3192's answer points out what you seek.
Old answer
You can also use a slightly different system.
Keep two Set<Integer>, one of which will always be empty. When you refresh the cache, you can asynchronously re-initialize the second one and then just switch the pointers. Other threads accessing the cache won't see any particular overhead in this.
From an external point of view, they will always be accessing the same cache.
private volatile int currentCache; // 0 or 1
private final Set<Integer> caches[] = new HashSet[2]; // use two caches; either one will always be empty, so not much memory consumed
private volatile Set<Integer> cachePointer = null; // just a pointer to the current cache, must be volatile
// initialize
{
this.caches[0] = new HashSet<>(0);
this.caches[1] = new HashSet<>(0);
this.currentCache = 0;
this.cachePointer = caches[this.currentCache]; // point to cache one from the beginning
}
Your refresh method may look like this:
public void refresh() {
// store current cache pointer
final int previousCache = this.currentCache;
final int nextCache = getNextPointer();
// you can easily compute it asynchronously
// in the meantime, external threads will still access the normal cache
CompletableFuture.runAsync( () -> {
// fill the unused cache
caches[nextCache].addAll(getNums());
// then switch the pointer to the just-filled cache
// from this point on, threads are accessing the new cache
switchCachePointer();
// empty the other cache still on the async thread
caches[previousCache].clear();
});
}
where the utility methods are:
public boolean contains(final int num) {
return this.cachePointer.contains(num);
}
private int getNextPointer() {
return ( this.currentCache + 1 ) % this.caches.length;
}
private void switchCachePointer() {
// make cachePointer point to a new cache
this.currentCache = this.getNextPointer();
this.cachePointer = caches[this.currentCache];
}
I have a Java class that has a Guava LoadingCache<String, Integer> and in that cache, I'm planning to store two things: the average time active employees have worked for the day and their efficiency. I am caching these values because it would be expensive to compute every time a request comes in. Also, the contents of the cache will be refreshed (refreshAfterWrite) every minute.
I was thinking of using a CacheLoader for this situation, however, its load method only loads one value per key. In my CacheLoader, I was planning to do something like:
private Service service = new Service();
public Integer load(String key) throws Exception {
if (key.equals("employeeAvg"))
return calculateEmployeeAvg(service.getAllEmployees());
if (key.equals("employeeEff"))
return calculateEmployeeEff(service.getAllEmployees());
return -1;
}
To me this is very inefficient, since in order to load both values I have to invoke service.getAllEmployees() twice because, correct me if I'm wrong, CacheLoaders should be stateless.
Which made me think to use the LoadingCache.put(key, value) method so I can just create a utility method that invokes service.getAllEmployees() once and calculate the values on the fly. However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
How do I make this more efficient?
It seems like your problem stems from using strings to represent value types (Effective Java Item 50). Instead, consider defining a proper value type that stores this data, and use a memoizing Supplier to avoid recomputing them.
public static class EmployeeStatistics {
private final int average;
private final int efficiency;
// constructor and getters (the fields are final, so there are no setters)
}
Supplier<EmployeeStatistics> statistics = Suppliers.memoize(
new Supplier<EmployeeStatistics>() {
@Override
public EmployeeStatistics get() {
List<Employee> employees = new Service().getAllEmployees();
return new EmployeeStatistics(
calculateEmployeeAvg(employees),
calculateEmployeeEff(employees));
}});
You could even move these calculation methods inside EmployeeStatistics and simply pass in all employees to the constructor and let it compute the appropriate data.
If you need to configure your caching behavior more than Suppliers.memoize() or Suppliers.memoizeWithExpiration() can provide, consider this similar pattern, which hides the fact that you're using a Cache inside a Supplier:
Supplier<EmployeeStatistics> statistics = new Supplier<EmployeeStatistics>() {
    private final Object key = new Object();
    private final LoadingCache<Object, EmployeeStatistics> cache =
        CacheBuilder.newBuilder()
            // configure your builder
            .build(
                new CacheLoader<Object, EmployeeStatistics>() {
                    @Override
                    public EmployeeStatistics load(Object key) {
                        // same behavior as the Supplier above
                    }});
    @Override
    public EmployeeStatistics get() {
        // getUnchecked, because Supplier.get() cannot throw the checked
        // ExecutionException declared by LoadingCache.get()
        return cache.getUnchecked(key);
    }};
"However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader."
I'm not sure, but you might be able to call it from inside the load method. I mean, compute the requested value as you do and put in the other. However, this feels hacky.
If service.getAllEmployees is expensive, then you could cache it. If both calculateEmployeeAvg and calculateEmployeeEff are cheap, then recompute them when needed. Otherwise, it looks like you could use two caches.
I guess, a method computing both values at once could be a reasonable solution. Create a tiny Pair-like class aggregating them and use it as the cache value. There'll be a single key only.
Concerning your own solution, it could be as trivial as
class EmployeeStatsCache {
    // Assumed values: the question refreshes once a minute and uses a Service
    private static final long VALIDITY_MILLIS = 60_000;
    private final Service service = new Service();
    private long validUntil;
    private List<Employee> employeeList;
    private Integer employeeAvg;
    private Integer employeeEff;
    private boolean isValid() {
        return System.currentTimeMillis() <= validUntil;
    }
    private synchronized List<Employee> getEmployeeList() {
        if (!isValid() || employeeList == null) {
            employeeList = service.getAllEmployees();
            validUntil = System.currentTimeMillis() + VALIDITY_MILLIS;
            // drop statistics derived from the old list
            employeeAvg = null;
            employeeEff = null;
        }
        return employeeList;
    }
    public synchronized int getEmployeeAvg() {
        if (!isValid() || employeeAvg == null) {
            employeeAvg = calculateEmployeeAvg(getEmployeeList());
        }
        return employeeAvg;
    }
    public synchronized int getEmployeeEff() {
        if (!isValid() || employeeEff == null) {
            employeeEff = calculateEmployeeEff(getEmployeeList());
        }
        return employeeEff;
    }
}
Instead of synchronized methods you may want to synchronize on a private final field. There are other possibilities (e.g. Atomic*), but the basic design is probably simpler than adapting Guava's Cache.
Now, I see that there's Suppliers#memoizeWithExpiration in Guava. That's probably even simpler.
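To make the idea concrete without depending on Guava, here is a stdlib-only sketch of an expiring, memoizing supplier. It is not Guava's implementation; ExpiringSupplier and its injected nanosecond clock are assumptions made for illustration. The delegate would compute both statistics from a single service.getAllEmployees() call and return them in one value object.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class ExpiringSupplier<T> implements Supplier<T> {
    private final Supplier<T> delegate;
    private final long validityNanos;
    private final LongSupplier nanoClock; // e.g. System::nanoTime
    private T value;
    private long loadedAt;
    private boolean loaded;

    ExpiringSupplier(Supplier<T> delegate, long validityNanos, LongSupplier nanoClock) {
        this.delegate = delegate;
        this.validityNanos = validityNanos;
        this.nanoClock = nanoClock;
    }

    @Override
    public synchronized T get() {
        long now = nanoClock.getAsLong();
        // Reload on first use or once the validity window has elapsed
        if (!loaded || now - loadedAt >= validityNanos) {
            value = delegate.get();
            loadedAt = now;
            loaded = true;
        }
        return value;
    }
}
```

Callers share one cached value, so the expensive call happens at most once per validity window, at the cost of blocking concurrent callers while a reload is in progress.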
I have a program consisting of a number of classes. I have a problem with the interaction of two of the classes: WebDataCache and Client. The problem classes are listed below.
WebData:
This is just a data class representing some data retrieved from the internet.
WebService:
This is just a web service wrapper class which connects to a particular web service, reads some data and stores it in an object of type WebData.
WebDataCache:
This is a class which uses the WebService class to retrieve data that's cached in a map, keyed by the id fields of the data.
Client:
This is a class which contains a reference to an instance of the WebDataCache class and uses the cached data.
The problem is (as illustrated below) when the class is looping through the cached data, it is possible for the WebDataCache to update the underlying collection.
My question is how do I synchronize access to the cache?
I don't want to synchronize the whole cache, as there are multiple instances of the Client class, each instantiated with a unique id (i.e. new Client(0,...), new Client(1,...), new Client(2,...), etc.) and each interested only in data keyed by the id it was instantiated with.
Are there any relevant design patterns I can use?
class WebData {
private final int id;
private final long id2;
public WebData(int id, long id2) {
this.id = id;
this.id2 = id2;
}
public int getId() { return this.id; }
public long getId2() { return this.id2; }
}
class WebService {
Collection<WebData> getData(int id) {
Collection<WebData> a = new ArrayList<WebData>();
// populate A with data from a webservice
return a;
}
}
class WebDataCache implements Runnable {
private Map<Integer, Map<Long, WebData>> cache =
new HashMap<Integer, Map<Long, WebData>>();
private Collection<Integer> requests =
new ArrayList<Integer>();
@Override
public void run() {
WebService webSvc = new WebService();
// get data from some web service
while(true) {
for (int id : requests) {
Collection<WebData> webData = webSvc.getData(id);
Map<Long, WebData> row = cache.get(id);
if (row == null)
row = cache.put(id, new HashMap<Long, WebData>());
else
row.clear();
for (WebData webDataItem : webData) {
row.put(webDataItem.getId2(), webDataItem);
}
}
Thread.sleep(2000);
}
}
public synchronized Collection<WebData> getData(int id){
return cache.get(id).values();
}
public synchronized void requestData(int id) {
requests.add(id);
}
}
class Client implements Runnable {
private final WebDataCache cache;
private final int id;
public Client(int id, WebDataCache cache){
this.id = id;
this.cache = cache;
}
@Override
public void run() {
cache.requestData(id);
while (true) {
for (WebData item : cache.getData(id)) {
// java.util.ConcurrentModificationException is thrown here...
// I understand that the collection is probably being modified in WebDataCache::run()
// my question: what's the best way to synchronize this code snippet?
}
}
}
}
Thanks!
Use java.util.concurrent.ConcurrentHashMap instead of plain old java.util.HashMap. From the Javadoc:
A hash table supporting full concurrency of retrievals and adjustable expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html
So you would replace:
private Map<Integer, Map<Long, WebData>> cache =
new HashMap<Integer, Map<Long, WebData>>();
With
private Map<Integer, Map<Long, WebData>> cache =
new ConcurrentHashMap<Integer, Map<Long, WebData>>();
My best recommendation is to use an existing cache implementation such as JCS or EhCache - these are battle tested implementations.
Otherwise, you have a couple of things going on in your code. Things that can break in funny ways.
HashMap can end up in an infinite loop when modified concurrently by multiple threads. So don't do that; use java.util.concurrent.ConcurrentHashMap instead.
The ArrayList that you use for WebDataCache.requests isn't thread-safe either and you have inconsistent synchronization - either change it to a safer list implementation from java.util.concurrent or make sure that all access to it is synchronizing on the same lock.
Lastly, have your code checked with FindBugs and/or properly reviewed by someone with solid knowledge and experience on writing multi-threaded code.
If you want to read a book on this stuff, I can recommend Java Concurrency in Practice by Brian Goetz.
In addition to the other posted recommendations, consider how often the cache is updated versus just being read. If the reading dominates and updating is rare, and it's not critical that the reading loop be able to see every update immediately, consider using a CopyOnWriteArraySet. It and its sibling CopyOnWriteArrayList allow concurrent reading and updating of the members; the reader sees a consistent snapshot unaffected by any mutation of the underlying collection -- analogous to the SERIALIZABLE isolation level in a relational database.
The problem here, though, is that neither of these two structures give you your dictionary or associative array storage (a la Map) out of the box. You'd have to define a composite structure to store the key and value together, and, given that CopyOnWriteArraySet uses Object#equals() for membership testing, you'd have to write an unconventional key-based equals() method for your structure.
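The snapshot behavior described above can be seen in a small sketch. SnapshotIterationDemo is a hypothetical name and the list holds plain integers instead of WebData for brevity.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

final class SnapshotIterationDemo {
    /** Iterates over the list while adding to it and returns what the loop saw. */
    static List<Integer> iterateWhileAdding(List<Integer> shared) {
        List<Integer> seen = new ArrayList<>();
        for (Integer item : shared) {
            // On an ArrayList this add would make the next iteration step throw
            // ConcurrentModificationException; a CopyOnWriteArrayList iterator
            // keeps reading the array it started with.
            shared.add(item + 10);
            seen.add(item);
        }
        return seen;
    }

    static List<Integer> demo() {
        return iterateWhileAdding(new CopyOnWriteArrayList<>(List.of(1, 2, 3)));
    }
}
```

Every write copies the whole backing array, which is why these collections only pay off when reads clearly dominate.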
The answer from LES2 is good except that you would have to replace:
row = cache.put(id, new HashMap<Long, WebData>());
with:
row = cache.put(id, new ConcurrentHashMap<Long, WebData>());
That is the map that holds the "problematic" collection, not the whole cache.
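One detail to watch in either variant: Map.put returns the previous value, which is null the first time, so row would still be null after that assignment. A sketch that avoids this, assuming the outer map is a ConcurrentHashMap and using String as a placeholder for WebData (RowLookup is a hypothetical name):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class RowLookup {
    private final Map<Integer, Map<Long, String>> cache = new ConcurrentHashMap<>();

    Map<Long, String> rowFor(int id) {
        // Creates the row atomically on first use and returns the same instance afterwards
        return cache.computeIfAbsent(id, key -> new ConcurrentHashMap<>());
    }
}
```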
You can synchronize on the row returned by the cache, since in the end that is what holds the collection being shared.
On WebDataCache:
Map<Long, WebData> row = cache.get(id);
if (row == null) {
    // Map.put returns the previous value (null here), so keep our own reference
    row = new HashMap<Long, WebData>();
    cache.put(id, row);
} else synchronized( row ) {
    row.clear();
}
for (WebData webDataItem : webData) synchronized( row ) {
    row.put(webDataItem.getId2(), webDataItem);
}
// it doesn't make sense to synchronize the whole cache here.
public Collection<WebData> getData(int id){
return cache.get(id).values();
}
On Client:
Collection<WebData> data = cache.getData(id);
synchronized( data ) {
    // iterate over the collection we hold the lock on
    for (WebData item : data) {
    }
}
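Of course this is far from perfect; it just answers the question of what to synchronize. In this case that is the access to the underlying collection: row.clear and row.put on the cache side, and the iteration on the client side. Note that both sides have to lock the same object for this to work: values() returns a separate view object, so getData would have to hand out the row itself or a lock shared with the writer.
BTW, why do you have a Map in the cache but use a Collection in the client? You should use the same structure on both sides and not expose the underlying implementation.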
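One way to avoid both the shared lock and the exposed internals is to never mutate a row in place. The following sketch is a different technique from the locking above, shown under stated assumptions: it needs Java 10 or later for Map.copyOf, and SnapshotRowCache and replaceRow are hypothetical names.

```java
import java.util.Collection;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class SnapshotRowCache<V> {
    private final ConcurrentHashMap<Integer, Map<Long, V>> rows = new ConcurrentHashMap<>();

    /** Called by the refresher thread: swaps in a complete, immutable row. */
    void replaceRow(int id, Map<Long, V> freshRow) {
        rows.put(id, Map.copyOf(freshRow));
    }

    /** Called by clients: the returned collection never changes, so iterating it is safe. */
    Collection<V> getData(int id) {
        return rows.getOrDefault(id, Map.of()).values();
    }
}
```

A client that is in the middle of iterating simply finishes with the old row and picks up the new one on its next getData call; clients for other ids are never affected.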
I know it's simple to implement, but I want to reuse something that already exists.
The problem I want to solve is that I load configuration (from XML, so I want to cache it) for different pages, roles, ... so the number of input combinations can grow quite large (but in 99% of cases it will not). To handle this 1%, I want to have some maximum number of items in the cache...
Until now I have found org.apache.commons.collections.map.LRUMap in Apache Commons and it looks fine, but I want to check something else as well. Any recommendations?
You can use a LinkedHashMap (Java 1.4+) :
// Create cache
final int MAX_ENTRIES = 100;
Map<Object, Object> cache = new LinkedHashMap<Object, Object>(MAX_ENTRIES + 1, .75F, true) {
    // This method is called just after a new entry has been added
    @Override
    protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
        return size() > MAX_ENTRIES;
    }
};
// Add to cache
Object key = "key";
Object object = "value"; // placeholder for whatever you want to cache
cache.put(key, object);
// Get object
Object o = cache.get(key);
if (o == null && !cache.containsKey(key)) {
    // Object not in cache. If null is not a possible value in the cache,
    // the call to cache.containsKey(key) is not needed
}
// If the cache is to be used by multiple threads,
// the cache must be wrapped with code to synchronize the methods
cache = Collections.synchronizedMap(cache);
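The same idea can be packaged as a small reusable class. This is a minimal sketch; LruCache is a hypothetical name and the class is not thread-safe on its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCache(int maxEntries) {
        // third argument true = access order: get() moves an entry to the "newest" end
        super(maxEntries + 1, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry
        return size() > maxEntries;
    }
}
```

In access-order mode even get changes the internal ordering, so for multi-threaded use the instance still has to be wrapped, for example with Collections.synchronizedMap(new LruCache<>(100)).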
This is an old question, but for posterity I wanted to list ConcurrentLinkedHashMap, which is thread safe, unlike LRUMap. Usage is quite easy:
ConcurrentMap<K, V> cache = new ConcurrentLinkedHashMap.Builder<K, V>()
.maximumWeightedCapacity(1000)
.build();
And the documentation has some good examples, like how to make the LRU cache size-based instead of number-of-items based.
Here is my implementation which lets me keep an optimal number of elements in memory.
The point is that I do not need to keep track of what objects are currently being used since I'm using a combination of a LinkedHashMap for the MRU objects and a WeakHashMap for the LRU objects.
So the cache capacity is no less than MRU size plus whatever the GC lets me keep. Whenever objects fall off the MRU they go to the LRU for as long as the GC will have them.
public class Cache<K,V> {
final Map<K,V> MRUdata;
final Map<K,V> LRUdata;
public Cache(final int capacity)
{
LRUdata = new WeakHashMap<K, V>();
MRUdata = new LinkedHashMap<K, V>(capacity+1, 1.0f, true) {
protected boolean removeEldestEntry(Map.Entry<K,V> entry)
{
if (this.size() > capacity) {
LRUdata.put(entry.getKey(), entry.getValue());
return true;
}
return false;
};
};
}
public synchronized V tryGet(K key)
{
V value = MRUdata.get(key);
if (value!=null)
return value;
value = LRUdata.get(key);
if (value!=null) {
LRUdata.remove(key);
MRUdata.put(key, value);
}
return value;
}
public synchronized void set(K key, V value)
{
LRUdata.remove(key);
MRUdata.put(key, value);
}
}
I also had same problem and I haven't found any good libraries... so I've created my own.
simplelrucache provides threadsafe, very simple, non-distributed LRU caching with TTL support. It provides two implementations
Concurrent based on ConcurrentLinkedHashMap
Synchronized based on LinkedHashMap
You can find it here.
Here is a very simple and easy to use LRU cache in Java.
Although it is short and simple it is production quality.
The code is explained (look at the README.md) and has some unit tests.