I posted an answer here where the code demonstrating use of the putIfAbsent method of ConcurrentMap read:
ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<String, AtomicLong>();

public long addTo(String key, long value) {
    // The final value it became.
    long result = value;
    // Make a new one to put in the map.
    AtomicLong newValue = new AtomicLong(value);
    // Insert my new one or get me the old one.
    AtomicLong oldValue = map.putIfAbsent(key, newValue);
    // Was it already there? putIfAbsent returns null if it was not.
    if (oldValue != null) {
        // It was there, so accumulate into the existing value.
        result = oldValue.addAndGet(value);
    }
    return result;
}
The main downside of this approach is that you have to create a new object to put into the map whether it will be used or not. This can have a significant effect if the object is heavy.
It occurred to me that this would be an opportunity to use lambdas. I have not downloaded Java 8, nor will I be able to until it is official (company policy), so I cannot test this, but would something like this be valid and effective?
public long addTo(String key, long value) {
    return map.putIfAbsent(key, () -> new AtomicLong(0)).addAndGet(value);
}
I am hoping to use the lambda to delay the evaluation of the new AtomicLong(0) until it is actually determined that it should be created because it does not exist in the map.
As you can see this is much more succinct and functional.
Essentially I suppose my questions are:
Will this work?
Or have I completely misinterpreted lambdas?
Might something like this work one day?
UPDATE 2015-08-01
The computeIfAbsent method as described below has indeed been added to Java SE 8. The semantics appear to be very close to the pre-release version.
In addition, computeIfAbsent, along with a whole pile of new default methods, has been added to the Map interface. Of course, maps in general can't support atomic updates, but the new methods add considerable convenience to the API.
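For the record, against the released Java 8 API the accumulator from the question comes out as below (a minimal self-contained sketch; the class and key names are illustrative):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

public class AddToDemo {
    static final ConcurrentMap<String, AtomicLong> map = new ConcurrentHashMap<>();

    // computeIfAbsent takes a Function<K, V>, so the lambda receives the key
    static long addTo(String key, long value) {
        return map.computeIfAbsent(key, k -> new AtomicLong(0)).addAndGet(value);
    }

    public static void main(String[] args) {
        System.out.println(addTo("a", 5)); // 5
        System.out.println(addTo("a", 3)); // 8
    }
}
```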
What you're trying to do is quite reasonable, but unfortunately it doesn't work with the current version of ConcurrentMap. An enhancement is on the way, however. The new version of the concurrency library includes ConcurrentHashMapV8 which contains a new method computeIfAbsent. This pretty much allows you to do exactly what you're looking to do. Using this new method, your example could be rewritten as follows:
public long addTo(String key, long value) {
    return map.computeIfAbsent(key, k -> new AtomicLong(0)).addAndGet(value);
}
For further information about the ConcurrentHashMapV8, see Doug Lea's initial announcement thread on the concurrency-interest mailing list. Several messages down the thread is a followup message that shows an example very similar to what you're trying to do. (Note however the old lambda syntax. That message was from August 2011 after all.) And here is recent javadoc for ConcurrentHashMapV8.
This work is intended to be integrated into Java 8, but as far as I can see it hasn't been yet. It is also still a work in progress: names and specs may change, etc.
AtomicLong is not really a heavy object. For heavier objects I would consider a lazy proxy and provide a lambda to that one to create the object if needed.
class MyObject {
    void doSomething() {}
}

class MyLazyObject extends MyObject {
    private final Supplier<MyObject> create;
    private MyObject instance;

    MyLazyObject(Supplier<MyObject> create) {
        this.create = create;
    }

    MyObject getInstance() {
        if (instance == null)
            instance = create.get();
        return instance;
    }

    @Override
    void doSomething() { getInstance().doSomething(); }
}

public void addTo(String key) {
    map.putIfAbsent(key, new MyLazyObject(() -> new MyObject()));
}
Unfortunately it's not as easy as that. There are two main problems with the approach you've sketched out:
1. The type of the map would need to change from Map<String, AtomicLong> to Map<String, AtomicLongFunction> (where AtomicLongFunction is some functional interface that has a single method that takes no arguments and returns an AtomicLong).
2. When you retrieve the element from the map you'd need to apply the function each time to get the AtomicLong out of it. This would result in creating a new instance each time you retrieve it, which is not likely what you wanted.
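A quick sketch of problem 2, with java.util.function.Supplier standing in for the hypothetical AtomicLongFunction: every retrieval re-runs the function, so each caller gets a fresh AtomicLong and increments are lost.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;

public class SupplierMapProblem {
    static final Map<String, Supplier<AtomicLong>> map = new HashMap<>();

    public static void main(String[] args) {
        map.put("hits", () -> new AtomicLong(0));

        // each get().get() builds a brand-new AtomicLong...
        long first = map.get("hits").get().addAndGet(1);
        long second = map.get("hits").get().addAndGet(1);

        System.out.println(first);  // 1
        System.out.println(second); // 1 again -- the first increment was lost
    }
}
```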
The idea of having a map that runs a function on demand to fill up missing values is a good one, though, and in fact Google's Guava library has a map that does exactly that; see their MapMaker. In fact that code would benefit from Java 8 lambda expressions: instead of
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap(
new Function<Key, Graph>() {
public Graph apply(Key key) {
return createExpensiveGraph(key);
}
});
you'd be able to write
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap((Key key) -> createExpensiveGraph(key));
or
ConcurrentMap<Key, Graph> graphs = new MapMaker()
.concurrencyLevel(4)
.weakKeys()
.makeComputingMap(this::createExpensiveGraph);
Note that with Java 8's ConcurrentHashMap it's completely unnecessary to have AtomicLong values. You can safely use ConcurrentHashMap.merge:
ConcurrentMap<String, Long> map = new ConcurrentHashMap<String, Long>();
public long addTo(String key, long value) {
return map.merge(key, value, Long::sum);
}
It's much simpler and also significantly faster.
Related
I'm trying to do a thread safe getter of a map value, which also creates the value if it does not exist:
private final static Map<String, MyObject> myMap = new HashMap<String, MyObject>();
public MyObject myGetter(String key) {
// Try to retrieve the object without synchronization
MyObject myObject = myMap.get(key);
if (myObject != null) {
return myObject;
}
// The value does not exist, so create a new object.
synchronized (myMap) {
myObject = myMap.get(key);
if (myObject == null) {
myObject = new MyObject(key);
myMap.put(key, myObject);
}
return myObject;
}
}
Note that the myMap member variable is final, and it is not accessed anywhere else.
I do not want the MyObject to be created if not needed (some other patterns I have found suggest a pattern that may result in multiple creations for the same key).
I'm aware of the 'double check' anti-pattern, but I'm not sure if this code is applicable on that anti-pattern since it is not the member variable itself that is created here.
So, my question is if this still is a case of that anti-pattern, and if so: why?
Edit 1:
Judging from comments, one fix here would be to simply accept the 'performance impact' (a minor impact I guess), and just include the read in the synchronized block, like this:
private final static Map<String, MyObject> myMap = new HashMap<String, MyObject>();
public MyObject myGetter(String key) {
synchronized (myMap) {
MyObject myObject = myMap.get(key);
if (myObject == null) {
// The value does not exist, create a new object.
myObject = new MyObject(key);
myMap.put(key, myObject);
}
return myObject;
}
}
Also, I'm on Java 6, so the ConcurrentHashMap.computeIfAbsent is not an option here.
As the comments and your edit indicates, this is a form of the double-checked-locking anti-pattern.
The correct implementation is to use the ConcurrentHashMap. You listed in your comments that you are using Java 6 and cannot then use computeIfAbsent. That's fine, you should still use the ConcurrentHashMap#putIfAbsent.
You can do a double-check-like operation:
private final static ConcurrentMap<String, MyObject> myMap =
        new ConcurrentHashMap<String, MyObject>();

public MyObject myGetter(String key) {
    MyObject ref = myMap.get(key);
    if (ref == null) {
        ref = new MyObject(key);
        MyObject put = myMap.putIfAbsent(key, ref);
        if (put != null) {
            // some other thread won the race and put first
            ref = put;
        }
    }
    return ref;
}
In this case you only lock on mutation, since ConcurrentHashMap provides non-blocking reads.
Otherwise, the edit you supplied is a thread-safe (albeit inefficient) implementation.
I do not want the MyObject to be created if not needed (some other patterns I have found suggest a pattern that may result in multiple creations for the same key).
This is a very common pattern but unfortunately your code suffers race conditions because of it. Yes, it is an example of double-check locking.
You code is doing approximately:
// unsynchronized read of the map
// if value exists then return
// synchronize on map
// add value to map
// leave synchronize block
The problem with this is that the unsynchronized read of a shared map must also be properly memory synchronized. If thread #1 alters the map there is no guarantee that thread #2 would see those updates. Even worse is the fact that you might get partial memory updates which might corrupt the map in thread #2's memory causing infinite loops or other undefined behavior.
As #JohnVint and others have recommended the ConcurrentHashMap is the right thing to do when trying to share a map between multiple threads. It takes care of the memory synchronization (and locking where necessary) for you in the most efficient manner.
I have a Java class that has a Guava LoadingCache<String, Integer> and in that cache, I'm planning to store two things: the average time active employees have worked for the day and their efficiency. I am caching these values because it would be expensive to compute every time a request comes in. Also, the contents of the cache will be refreshed (refreshAfterWrite) every minute.
I was thinking of using a CacheLoader for this situation, however, its load method only loads one value per key. In my CacheLoader, I was planning to do something like:
private Service service = new Service();

public Integer load(String key) throws Exception {
    if (key.equals("employeeAvg"))
        return calculateEmployeeAvg(service.getAllEmployees());
    if (key.equals("employeeEff"))
        return calculateEmployeeEff(service.getAllEmployees());
    return -1;
}
I find this very inefficient, since in order to load both values I have to invoke service.getAllEmployees() twice because (correct me if I'm wrong) CacheLoaders should be stateless.
Which made me think to use the LoadingCache.put(key, value) method so I can just create a utility method that invokes service.getAllEmployees() once and calculate the values on the fly. However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
How do I make this more efficient?
It seems like your problem stems from using strings to represent value types (Effective Java Item 50). Instead, consider defining a proper value type that stores this data, and use a memoizing Supplier to avoid recomputing them.
public static class EmployeeStatistics {
    private final int average;
    private final int efficiency;
    // constructor, getters and setters
}
Supplier<EmployeeStatistics> statistics = Suppliers.memoize(
    new Supplier<EmployeeStatistics>() {
        @Override
        public EmployeeStatistics get() {
            List<Employee> employees = new Service().getAllEmployees();
            return new EmployeeStatistics(
                calculateEmployeeAvg(employees),
                calculateEmployeeEff(employees));
        }
    });
You could even move these calculation methods inside EmployeeStatistics and simply pass in all employees to the constructor and let it compute the appropriate data.
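That variant might look like the sketch below. Note that Employee and its fields are hypothetical stand-ins, since the original Service and calculation methods aren't shown:

```java
import java.util.Arrays;
import java.util.List;

public class EmployeeStatsDemo {
    static class Employee {
        final int minutesWorked;  // hypothetical fields, for illustration only
        final int tasksDone;

        Employee(int minutesWorked, int tasksDone) {
            this.minutesWorked = minutesWorked;
            this.tasksDone = tasksDone;
        }
    }

    static class EmployeeStatistics {
        private final int average;
        private final int efficiency;

        // the value type computes its own numbers from the raw list,
        // so getAllEmployees() only needs to be invoked once per refresh
        EmployeeStatistics(List<Employee> employees) {
            this.average = (int) employees.stream()
                .mapToInt(e -> e.minutesWorked).average().orElse(0);
            this.efficiency = (int) employees.stream()
                .mapToInt(e -> e.tasksDone).average().orElse(0);
        }

        int getAverage() { return average; }
        int getEfficiency() { return efficiency; }
    }

    public static void main(String[] args) {
        EmployeeStatistics stats = new EmployeeStatistics(Arrays.asList(
            new Employee(480, 10), new Employee(360, 6)));
        System.out.println(stats.getAverage());    // 420
        System.out.println(stats.getEfficiency()); // 8
    }
}
```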
If you need to configure your caching behavior more than Suppliers.memoize() or Suppliers.memoizeWithExpiration() can provide, consider this similar pattern, which hides the fact that you're using a Cache inside a Supplier:
Supplier<EmployeeStatistics> statistics = new Supplier<EmployeeStatistics>() {
    private final Object key = new Object();
    private final LoadingCache<Object, EmployeeStatistics> cache =
        CacheBuilder.newBuilder()
            // configure your builder
            .build(new CacheLoader<Object, EmployeeStatistics>() {
                @Override
                public EmployeeStatistics load(Object key) {
                    // same behavior as the Supplier above
                }
            });

    @Override
    public EmployeeStatistics get() {
        return cache.getUnchecked(key);
    }
};
However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
I'm not sure, but you might be able to call it from inside the load method: compute the requested value as you do now and put the other one in at the same time. However, this feels hacky.
If service.getAllEmployees is expensive, then you could cache it. If both calculateEmployeeAvg and calculateEmployeeEff are cheap, then recompute them when needed. Otherwise, it looks like you could use two caches.
I guess a method computing both values at once could be a reasonable solution. Create a tiny Pair-like class aggregating them and use it as the cache value. There'll be a single key only.
Concerning your own solution, it could be as trivial as
class EmployeeStatsCache {
    private long validUntil;
    private List<Employee> employeeList;
    private Integer employeeAvg;
    private Integer employeeEff;

    private boolean isValid() {
        return System.currentTimeMillis() <= validUntil;
    }

    private synchronized List<Employee> getEmployeeList() {
        if (!isValid() || employeeList == null) {
            employeeList = service.getAllEmployees();
            validUntil = System.currentTimeMillis() + VALIDITY_MILLIS;
            // force recomputation of the derived values
            employeeAvg = null;
            employeeEff = null;
        }
        return employeeList;
    }

    public synchronized int getEmployeeAvg() {
        if (!isValid() || employeeAvg == null) {
            employeeAvg = calculateEmployeeAvg(getEmployeeList());
        }
        return employeeAvg;
    }

    public synchronized int getEmployeeEff() {
        if (!isValid() || employeeEff == null) {
            employeeEff = calculateEmployeeEff(getEmployeeList());
        }
        return employeeEff;
    }
}
Instead of synchronized methods you may want to synchronize on a private final field. There are other possibilities (e.g. Atomic*), but the basic design is probably simpler than adapting Guava's Cache.
Now, I see that there's Suppliers#memoizeWithExpiration in Guava. That's probably even simpler.
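If Guava isn't available, the idea behind Suppliers.memoizeWithExpiration can be sketched in plain Java. This is an illustration only, not a replacement for Guava's battle-tested version, and the class name is made up:

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ExpiringMemoizerDemo {
    interface Supplier<T> { T get(); }  // Java 6-friendly; use java.util.function on 8+

    static class ExpiringMemoizer<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private final long ttlNanos;
        private T value;
        private long expiresAt;

        ExpiringMemoizer(Supplier<T> delegate, long ttl, TimeUnit unit) {
            this.delegate = delegate;
            this.ttlNanos = unit.toNanos(ttl);
        }

        // recompute only when the cached value is missing or expired
        public synchronized T get() {
            long now = System.nanoTime();
            if (value == null || now - expiresAt >= 0) {
                value = delegate.get();
                expiresAt = now + ttlNanos;
            }
            return value;
        }
    }

    public static void main(String[] args) {
        final AtomicInteger calls = new AtomicInteger();
        ExpiringMemoizer<Integer> memo = new ExpiringMemoizer<Integer>(
            new Supplier<Integer>() {
                public Integer get() { return calls.incrementAndGet(); }
            }, 1, TimeUnit.MINUTES);
        System.out.println(memo.get()); // 1: computed
        System.out.println(memo.get()); // 1: served from the cache
    }
}
```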
I am trying to create a Map<String, Date> that holds values only until the date value is more than one day old.
Is the best approach to use a ConcurrentHashMap plus a thread that iterates the map every minute or so and removes the values older than one day, or is there a better approach?
For clarification, the date received will not necessarily be the current time; it can be a time in the past.
Thanks
EDIT: OK, so I have edited all of the below to use an Optional<Date> because Guava's caches don't like the Callable or CacheLoader to return null and you want to use this as a Map where the value associated with a key may be absent.
Use Guava's Cache
Cache<String, Optional<Date>> graphs = CacheBuilder.newBuilder()
    .expireAfterWrite(1, TimeUnit.DAYS)
    .build();
Loading Cache would be...
LoadingCache<String, Optional<Date>> graphs = CacheBuilder.newBuilder()
    .expireAfterWrite(1, TimeUnit.DAYS)
    .build(new CacheLoader<String, Optional<Date>>() {
        public Optional<Date> load(String key) throws Exception {
            return Optional.absent();
        }
    });
Ok, so I think what you want, if the Date may have been in the past, is to wrap the above Cache in a ForwardingCache.SimpleForwardingCache. You then override the get method to return Optional.absent() if the Date value is older than a day.
Cache<String, Optional<Date>> cache =
    new ForwardingCache.SimpleForwardingCache<String, Optional<Date>>(graphs) {
        @Override
        public Optional<Date> get(String key, Callable<? extends Optional<Date>> valueLoader)
                throws ExecutionException {
            Optional<Date> result = delegate().get(key, valueLoader);
            if (!result.isPresent() || olderThanADay(result.get()))
                return Optional.absent();
            else
                return result;
        }

        // since you don't really need a valueLoader you might want to add this;
        // it is in place of the LoadingCache (if you use a LoadingCache,
        // extend ForwardingLoadingCache instead)
        public Optional<Date> get(String key) throws ExecutionException {
            return get(key, new Callable<Optional<Date>>() {
                public Optional<Date> call() {
                    return Optional.absent();
                }
            });
        }
    };
I would probably create a custom class that contains a HashMap, as well as a priority queue of the objects that are also stored in the HashMap. The priority queue is ordered by the time at which each object becomes too old.
After that, you can have an internal private thread of some kind that sleeps until the next time the first object in the priority queue needs to be removed from the queue and also removed from the HashMap. No need to loop through the HashMap, or check values at a certain rate.
If you don't need to implement it yourself go with a third party library solution like the other answer.
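A rough sketch of that design, with hypothetical names; the sleeping reaper thread is left out and eviction happens inline so the core idea stays visible:

```java
import java.util.AbstractMap;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

public class ExpiringDateMap {
    private static final long ONE_DAY_MS = 24L * 60 * 60 * 1000;
    private final Map<String, Date> map = new HashMap<String, Date>();
    // entries ordered by date, oldest first, so expired ones sit at the head
    private final PriorityQueue<Map.Entry<String, Date>> queue =
        new PriorityQueue<Map.Entry<String, Date>>(11,
            (a, b) -> a.getValue().compareTo(b.getValue()));

    public synchronized void put(String key, Date date) {
        map.put(key, date);
        queue.add(new AbstractMap.SimpleEntry<String, Date>(key, date));
    }

    public synchronized Date get(String key) {
        evictOlderThanADay(new Date());
        return map.get(key);
    }

    // drop everything dated more than a day before 'now'
    synchronized void evictOlderThanADay(Date now) {
        Date cutoff = new Date(now.getTime() - ONE_DAY_MS);
        while (!queue.isEmpty() && queue.peek().getValue().before(cutoff)) {
            Map.Entry<String, Date> e = queue.poll();
            // only evict if this is still the current mapping for the key
            if (e.getValue().equals(map.get(e.getKey()))) {
                map.remove(e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        ExpiringDateMap m = new ExpiringDateMap();
        m.put("old", new Date(System.currentTimeMillis() - 2 * ONE_DAY_MS));
        m.put("fresh", new Date());
        System.out.println(m.get("old"));   // null: evicted
        System.out.println(m.get("fresh")); // the recent date survives
    }
}
```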
You can sub-class ConcurrentHashMap, or better yet, wrap the Map and remove any entry it finds which is older than one day. This can be done without an additional thread.
Another option is to add the entries to a priority queue and remove the oldest entries from it.
I have thought about this in the past and have derived a few different solutions. Ultimately, I think that you will want to either implement a Map or extend an existing Map implementation. Using this implementation should get you what you want; you will need to create a thread that calls the evict method.
public class DateKeyedMap<V> extends HashMap<Date, V> {
    public void evictOlderThan(Date dateToDelete) {
        // use the iterator so removal doesn't throw ConcurrentModificationException
        Iterator<Date> it = keySet().iterator();
        while (it.hasNext()) {
            if (it.next().compareTo(dateToDelete) <= 0) {
                it.remove();
            }
        }
    }
}
Here is a similar implementation but using the Date as the value, as you are suggesting that you want to map a String to a Date.
public class DateValuedMap<K> extends HashMap<K, Date> {
    public void evictOlderThan(Date dateToDelete) {
        // iterate the entries: remove(date) would try to remove by key,
        // and removing mid-iteration needs the iterator anyway
        Iterator<Map.Entry<K, Date>> it = entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue().compareTo(dateToDelete) <= 0) {
                it.remove();
            }
        }
    }
}
You can write your own Map, like this:
public class DateHashMap implements Map<String, Date> {
    // ...
    @Override
    public Date put(String key, Date value) {
        // Check date...
    }
    // ...
}
I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
// ...
map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
Readability-wise this is great, but it does require creating a new HashSet every time, even if one is already in the map. I could write this instead:
if (!map.containsKey(name)) {
    map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?
Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
    final Set<X> value = new HashSet<X>();
    set = map.putIfAbsent(name, value);
    if (set == null) {
        set = value;
    }
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)
Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker, which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually creates only one value per key, and if the create function is expensive, other threads asking for the same key will block until the value becomes available.
Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.
You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).
The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:
The entire method invocation is performed atomically, so the function
is applied at most once per key. Some attempted update operations on
this map by other threads may be blocked while computation is in
progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.
By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
    set = initial;
    initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.
In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; several of the solutions on this page are not thread-safe and are just sloppy.
Using ThreadLocals for this specific problem isn't only considered best practice for concurrency, but also for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>> threadCache =
    new ThreadLocal<HashSet<X>>() {
        @Override
        protected HashSet<X> initialValue() {
            return new HashSet<X>();
        }
    };

private final ConcurrentMap<String, Set<X>> map =
    new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
    // our set went in; reset the ThreadLocal so the cached set isn't reused
    threadCache.set(new HashSet<X>());
    data = cached;
}
// make sure that access to the set is thread safe
synchronized (data) {
    data.add(object);
}
My generic approach:
public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
    private static final long serialVersionUID = 42L;

    public V initIfAbsent(final K key) {
        V value = get(key);
        if (value == null) {
            value = initialValue();
            final V x = putIfAbsent(key, value);
            value = (x != null) ? x : value;
        }
        return value;
    }

    protected V initialValue() {
        return null;
    }
}
And as example of use:
public static void main(final String[] args) throws Throwable {
    ConcurrentHashMapWithInit<String, HashSet<String>> map =
        new ConcurrentHashMapWithInit<String, HashSet<String>>() {
            private static final long serialVersionUID = 42L;

            @Override
            protected HashSet<String> initialValue() {
                return new HashSet<String>();
            }
        };
    map.initIfAbsent("s1").add("chao");
    map.initIfAbsent("s2").add("bye");
    System.out.println(map.toString());
}
I know it's simple to implement, but I want to reuse something that already exists.
The problem I want to solve is that I load configuration (from XML, so I want to cache it) for different pages, roles, etc., so the combination of inputs can grow quite large (but in 99% of cases will not). To handle this 1%, I want some maximum number of items in the cache.
So far I have found org.apache.commons.collections.map.LRUMap in Apache Commons and it looks fine, but I want to check other options as well. Any recommendations?
You can use a LinkedHashMap (Java 1.4+) :
// Create cache
final int MAX_ENTRIES = 100;
Map<Object, Object> cache = new LinkedHashMap<Object, Object>(MAX_ENTRIES + 1, .75F, true) {
    // This method is called just after a new entry has been added
    @Override
    protected boolean removeEldestEntry(Map.Entry<Object, Object> eldest) {
        return size() > MAX_ENTRIES;
    }
};

// Add to cache
Object key = "key";
cache.put(key, object);

// Get object
Object o = cache.get(key);
if (o == null && !cache.containsKey(key)) {
    // Object not in cache. If null is not a possible value in the cache,
    // the call to cache.containsKey(key) is not needed
}

// If the cache is to be used by multiple threads,
// the cache must be wrapped with code to synchronize the methods
cache = Collections.synchronizedMap(cache);
This is an old question, but for posterity I wanted to list ConcurrentLinkedHashMap, which is thread safe, unlike LRUMap. Usage is quite easy:
ConcurrentMap<K, V> cache = new ConcurrentLinkedHashMap.Builder<K, V>()
    .maximumWeightedCapacity(1000)
    .build();
And the documentation has some good examples, like how to make the LRU cache size-based instead of number-of-items based.
Here is my implementation which lets me keep an optimal number of elements in memory.
The point is that I do not need to keep track of what objects are currently being used since I'm using a combination of a LinkedHashMap for the MRU objects and a WeakHashMap for the LRU objects.
So the cache capacity is no less than MRU size plus whatever the GC lets me keep. Whenever objects fall off the MRU they go to the LRU for as long as the GC will have them.
public class Cache<K, V> {
    final Map<K, V> MRUdata;
    final Map<K, V> LRUdata;

    public Cache(final int capacity) {
        LRUdata = new WeakHashMap<K, V>();
        MRUdata = new LinkedHashMap<K, V>(capacity + 1, 1.0f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> entry) {
                if (this.size() > capacity) {
                    LRUdata.put(entry.getKey(), entry.getValue());
                    return true;
                }
                return false;
            }
        };
    }

    public synchronized V tryGet(K key) {
        V value = MRUdata.get(key);
        if (value != null)
            return value;
        value = LRUdata.get(key);
        if (value != null) {
            LRUdata.remove(key);
            MRUdata.put(key, value);
        }
        return value;
    }

    public synchronized void set(K key, V value) {
        LRUdata.remove(key);
        MRUdata.put(key, value);
    }
}
I also had the same problem and hadn't found any good libraries, so I created my own.
simplelrucache provides thread-safe, very simple, non-distributed LRU caching with TTL support. It provides two implementations:
Concurrent based on ConcurrentLinkedHashMap
Synchronized based on LinkedHashMap
You can find it here.
Here is a very simple and easy to use LRU cache in Java.
Although it is short and simple it is production quality.
The code is explained (look at the README.md) and has some unit tests.