My requirement is to find out all the matching objects with given criteria.
So what I did was created a Custom predicate and passed the matching criteria and the source IDs that should be matched with the Objects. If the condition is matching, then I am internally populating a map to hold the details.
And when the stream is completed, this map will have all the matching criteria details.
So I use this map for later computation.
Most of the time it works, but starts failing randomly.
My Custom predicate is -
public class IdentifierMapPredicate<T extends Identifiable, V> implements Predicate<T> {
private static Logger logger = Logger
.getLogger();
private Collection<V> sourceIds;
private BiPredicate<T, V> matchingCondition;
private Map<V, Collection<Identifier>> identifierMap;
public Map<V, Collection<Identifier>> getIdentifierMap() {
return identifierMap;
}
public IdentifierMapPredicate(Collection<V> sourceIds,
BiPredicate<T, V> matchingCondition) {
this.sourceIds = sourceIds;
this.matchingCondition = matchingCondition;
identifierMap = new HashMap<>(sourceIds.size());
}
#Override
public boolean test(T t) {
for (V id : sourceIds) {
if (matchingCondition.test(t, id)) {
//logger.debug("t :{}, id :{}", t.getIdentifier(), id);
Collection<Identifier> ids = identifierMap.get(id);
if(ids == null){
ids = new HashSet<>();
identifierMap.put(id, ids);
}
ids.add(t.getIdentifier());
logger.debug("identifierMap :{}", identifierMap);
return true;
}
}
return false;
}
}
I added log statements when the criteria is matched and we update the map, sometime the map has only 1 element (even though I am expecting 2) or sometimes its 0.
2019-11-24 09:31:52.574 PST DEBUG ForkJoinPool.commonPool-worker-5 IdentifierMapPredicate:52 - identifierMap :{}
2019-11-24 09:31:52.574 PST DEBUG ForkJoinPool.commonPool-worker-7 IdentifierMapPredicate:52 -identifierMap :{}
Is there anything wrong with the above implementation?
Should changing it to ConcurrentHashMap will help?
Most of the time it works, but starts failing randomly
Yeah, classical race condition.
You're using IdentifierMapPredicate in a multi-threaded context. If you access IdentifierMapPredicate.test concurrently, threads will use also identifierMap simulaneously, which is not thread safe.
It looks like a ConcurrentHashMap will solve the issue. Alternatively you could modify method test with the synchronized keyword, but that will give you less throughput/more locking I'd assume. But on the other hand also less headache, its a trade-off where I'd gladly accept less headache if there aren't tight performance requirements. Just my personal preference.
Btw. nowadays you can use Map.computeIfAbsent to create new entries instead of
Collection<Identifier> ids = identifierMap.get(id);
if(ids == null){
ids = new HashSet<>();
identifierMap.put(id, ids);
}
Related
I have a Java class that has a Guava LoadingCache<String, Integer> and in that cache, I'm planning to store two things: the average time active employees have worked for the day and their efficiency. I am caching these values because it would be expensive to compute every time a request comes in. Also, the contents of the cache will be refreshed (refreshAfterWrite) every minute.
I was thinking of using a CacheLoader for this situation, however, its load method only loads one value per key. In my CacheLoader, I was planning to do something like:
private Service service = new Service();
public Integer load(String key) throws Exception {
if (key.equals("employeeAvg"))
return calculateEmployeeAvg(service.getAllEmployees());
if (key.equals("employeeEff"))
return calculateEmployeeEff(service.getAllEmployees());
return -1;
}
For me, I find this very inefficient since in order to load both values, I have to invoke service.getAllEmployees() twice because, correct me if I'm wrong, CacheLoader's should be stateless.
Which made me think to use the LoadingCache.put(key, value) method so I can just create a utility method that invokes service.getAllEmployees() once and calculate the values on the fly. However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
How do I make this more efficient?
It seems like your problem stems from using strings to represent value types (Effective Java Item 50). Instead, consider defining a proper value type that stores this data, and use a memoizing Supplier to avoid recomputing them.
public static class EmployeeStatistics {
private final int average;
private final int efficiency;
// constructor, getters and setters
}
Supplier<EmployeeStatistics> statistics = Suppliers.memoize(
new Supplier<EmployeeStatistics>() {
#Override
public EmployeeStatistics get() {
List<Employee> employees = new Service().getAllEmployees();
return new EmployeeStatistics(
calculateEmployeeAvg(employees),
calculateEmployeeEff(employees));
}});
You could even move these calculation methods inside EmployeeStatistics and simply pass in all employees to the constructor and let it compute the appropriate data.
If you need to configure your caching behavior more than Suppliers.memoize() or Suppliers.memoizeWithExpiration() can provide, consider this similar pattern, which hides the fact that you're using a Cache inside a Supplier:
Supplier<EmployeeStatistics> statistics = new Supplier<EmployeeStatistics>() {
private final Object key = new Object();
private final LoadingCache<Object, EmployeeStatistics> cache =
CacheBuilder.newBuilder()
// configure your builder
.build(
new CacheLoader<Object, EmployeeStatistics>() {
public EmployeeStatistics load(Object key) {
// same behavior as the Supplier above
}});
#Override
public EmployeeStatistics get() {
return cache.get(key);
}};
However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
I'm not sure, but you might be able to call it from inside the load method. I mean, compute the requested value as you do and put in the other. However, this feels hacky.
If service.getAllEmployees is expensive, then you could cache it. If both calculateEmployeeAvg and calculateEmployeeEff are cheap, then recompute them when needed. Otherwise, it looks like you could use two caches.
I guess, a method computing both values at once could be a reasonable solution. Create a tiny Pair-like class aggregating them and use it as the cache value. There'll be a single key only.
Concerning your own solution, it could be as trivial as
class EmployeeStatsCache {
private long validUntil;
private List<Employee> employeeList;
private Integer employeeAvg;
private Integer employeeEff;
private boolean isValid() {
return System.currentTimeMillis() <= validUntil;
}
private synchronized List<Employee> getEmployeeList() {
if (!isValid || employeeList==null) {
employeeList = service.getAllEmployees();
validUntil = System.currentTimeMillis() + VALIDITY_MILLIS;
}
return employeeList;
}
public synchronized int getEmployeeAvg() {
if (!isValid || employeeAvg==null) {
employeeAvg = calculateEmployeeAvg(getEmployeeList());
}
return employeeAvg;
}
public synchronized int getEmployeeEff() {
if (!isValid || employeeAvg==null) {
employeeAvg = calculateEmployeeEff(getEmployeeList());
}
return employeeAvg;
}
}
Instead of synchronized methods you may want to synchronize on a private final field. There are other possibilities (e.g. Atomic*), but the basic design is probably simpler than adapting Guava's Cache.
Now, I see that there's Suppliers#memoizeWithExpiration in Guava. That's probably even simpler.
I am trying to create a Map<String, Date> that can hold values until the date value > 1 day old.
Is the best approach to do this to use a ConcurrentHashMap and then create a thread which iterates the map once every minute or so and then remove the values older than 1 day, or, is there a better approach to doing this?
For clarification the date message received will not be the current time, it can be a time previous
Thanks
EDIT: OK, so I have edited all of the below to use an Optional<Date> because Guava's caches don't like the Callable or CacheLoader to return null and you want to use this as a Map where the value associated with a key may be absent.
Use Guava's Cache
Cache<Key, Graph> graphs = CacheBuilder.newBuilder()
.expireAfterWrite(1, TimeUnit.DAYS)
.build();
Loading Cache would be...
LoadingCache<String, Optional<Date>> graphs = CacheBuilder.newBuilder()
.expireAfterWrite(1, TimeUnit.DAYS)
.build(
new CacheLoader<String, Optional<Date>>() {
public Optional<Date>load(Stringkey) throws AnyException {
return Optional.absent();
}
});
Ok, so I think what you want if the Date may have been in the past is to wrap the above Cache in a ForwardingCache.SimpleForwardingCache. You then overload the get method to return null if the Date value is older than a day.
Cache<String, Optional<Date>> cache = new SimpleForwardingCache<>(grapsh){
public Optional<Date>get(String key, Callable<Optional<Date>> valueLoader){
Optional<Date> result = delegate().get(key, valueLoader);
if (!result.isPresent() || olderThanADay(result.get()))
return Optional.absent();
else
return result;
}
// since you don't really need a valueLoader you might want to add this.
// this is in place of the LoadingCache, if use LoadingCache use
// ForwardingLoadingCache
public Date get(String key){
return get(key,
new Callable<Optional<Date>>(){
public Date call(){return Optional.absent();}
}
}
}
I would probably create a custom class that contains a HashMap, as well as a priority queue of the objects that are also stored in the HashMap. The priority queue is ordered by the system time when the object is too old.
After that, you can have an internal private thread of some kind that sleeps until the next time the first object in the priority queue needs to be removed from the queue and also removed from the HashMap. No need to loop through the HashMap, or check values at a certain rate.
If you don't need to implement it yourself go with a third party library solution like the other answer.
You can sub-class ConcurrentHashMap, or better yet, wrap the Map and remove any entry it finds which is older than a one day. This can be done without an additional thread.
Another option is to add the entries to a priority queue and remove from the oldest entries.
I have thought about this in the past, and have derived a few different solutions. Ultimately, I think that you will want to either implement a Map or extend an existing Map implementation. Using this implementation should get you what you want, you will need to create a thread that will call the evict method.
public class DateKeyedMap<V> extends HashMap<Date, V> {
public void evictOlderThan (Date dateToDelete) {
for (Date date : keySet()) {
if (date.compareTo(dateToDelete) <= 0) {
remove(date);
}
}
}
}
Here is a similar implementation but using the Date as the value, as you are suggesting that you want to map a String to a Date.
public class DateValuedMap<K> extends HashMap<K, Date> {
public void evictOlderThan (Date dateToDelete) {
for (Date date : values()) {
if (date.compareTo(dateToDelete) <= 0) {
remove(date);
}
}
}
}
you can write your own Map like this:
public class DateHashMap implements Map<String, Date>{
..
#Override
public String put(Date key, String value) {
// Check date...
}
..
}
Sincerely,
Max
I have been using Java's ConcurrentMap for a map that can be used from multiple threads. The putIfAbsent is a great method and is much easier to read/write than using standard map operations. I have some code that looks like this:
ConcurrentMap<String, Set<X>> map = new ConcurrentHashMap<String, Set<X>>();
// ...
map.putIfAbsent(name, new HashSet<X>());
map.get(name).add(Y);
Readability wise this is great but it does require creating a new HashSet every time even if it is already in the map. I could write this:
if (!map.containsKey(name)) {
map.putIfAbsent(name, new HashSet<X>());
}
map.get(name).add(Y);
With this change it loses a bit of readability but does not need to create the HashSet every time. Which is better in this case? I tend to side with the first one since it is more readable. The second would perform better and may be more correct. Maybe there is a better way to do this than either of these.
What is the best practice for using a putIfAbsent in this manner?
Concurrency is hard. If you are going to bother with concurrent maps instead of straightforward locking, you might as well go for it. Indeed, don't do lookups more than necessary.
Set<X> set = map.get(name);
if (set == null) {
final Set<X> value = new HashSet<X>();
set = map.putIfAbsent(name, value);
if (set == null) {
set = value;
}
}
(Usual stackoverflow disclaimer: Off the top of my head. Not tested. Not compiled. Etc.)
Update: 1.8 has added computeIfAbsent default method to ConcurrentMap (and Map which is kind of interesting because that implementation would be wrong for ConcurrentMap). (And 1.7 added the "diamond operator" <>.)
Set<X> set = map.computeIfAbsent(name, n -> new HashSet<>());
(Note, you are responsible for the thread-safety of any operations of the HashSets contained in the ConcurrentMap.)
Tom's answer is correct as far as API usage goes for ConcurrentMap. An alternative that avoids using putIfAbsent is to use the computing map from the GoogleCollections/Guava MapMaker which auto-populates the values with a supplied function and handles all the thread-safety for you. It actually only creates one value per key and if the create function is expensive, other threads asking getting the same key will block until the value becomes available.
Edit from Guava 11, MapMaker is deprecated and being replaced with the Cache/LocalCache/CacheBuilder stuff. This is a little more complicated in its usage but basically isomorphic.
You can use MutableMap.getIfAbsentPut(K, Function0<? extends V>) from Eclipse Collections (formerly GS Collections).
The advantage over calling get(), doing a null check, and then calling putIfAbsent() is that we'll only compute the key's hashCode once, and find the right spot in the hashtable once. In ConcurrentMaps like org.eclipse.collections.impl.map.mutable.ConcurrentHashMap, the implementation of getIfAbsentPut() is also thread-safe and atomic.
import org.eclipse.collections.impl.map.mutable.ConcurrentHashMap;
...
ConcurrentHashMap<String, MyObject> map = new ConcurrentHashMap<>();
map.getIfAbsentPut("key", () -> someExpensiveComputation());
The implementation of org.eclipse.collections.impl.map.mutable.ConcurrentHashMap is truly non-blocking. While every effort is made not to call the factory function unnecessarily, there's still a chance it will be called more than once during contention.
This fact sets it apart from Java 8's ConcurrentHashMap.computeIfAbsent(K, Function<? super K,? extends V>). The Javadoc for this method states:
The entire method invocation is performed atomically, so the function
is applied at most once per key. Some attempted update operations on
this map by other threads may be blocked while computation is in
progress, so the computation should be short and simple...
Note: I am a committer for Eclipse Collections.
By keeping a pre-initialized value for each thread you can improve on the accepted answer:
Set<X> initial = new HashSet<X>();
...
Set<X> set = map.putIfAbsent(name, initial);
if (set == null) {
set = initial;
initial = new HashSet<X>();
}
set.add(Y);
I recently used this with AtomicInteger map values rather than Set.
In 5+ years, I can't believe no one has mentioned or posted a solution that uses ThreadLocal to solve this problem; and several of the solutions on this page are not threadsafe and are just sloppy.
Using ThreadLocals for this specific problem isn't only considered best practices for concurrency, but for minimizing garbage/object creation during thread contention. Also, it's incredibly clean code.
For example:
private final ThreadLocal<HashSet<X>>
threadCache = new ThreadLocal<HashSet<X>>() {
#Override
protected
HashSet<X> initialValue() {
return new HashSet<X>();
}
};
private final ConcurrentMap<String, Set<X>>
map = new ConcurrentHashMap<String, Set<X>>();
And the actual logic...
// minimize object creation during thread contention
final Set<X> cached = threadCache.get();
Set<X> data = map.putIfAbsent("foo", cached);
if (data == null) {
// reset the cached value in the ThreadLocal
listCache.set(new HashSet<X>());
data = cached;
}
// make sure that the access to the set is thread safe
synchronized(data) {
data.add(object);
}
My generic approximation:
public class ConcurrentHashMapWithInit<K, V> extends ConcurrentHashMap<K, V> {
private static final long serialVersionUID = 42L;
public V initIfAbsent(final K key) {
V value = get(key);
if (value == null) {
value = initialValue();
final V x = putIfAbsent(key, value);
value = (x != null) ? x : value;
}
return value;
}
protected V initialValue() {
return null;
}
}
And as example of use:
public static void main(final String[] args) throws Throwable {
ConcurrentHashMapWithInit<String, HashSet<String>> map =
new ConcurrentHashMapWithInit<String, HashSet<String>>() {
private static final long serialVersionUID = 42L;
#Override
protected HashSet<String> initialValue() {
return new HashSet<String>();
}
};
map.initIfAbsent("s1").add("chao");
map.initIfAbsent("s2").add("bye");
System.out.println(map.toString());
}
I have program consisting of a number of classes. I have a problem with the interraction of two of the classes - WebDataCache and Client. The problem classes are listed below.
WebData:
This is just a data class representing some data retrieved from the internet.
WebService:
This is just a web service wrapper class which connects to a particular web service, reads some data and stores it in an object of type WebData.
WebDataCache:
This is a class which uses the WebService class to retreive data that's cached in a map, keyed by the ids fields of the data.
Client:
This is is a class which contains a refrence to an instance of the WebDataCache class and uses the cached data.
The problem is (as illustrated below) when the class is looping through the cached data, it is possible for the WebDataCache to update the underlying collection.
My question is how do I synchronize access to the cache?
I don't want to synchronize the whole cache as there are multiple instance of the Client class, however each instantiated with a unique id (i.e. new Client(0,...), new Client(1,...), new Client(2,...), etc each instance only interested in data keyed by the id the client was instansiated with.
Are there any relevent design patterns I can use?
class WebData {
private final int id;
private final long id2;
public WebData(int id, long id2) {
this.id = id;
this.id2 = id2;
}
public int getId() { return this.id; }
public long getId2() { return this.id2; }
}
class WebService {
Collection<WebData> getData(int id) {
Collection<WebData> a = new ArrayList<WebData>();
// populate A with data from a webservice
return a;
}
}
class WebDataCache implements Runnable {
private Map<Integer, Map<Long, WebData>> cache =
new HashMap<Integer, Map<Long, WebData>>();
private Collection<Integer> requests =
new ArrayList<Integer>();
#Override
public void run() {
WebService webSvc = new WebService();
// get data from some web service
while(true) {
for (int id : requests) {
Collection<WebData> webData = webSvc.getData(id);
Map<Long, WebData> row = cache.get(id);
if (row == null)
row = cache.put(id, new HashMap<Long, WebData>());
else
row.clear();
for (WebData webDataItem : webData) {
row.put(webDataItem.getId2(), webDataItem);
}
}
Thread.sleep(2000);
}
}
public synchronized Collection<WebData> getData(int id){
return cache.get(id).values();
}
public synchronized void requestData(int id) {
requests.add(id);
}
}
-
class Client implements Runnable {
private final WebDataCache cache;
private final int id;
public Client(int id, WebDataCache cache){
this.id = id;
this.cache = cache;
}
#Override
public void run() {
cache.requestData(id);
while (true) {
for (WebData item : cache.getData(id)) {
// java.util.ConcurrentModificationException is thrown here...
// I understand that the collection is probably being modified in WebDataCache::run()
// my question what's the best way to sychronize this code snippet?
}
}
}
}
Thanks!
Use java.util.concurrent.ConcurrentHashMap instead of plain old java.util.HashMap. From the Javadoc:
A hash table supporting full
concurrency of retrievals and
adjustable expected concurrency for
updates. This class obeys the same
functional specification as Hashtable,
and includes versions of methods
corresponding to each method of
Hashtable. However, even though all
operations are thread-safe, retrieval
operations do not entail locking, and
there is not any support for locking
the entire table in a way that
prevents all access. This class is
fully interoperable with Hashtable in
programs that rely on its thread
safety but not on its synchronization
details.
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ConcurrentHashMap.html
So you would replace:
private Map<Integer, Map<Long, WebData>> cache =
new HashMap<Integer, Map<Long, WebData>>();
With
private Map<Integer, Map<Long, WebData>> cache =
new ConcurrentHashMap<Integer, Map<Long, WebData>>();
My best recommendation is to use an existing cache implementation such as JCS or EhCache - these are battle tested implementations.
Otherwise, you have a couple of things going on in your code. Things that can break in funny ways.
HashMap can grow infinite loops when modified concurrently by multiple threads. So don't. Use java.util.concurrent.ConcurrentHashMap instead.
The ArrayList that you use for WebDataCache.requests isn't thread-safe either and you have inconsistent synchronization - either change it to a safer list implementation from java.util.concurrent or make sure that all access to it is synchronizing on the same lock.
Lastly, have your code checked with FindBugs and/or properly reviewed by someone with solid knowledge and experience on writing multi-threaded code.
If you want to read a book on this stuff, I can recommend Java Concurrency in Practice by Brian Goetz.
In addition to the other posted recommendations, consider how often the cache is updated versus just being read. If the reading dominates and updating is rare, and it's not critical that the reading loop be able to see every update immediately, consider using a CopyOnWriteArraySet. It and its sibling CopyOnWriteArrayList allow concurrent reading and updating of the members; the reader sees a consistent snapshot unaffected by any mutation of the underlying collection -- analogous to the SERIALIZABLE isolation level in a relational database.
The problem here, though, is that neither of these two structures give you your dictionary or associative array storage (a la Map) out of the box. You'd have to define a composite structure to store the key and value together, and, given that CopyOnWriteArraySet uses Object#equals() for membership testing, you'd have to write an unconventional key-based equals() method for your structure.
The answer from LES2 is good except that you would have to replace:
row = cache.put(id, new HashMap<Long, WebData>());
with:
row = cache.put(id, new ConcurrentHashMap<Long, WebData>());
For that's the one that hold the "problematic" collection and not the whole cache.
You can synchronize on the row returned by the cache who is at the end who holds the collection that is being shared.
On WebDataCache:
Map<Long, WebData> row = cache.get(id);
if (row == null) {
row = cache.put(id, new HashMap<Long, WebData>());
} else synchronized( row ) {
row.clear();
}
for (WebData webDataItem : webData) synchronized( row ) {
row.put(webDataItem.getId2(), webDataItem);
}
// it doesn't make sense to synchronize the whole cache here.
public Collection<WebData> getData(int id){
return cache.get(id).values();
}
On Client:
Collection<WebData> data = cache.getData(id);
synchronized( data ) {
for (WebData item : cache.getData(id)) {
}
}
Of course this is far from perfect it just answer the question of what to synchronize. In this case it would be the access to the underlaying collection in. row.clear row.put on the cache and the iteration on the client.
BTW, why do you have a Map in the cache, and you use a collection in the client. You should use the same structure on both and don't expose the underlying implementation.
I know it's simple to implement, but I want to reuse something that already exist.
Problem I want to solve is that I load configuration (from XML so I want to cache them) for different pages, roles, ... so the combination of inputs can grow quite much (but in 99% will not). To handle this 1%, I want to have some max number of items in cache...
Till know I have found org.apache.commons.collections.map.LRUMap in apache commons and it looks fine but want to check also something else. Any recommendations?
You can use a LinkedHashMap (Java 1.4+) :
// Create cache
final int MAX_ENTRIES = 100;
Map cache = new LinkedHashMap(MAX_ENTRIES+1, .75F, true) {
// This method is called just after a new entry has been added
public boolean removeEldestEntry(Map.Entry eldest) {
return size() > MAX_ENTRIES;
}
};
// Add to cache
Object key = "key";
cache.put(key, object);
// Get object
Object o = cache.get(key);
if (o == null && !cache.containsKey(key)) {
// Object not in cache. If null is not a possible value in the cache,
// the call to cache.contains(key) is not needed
}
// If the cache is to be used by multiple threads,
// the cache must be wrapped with code to synchronize the methods
cache = (Map)Collections.synchronizedMap(cache);
This is an old question, but for posterity I wanted to list ConcurrentLinkedHashMap, which is thread safe, unlike LRUMap. Usage is quite easy:
ConcurrentMap<K, V> cache = new ConcurrentLinkedHashMap.Builder<K, V>()
.maximumWeightedCapacity(1000)
.build();
And the documentation has some good examples, like how to make the LRU cache size-based instead of number-of-items based.
Here is my implementation which lets me keep an optimal number of elements in memory.
The point is that I do not need to keep track of what objects are currently being used since I'm using a combination of a LinkedHashMap for the MRU objects and a WeakHashMap for the LRU objects.
So the cache capacity is no less than MRU size plus whatever the GC lets me keep. Whenever objects fall off the MRU they go to the LRU for as long as the GC will have them.
public class Cache<K,V> {
final Map<K,V> MRUdata;
final Map<K,V> LRUdata;
public Cache(final int capacity)
{
LRUdata = new WeakHashMap<K, V>();
MRUdata = new LinkedHashMap<K, V>(capacity+1, 1.0f, true) {
protected boolean removeEldestEntry(Map.Entry<K,V> entry)
{
if (this.size() > capacity) {
LRUdata.put(entry.getKey(), entry.getValue());
return true;
}
return false;
};
};
}
public synchronized V tryGet(K key)
{
V value = MRUdata.get(key);
if (value!=null)
return value;
value = LRUdata.get(key);
if (value!=null) {
LRUdata.remove(key);
MRUdata.put(key, value);
}
return value;
}
public synchronized void set(K key, V value)
{
LRUdata.remove(key);
MRUdata.put(key, value);
}
}
I also had same problem and I haven't found any good libraries... so I've created my own.
simplelrucache provides threadsafe, very simple, non-distributed LRU caching with TTL support. It provides two implementations
Concurrent based on ConcurrentLinkedHashMap
Synchronized based on LinkedHashMap
You can find it here.
Here is a very simple and easy to use LRU cache in Java.
Although it is short and simple it is production quality.
The code is explained (look at the README.md) and has some unit tests.