I am using a ConcurrentSkipListSet, which I fill with 20 keys.
I want to replace these keys continuously. However, ConcurrentSkipListSet doesn't seem to have an atomic replace method.
This is what I am using now:
ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<Long>();
AtomicLong uniquefier = new AtomicLong(1);

public void fillSet() {
    // fills set with 20 unique keys
}

public void updateSet() {
    Long now = Calendar.getInstance().getTimeInMillis();
    Long oldestKey = set.first();
    if (set.remove(oldestKey)) {
        set.add(makeUnique(now));
    }
}

private static final long MULTIPLIER = 1024;

public Long makeUnique(long in) {
    return (in * MULTIPLIER + uniquefier.getAndSet(uniquefier.incrementAndGet() % (MULTIPLIER / 2)));
}
The goal of this whole operation is to keep the set at a constant size and only update it by replacement. updateSet is called roughly 100 times per millisecond.
Now, my question is this: does remove return true if the element itself was present before (and isn't after), or does the method return true only if the call was actually responsible for the removal?
I.e.: if multiple threads call remove on the very same key at the very same time, will they /all/ return true, or will only one return true?
set.remove will only return true for the thread that actually caused the object to be removed.
The idea behind the set's concurrency is that multiple threads can be updating multiple objects. However, each individual object can only be updated by one thread at a time.
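A tiny sketch (hypothetical class and key, not from the question) that demonstrates this guarantee: several threads race to remove the same key, and only one of them ever gets true.

import java.util.concurrent.ConcurrentSkipListSet;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class RemoveRaceDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentSkipListSet<Long> set = new ConcurrentSkipListSet<>();
        set.add(42L);

        CountDownLatch start = new CountDownLatch(1);
        AtomicInteger winners = new AtomicInteger();

        for (int i = 0; i < 8; i++) {
            new Thread(() -> {
                try {
                    start.await();                      // line all threads up before removing
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (set.remove(42L)) {                  // only one of these calls can succeed
                    winners.incrementAndGet();
                }
            }).start();
        }
        start.countDown();
        Thread.sleep(500);                              // crude wait; enough for a demo
        System.out.println("threads that saw true: " + winners.get()); // always prints 1
    }
}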
Related
I'm trying to figure out how to write a thread-safe cache with expiring entries. This cache will be used as a negative cache: if an entry is not found in some storage, I will put it in this cache to avoid repeating the lookup over the next few minutes.
There will be multiple threads reading and writing this cache.
There will be just a single ThreadSafeCache instance in my application.
I'm not sure whether removing an entry in the contains method will cause synchronization issues.
How may I test this class for thread-safety?
Kind regards
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadSafeCache
{
    private final Clock clock = Clock.systemUTC();
    private final Duration expiration = Duration.ofMinutes(10);
    private final ConcurrentHashMap<CacheKey, CacheValue> internalMap = new ConcurrentHashMap<>();

    public boolean contains(String a, String b, byte[] c, String d)
    {
        CacheKey key = new CacheKey(a, b, c, d);
        CacheValue value = internalMap.get(key);
        if (value == null || value.isExpired())
        {
            internalMap.remove(key);
            return false;
        }
        return true;
    }

    public void put(String a, String b, byte[] c, String d)
    {
        internalMap.computeIfAbsent(new CacheKey(a, b, c, d), key -> new CacheValue());
    }

    private class CacheValue
    {
        private final Instant insertionDate;

        private CacheValue()
        {
            this.insertionDate = clock.instant();
        }

        boolean isExpired()
        {
            return Duration.between(insertionDate, clock.instant()).compareTo(expiration) > 0;
        }
    }
}
You are calling two map operations in the same function, which means there is scope for interleaving (i.e. another operation can happen between the two calls and change the outcome). To fix this, you can put the map operations in a synchronized (internalMap) {} block. Note that you must do this in any method that interacts with the map through two or more discrete calls.
From a code-style point of view, it is bad practice to modify the map in the contains method. This will make your code less predictable. Another person coming to your code for the first time (or you in a few months time) may not remember that contains() actually modifies the cache. contains implies that it simply checks the cache, rather than modifying it.
My recommendation would be:
If the key has expired, simply return false.
In the get() method, check whether the value has expired and, if it has, compute a new one there (see the sketch below).
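A minimal sketch of that recommendation, reusing the CacheKey/CacheValue types from the question; the getOrLoad method name is my own illustration, not part of the original code:

// contains() only reads; it never mutates the map
public boolean contains(String a, String b, byte[] c, String d)
{
    CacheValue value = internalMap.get(new CacheKey(a, b, c, d));
    return value != null && !value.isExpired();   // expired entries simply report "not cached"
}

// hypothetical getOrLoad(): recompute the value when it is missing or expired
public CacheValue getOrLoad(CacheKey key)
{
    return internalMap.compute(key, (k, v) ->
            (v == null || v.isExpired()) ? new CacheValue() : v);  // atomic check-and-replace
}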
Your question: "I'm not sure if removing an entry in the contains method will arise synchronization issues".
=> No problem with remove operation because you use synchronized collections ConcurrentHashMap, it's the best choice.
extra: another way to get a synchronized collections is: Collections.synchonize(myCollection), but it's not OK if we use remove operation in multithread envi (maybe in a loop), it throws ModificationException.
So, use synchronized collections (ex: ConcurrentHashMap) is alaways recommended
I have some cache-refresh logic and want to make sure it is thread-safe and that this is the correct way to do it.
public class Test {
    Set<Integer> cache = Sets.newConcurrentHashSet();

    public boolean contain(int num) {
        return cache.contains(num);
    }

    public void refresh() {
        cache.clear();
        cache.addAll(getNums());
    }
}
So I have a background thread refreshing the cache by calling refresh periodically, and multiple threads calling contain at the same time. I was trying to avoid putting synchronized in the method signatures because refresh could take some time (imagine that getNums makes network calls and parses a huge amount of data), and then contain would be blocked.
I think this code is not good enough, because if contain is called between clear and addAll, it will return false even for values that belong in the cache.
What is the best way to refresh the cache without adding significant latency to the contain call?
The best way would be to use the functional-programming approach of immutable state: instead of adding and removing elements in the existing Set, you create an entirely new Set every time you want to add or remove elements. (Java 9 makes this convenient with immutable-set factory methods such as Set.of.)
That approach can be awkward or infeasible to retrofit onto legacy code, however. So instead, keep the cache in a volatile field that contain reads, and assign a completely new Set instance to that field in the refresh method.
public class Test {
    volatile Set<Integer> cache = new HashSet<>();

    public boolean contain(int num) {
        return cache.contains(num);
    }

    public void refresh() {
        Set<Integer> privateCache = new HashSet<>();
        privateCache.addAll(getNums());
        cache = privateCache;
    }
}
Edit: We don't want or need a concurrent set here. That would only matter if you wanted to add and remove individual elements concurrently, which in my opinion is a pretty useless thing to do in this case. What you want is to swap the old Set for a new one, and for that a volatile field is enough: it guarantees that readers always see the most recently assigned Set.
But, as I mentioned at the start of my answer, if you never modify collections in place and instead build a new one for every update (with persistent/immutable collection implementations this can be cheap, because the old set's internal structure is reused), you never need to worry about concurrency at all: there is no shared mutable state between threads.
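A minimal sketch of that immutable-snapshot idea; getNums() is the same hypothetical data source as in the question, and Set.of / Set.copyOf require Java 9/10+:

public class Test {
    // each refresh publishes a brand-new, unmodifiable snapshot
    private volatile Set<Integer> cache = Set.of();

    public boolean contain(int num) {
        return cache.contains(num);        // readers only ever see a complete snapshot
    }

    public void refresh() {
        cache = Set.copyOf(getNums());     // build the snapshot, then publish it in one volatile write
    }
}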
How would you make sure your cache doesn't contain invalid entries when calling contains? Furthermore, you'd need to call refresh every time getNums() changes, which is pretty inefficient. It would be better to control the changes to getNums() yourself and update the cache accordingly. The cache might look like:
public class MyCache {
    // a ConcurrentHashMap so that putIfAbsent is available
    final ConcurrentHashMap<Integer, Boolean> cache = new ConcurrentHashMap<>();

    public boolean contains(Integer num) {
        return cache.containsKey(num);   // contains(Object) would check values, not keys
    }

    public void add(Integer num) {
        cache.putIfAbsent(num, true);
    }

    public void clear() {
        cache.clear();
    }

    public void remove(Integer num) {
        cache.remove(num);
    }
}
Update
As @schmosel made me realize, mine was a wasted effort: it is in fact enough to initialize a completely new HashSet<> with your values in the refresh method, assuming of course that the cache field is marked volatile. In short, @Snickers3192's answer points out what you seek.
Old answer
You can also use a slightly different system.
Keep two Set<Integer>, one of which will always be empty. When you refresh the cache, you can asynchronously re-initialize the second one and then just switch the pointers. Other threads accessing the cache won't see any particular overhead in this.
From an external point of view, they will always be accessing the same cache.
private volatile int currentCache; // 0 or 1

// two caches; either one will always be empty, so not much extra memory is consumed
private final Set<Integer>[] caches = new HashSet[2];

// just a pointer to the current cache, must be volatile
private volatile Set<Integer> cachePointer = null;

// initialize
{
    this.caches[0] = new HashSet<>(0);
    this.caches[1] = new HashSet<>(0);
    this.currentCache = 0;
    this.cachePointer = caches[this.currentCache]; // point to cache 0 from the beginning
}
Your refresh method may look like this:
public void refresh() {
    // store the current cache index
    final int previousCache = this.currentCache;
    final int nextCache = getNextPointer();
    // the refill can easily be computed asynchronously;
    // in the meantime, external threads will still access the current cache
    CompletableFuture.runAsync(() -> {
        // fill the unused cache
        caches[nextCache].addAll(getNums());
        // then switch the pointer to the just-filled cache;
        // from this point on, threads are accessing the new cache
        switchCachePointer();
        // empty the other cache, still on the async thread
        caches[previousCache].clear();
    });
}
where the utility methods are:
public boolean contains(final int num) {
    return this.cachePointer.contains(num);
}

private int getNextPointer() {
    return (this.currentCache + 1) % this.caches.length;
}

private void switchCachePointer() {
    // make cachePointer point to the newly filled cache
    this.currentCache = this.getNextPointer();
    this.cachePointer = caches[this.currentCache];
}
I have a ConcurrentHashMap which I am populating from multiple threads as shown below:
private static Map<ErrorData, Long> holder = new ConcurrentHashMap<ErrorData, Long>();

public static void addError(ErrorData error) {
    if (holder.keySet().contains(error)) {
        holder.put(error, holder.get(error) + 1);
    } else {
        holder.put(error, 1L);
    }
}
Is there any possibility of race condition in above code and it can skip updates? Also how can I use Guava AtomicLongMap here if that can give better performance?
I am on Java 7.
Yes, there is a possibility of a race, because the contains check and the put are not performed atomically.
You can use AtomicLongMap as follows, which does this check atomically:
private static final AtomicLongMap<ErrorData> holder = AtomicLongMap.create();

public static void addError(ErrorData error) {
    holder.getAndIncrement(error);
}
As described in the javadoc:
[T]he typical mechanism for writing to this map is addAndGet(K, long), which adds a long to the value currently associated with K. If a key has not yet been associated with a value, its implicit value is zero.
and
All operations are atomic unless otherwise noted.
If you are using Java 8, you can take advantage of the new merge method:
holder.merge(error, 1L, Long::sum);
A 'vanilla' Java 5+ solution:
public static void addError(final ErrorData errorData) {
    Long previous = holder.putIfAbsent(errorData, 1L);
    // if the error data is already mapped to some value
    if (previous != null) {
        // retry the replace until no concurrent update sneaks in between
        while (!holder.replace(errorData, previous, previous + 1)) {
            previous = holder.get(errorData);
        }
    }
}
In Java 7 or older versions you need to use a compare-and-update loop:
Long prevValue;
boolean done;
do {
    prevValue = holder.get(error);
    if (prevValue == null) {
        // putIfAbsent returns the previous mapping, so null means our insert won
        done = holder.putIfAbsent(error, 1L) == null;
    } else {
        done = holder.replace(error, prevValue, prevValue + 1);
    }
} while (!done);
With this code, if two threads race one may end up retrying its update, but they'll get the right value in the end.
Consider:
Thread1: holder.get(error) returns 1
Thread2: holder.get(error) returns 1
Thread1: holder.put(error, 1+1);
Thread2: holder.put(error, 1+1);
Both threads write the value 2, so one increment is lost. To fix this you need to use atomic operations to update the map.
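One atomic approach that also works on Java 7 is to let the value do the counting. This is a sketch of my own, not from the original answer; it assumes storing AtomicLong counters as map values is acceptable:

private static final ConcurrentHashMap<ErrorData, AtomicLong> holder =
        new ConcurrentHashMap<ErrorData, AtomicLong>();

public static void addError(ErrorData error) {
    AtomicLong counter = holder.get(error);
    if (counter == null) {
        AtomicLong fresh = new AtomicLong(0);
        counter = holder.putIfAbsent(error, fresh);   // another thread may have inserted first
        if (counter == null) {
            counter = fresh;                          // our insert won
        }
    }
    counter.incrementAndGet();                        // the increment itself is atomic
}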
I have a Java class that has a Guava LoadingCache<String, Integer> and in that cache, I'm planning to store two things: the average time active employees have worked for the day and their efficiency. I am caching these values because it would be expensive to compute every time a request comes in. Also, the contents of the cache will be refreshed (refreshAfterWrite) every minute.
I was thinking of using a CacheLoader for this situation, however, its load method only loads one value per key. In my CacheLoader, I was planning to do something like:
private Service service = new Service();

public Integer load(String key) throws Exception {
    if (key.equals("employeeAvg"))
        return calculateEmployeeAvg(service.getAllEmployees());
    if (key.equals("employeeEff"))
        return calculateEmployeeEff(service.getAllEmployees());
    return -1;
}
For me, this is very inefficient, since in order to load both values I have to invoke service.getAllEmployees() twice because, correct me if I'm wrong, CacheLoaders should be stateless.
Which made me think to use the LoadingCache.put(key, value) method so I can just create a utility method that invokes service.getAllEmployees() once and calculate the values on the fly. However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
How do I make this more efficient?
It seems like your problem stems from using strings to represent value types (Effective Java Item 50). Instead, consider defining a proper value type that stores this data, and use a memoizing Supplier to avoid recomputing them.
public static class EmployeeStatistics {
    private final int average;
    private final int efficiency;
    // constructor, getters and setters
}
Supplier<EmployeeStatistics> statistics = Suppliers.memoize(
    new Supplier<EmployeeStatistics>() {
      @Override
      public EmployeeStatistics get() {
        List<Employee> employees = new Service().getAllEmployees();
        return new EmployeeStatistics(
            calculateEmployeeAvg(employees),
            calculateEmployeeEff(employees));
      }});
You could even move these calculation methods inside EmployeeStatistics and simply pass in all employees to the constructor and let it compute the appropriate data.
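For instance, a sketch of that value type computing its own data, assuming the calculate* helpers are made accessible to it (the getter names here are my own illustration):

public static class EmployeeStatistics {
    private final int average;
    private final int efficiency;

    // compute everything once from the full employee list
    public EmployeeStatistics(List<Employee> employees) {
        this.average = calculateEmployeeAvg(employees);
        this.efficiency = calculateEmployeeEff(employees);
    }

    public int getAverage() { return average; }
    public int getEfficiency() { return efficiency; }
}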
If you need to configure your caching behavior more than Suppliers.memoize() or Suppliers.memoizeWithExpiration() can provide, consider this similar pattern, which hides the fact that you're using a Cache inside a Supplier:
Supplier<EmployeeStatistics> statistics = new Supplier<EmployeeStatistics>() {
    private final Object key = new Object();
    private final LoadingCache<Object, EmployeeStatistics> cache =
        CacheBuilder.newBuilder()
            // configure your builder
            .build(
                new CacheLoader<Object, EmployeeStatistics>() {
                  public EmployeeStatistics load(Object key) {
                    // same behavior as the Supplier above
                  }});

    @Override
    public EmployeeStatistics get() {
      return cache.getUnchecked(key); // getUnchecked avoids the checked ExecutionException of get(key)
    }};
However, if I do use LoadingCache.put(), I won't have the refreshAfterWrite feature since it's dependent on a cache loader.
I'm not sure, but you might be able to call put from inside the load method: compute the requested value as you do now, and put the other one at the same time. However, this feels hacky.
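A rough sketch of that idea, assuming the loader holds a reference to the cache it belongs to (that reference, and the structure below, are illustrations of mine, not the original answer's code):

public Integer load(String key) throws Exception {
    List<Employee> employees = service.getAllEmployees();   // fetched only once
    int avg = calculateEmployeeAvg(employees);
    int eff = calculateEmployeeEff(employees);
    if (key.equals("employeeAvg")) {
        cache.put("employeeEff", eff);   // push the sibling value into the cache
        return avg;
    }
    if (key.equals("employeeEff")) {
        cache.put("employeeAvg", avg);
        return eff;
    }
    return -1;
}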
If service.getAllEmployees is expensive, then you could cache it. If both calculateEmployeeAvg and calculateEmployeeEff are cheap, then recompute them when needed. Otherwise, it looks like you could use two caches.
I guess, a method computing both values at once could be a reasonable solution. Create a tiny Pair-like class aggregating them and use it as the cache value. There'll be a single key only.
Concerning your own solution, it could be as trivial as
class EmployeeStatsCache {
    private long validUntil;
    private List<Employee> employeeList;
    private Integer employeeAvg;
    private Integer employeeEff;

    private boolean isValid() {
        return System.currentTimeMillis() <= validUntil;
    }

    private synchronized List<Employee> getEmployeeList() {
        if (!isValid() || employeeList == null) {
            employeeList = service.getAllEmployees();
            validUntil = System.currentTimeMillis() + VALIDITY_MILLIS;
            // drop the derived values so they get recomputed from the fresh list
            employeeAvg = null;
            employeeEff = null;
        }
        return employeeList;
    }

    public synchronized int getEmployeeAvg() {
        if (!isValid() || employeeAvg == null) {
            employeeAvg = calculateEmployeeAvg(getEmployeeList());
        }
        return employeeAvg;
    }

    public synchronized int getEmployeeEff() {
        if (!isValid() || employeeEff == null) {
            employeeEff = calculateEmployeeEff(getEmployeeList());
        }
        return employeeEff;
    }
}
Instead of synchronized methods you may want to synchronize on a private final field. There are other possibilities (e.g. Atomic*), but the basic design is probably simpler than adapting Guava's Cache.
Now, I see that there's Suppliers#memoizeWithExpiration in Guava. That's probably even simpler.
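For instance, a rough sketch of that variant; the one-minute expiry matches the refresh interval mentioned in the question, and the value type is the EmployeeStatistics class from the other answer:

Supplier<EmployeeStatistics> statistics = Suppliers.memoizeWithExpiration(
    new Supplier<EmployeeStatistics>() {
      @Override
      public EmployeeStatistics get() {
        List<Employee> employees = new Service().getAllEmployees();
        return new EmployeeStatistics(
            calculateEmployeeAvg(employees),
            calculateEmployeeEff(employees));
      }
    },
    1, TimeUnit.MINUTES);   // recompute at most once per minute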
My problem
Let's say I want to hold my messages in some sort of datastructure for longpolling application:
1. "dude"
2. "where"
3. "is"
4. "my"
5. "car"
Asking for messages from index[4,5] should return:
"my","car".
Next let's assume that after a while I would like to purge old messages because they aren't useful anymore and I want to save memory. Let's say after time x messages[1-3] became stale. I assume that it would be most efficient to just do the deletion once every x seconds. Next my datastructure should contain:
4. "my"
5. "car"
My solution?
I was thinking of using a ConcurrentSkipListSet or ConcurrentSkipListMap. I was also thinking of deleting the old messages from inside a newSingleThreadScheduledExecutor. I would like to know how you would implement this efficiently and thread-safely, or whether there is a library for it.
The big concern, as I gather it, is how to let certain elements expire after a period. I had a similar requirement and created a message class that implements the Delayed interface. This class holds everything I need for a message and (through the Delayed interface) tells me when it has expired.
I used instances of this object within a concurrent collection, you could use a ConcurrentMap because it will allow you to key those objects with an integer key.
I reaped the collection once every so often, removing items whose delay has passed. We test for expiration by using the getDelay method of the Delayed interface:
message.getDelay(TimeUnit.MILLISECONDS);
I used a normal thread that would sleep for a period then reap the expired items. In my requirements it wasn't important that the items be removed as soon as their delay had expired. It seems that you have a similar flexibility.
If you needed to remove items as soon as their delay expired, then instead of sleeping a set period in your reaping thread, you would sleep for the delay of the message that will expire first.
Here's my delayed message class:
class DelayedMessage implements Delayed {
    long endOfDelay;
    Date requestTime;
    String message;

    public DelayedMessage(String m, int delay) {
        requestTime = new Date();
        endOfDelay = System.currentTimeMillis() + delay;
        this.message = m;
    }

    public long getDelay(TimeUnit unit) {
        long delay = unit.convert(
                endOfDelay - System.currentTimeMillis(),
                TimeUnit.MILLISECONDS);
        return delay;
    }

    public int compareTo(Delayed o) {
        DelayedMessage that = (DelayedMessage) o;
        if (this.endOfDelay < that.endOfDelay) {
            return -1;
        }
        if (this.endOfDelay > that.endOfDelay) {
            return 1;
        }
        return this.requestTime.compareTo(that.requestTime);
    }

    @Override
    public String toString() {
        return message;
    }
}
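And a minimal sketch of the reaping thread described above; the messages field name and the five-second reap interval are assumptions of mine, not from the original answer:

// plugs into whatever class holds the message map, e.g.:
// private final ConcurrentMap<Integer, DelayedMessage> messages = new ConcurrentHashMap<>();

void startReaper() {
    Thread reaper = new Thread(() -> {
        while (!Thread.currentThread().isInterrupted()) {
            for (Map.Entry<Integer, DelayedMessage> e : messages.entrySet()) {
                if (e.getValue().getDelay(TimeUnit.MILLISECONDS) <= 0) {
                    messages.remove(e.getKey(), e.getValue()); // remove only if still the same message
                }
            }
            try {
                Thread.sleep(5_000);   // reap every 5 seconds; the exact period is flexible
            } catch (InterruptedException ex) {
                Thread.currentThread().interrupt();
            }
        }
    });
    reaper.setDaemon(true);
    reaper.start();
}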
I'm not sure if this is what you want, but it looks like you need a NavigableMap<K,V> to me.
import java.util.*;

public class NaviMap {
    public static void main(String[] args) {
        NavigableMap<Integer,String> nmap = new TreeMap<Integer,String>();
        nmap.put(1, "dude");
        nmap.put(2, "where");
        nmap.put(3, "is");
        nmap.put(4, "my");
        nmap.put(5, "car");

        System.out.println(nmap);
        // prints "{1=dude, 2=where, 3=is, 4=my, 5=car}"

        System.out.println(nmap.subMap(4, true, 5, true).values());
        // prints "[my, car]"        ^inclusive^

        nmap.subMap(1, true, 3, true).clear();
        System.out.println(nmap);
        // prints "{4=my, 5=car}"

        // wrap into a synchronized SortedMap
        SortedMap<Integer,String> ssmap = Collections.synchronizedSortedMap(nmap);
        System.out.println(ssmap.subMap(4, 5));
        // prints "{4=my}"           ^exclusive upper bound!
        System.out.println(ssmap.subMap(4, 5 + 1));
        // prints "{4=my, 5=car}"    ^ugly but "works"
    }
}
Now, unfortunately there is no easy way to get a synchronized view of a NavigableMap<K,V>. A SortedMap does have subMap, but only an overload whose upper bound is strictly exclusive.
API links
SortedMap.subMap
NavigableMap.subMap
Collections.synchronizedSortedMap