Best data structure in Java when using HashSet as a cache

My use case is this:
I need to cache a set of strings for frequent read access. The cache is updated periodically by a daemon thread. Moreover, cache elements are never updated individually; an update is always set.clear(); set.addAll(list). I'm currently using a HashSet protected by a ReentrantReadWriteLock. Is there a better way to do this?
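Simplified, what I have now looks roughly like this (method names are illustrative):
private final Set<String> set = new HashSet<String>();
private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

public boolean contains(String s) {
    lock.readLock().lock();
    try {
        return set.contains(s);
    } finally {
        lock.readLock().unlock();
    }
}

// called periodically by the daemon thread
public void update(List<String> newData) {
    lock.writeLock().lock();
    try {
        set.clear();
        set.addAll(newData);
    } finally {
        lock.writeLock().unlock();
    }
}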

One option would be a volatile set:
private volatile Set<String> set = new HashSet<>();

public void update() {
    Set<String> newSet = getNewData();
    set = newSet;
}
This is thread safe (as long as you don't let other code mutate the set itself) and does not require locking. One drawback is that you hold both sets in memory until the old one is garbage collected (how much space is used per entry would have to be measured).
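A small variant of the same idea (my sketch, not part of the original answer): if readers need a getter, publish an unmodifiable view so no caller can mutate the set you just swapped in.
private volatile Set<String> set = Collections.emptySet();

public void update() {
    // build the replacement off to the side, then publish it with a single volatile write
    Set<String> newSet = new HashSet<>(getNewData());
    set = Collections.unmodifiableSet(newSet);
}

public Set<String> current() {
    return set; // callers always see a complete, read-only snapshot
}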

Related

Java visibility: final static non-threadsafe collection changes after construction

I found the following code snippet in luaj, and I started to wonder whether changes made to the Map after it has been constructed might not be visible to other threads, since there is no synchronization in place.
I know that since the Map is declared final, the values it holds after construction are visible to other threads, but what about changes that happen after that?
Some might also point out that this class is so far from thread-safe that calling coerce in a multi-threaded environment might even cause an infinite loop in the HashMap, but my question is not about that.
public class CoerceJavaToLua {
    static final Map COERCIONS = new HashMap(); // this map is visible to all threads after construction, since it's final

    public static LuaValue coerce(Object paramObject) {
        ...;
        if (localCoercion == null) {
            localCoercion = ...;
            COERCIONS.put(localClass, localCoercion); // visible?
        }
        return ...;
    }
    ...
}
You're correct that changes to the Map may not be visible to other threads. Every method that accesses COERCIONS (both reading and writing) should be synchronized on the same object. Alternatively, if you never need sequences of accesses to be atomic, you could use a synchronized collection.
(BTW, why are you using raw types?)
This code is indeed broken and may cause many problems (probably not an infinite loop, which is more common with TreeMap; with HashMap you are more likely to get silent data loss due to an overwrite, or possibly some random exception). And you're right: it's not guaranteed that changes made in one thread will be visible to another.
Here the problem may not look very big, as this Map is used for caching purposes, so silent overwrites or a visibility lag don't lead to real problems (at worst two distinct coercion instances are used for the same class, which is probably fine in this case). However, it's still possible that such code will break your program. If you like, you can submit a patch to the LuaJ team.
Two options:
// Synchronized (since Java 1.2)
static final Map COERCIONS = Collections.synchronizedMap(new HashMap());
// Concurrent (since Java 5)
static final Map COERCIONS = new ConcurrentHashMap();
They each have their pros and cons.
ConcurrentHashMap's pro is that reads don't lock. Its con is that compound operations are not atomic: for example, an iterator in one thread running concurrently with a putAll in another may see only some of the values being added.
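To illustrate the difference (a sketch of mine, not from the original code): iterating a synchronized map requires client-side locking, whereas a ConcurrentHashMap iterator is weakly consistent and never throws ConcurrentModificationException.
static final Map<Class<?>, Object> SYNCHRONIZED = Collections.synchronizedMap(new HashMap<Class<?>, Object>());
static final Map<Class<?>, Object> CONCURRENT = new ConcurrentHashMap<Class<?>, Object>();

static void dumpSynchronized() {
    // must hold the map's own lock for the whole iteration, or a concurrent
    // put from another thread can cause a ConcurrentModificationException
    synchronized (SYNCHRONIZED) {
        for (Map.Entry<Class<?>, Object> e : SYNCHRONIZED.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}

static void dumpConcurrent() {
    // weakly consistent iteration: no lock and no exception, but entries added
    // concurrently (e.g. by a putAll in another thread) may or may not be seen
    for (Map.Entry<Class<?>, Object> e : CONCURRENT.entrySet()) {
        System.out.println(e.getKey() + " -> " + e.getValue());
    }
}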

Is this code multi-thread safe?

private static Map<Integer, String> map = null;

public static String getString(int parameter) {
    if (map == null) {
        map = new HashMap<Integer, String>();
        // map gets filled here...
    }
    return map.get(parameter);
}
Is that code unsafe as far as multithreading goes?
As mentioned, it's definitely not safe. If the contents of the map are not based on the parameter passed to getString(), you would be better served by initializing the map in a static initializer, as follows:
private static final Map<Integer, String> MAP = new HashMap<Integer, String>();

static {
    // Populate map here
}
The above code runs once, when the class is loaded. It's completely thread safe (although future modifications to the map are not).
Are you trying to lazy load it for performance reasons? If so, this is much safer:
private static Map<Integer, String> map = null;

public static synchronized String getString(int parameter) {
    if (map == null) {
        map = new HashMap<Integer, String>();
        // map gets filled here...
    }
    return map.get(parameter);
}
Using the synchronized keyword will make sure that only a single thread can execute the method at any one time, and that changes to the map reference are always propagated.
If you're asking this question, I recommend reading "Java Concurrency in Practice".
Race condition? Possibly.
If map is null and two threads check if (map == null) at the same time, each will allocate a separate map. This may or may not be a problem, depending mainly on whether the map's contents are invariant (always the same). Even if they are, the cost of populating the map twice may also become an issue.
Memory leak? No.
The garbage collector will do its job correctly regardless of the race condition.
You do run the risk of initializing map twice in a multi-threaded scenario.
In a managed language, the garbage collector will eventually dispose of the no-longer-referenced instance. In an unmanaged language, you will never free the memory allocated for the overwritten map.
Either way, initialization should be properly protected so that multiple threads do not run initialization code at the same time.
One reason: the first thread could be in the middle of initializing the HashMap while a second thread comes along, sees that map is not null, and merrily tries to use the partially initialized data structure.
It is unsafe in the multithreaded case because of the race condition.
But do you really need lazy initialization for the map? If the map is going to be used anyway, it seems you could just initialize it eagerly.
The above code isn't thread-safe; as others have mentioned, your map can be initialized twice. You may be tempted to try to fix it by adding some synchronization; this is known as "double-checked locking". Here is an article that describes the problems with this approach, as well as some potential fixes.
The simplest solution is to make the field a static field in a separate class:
class HelperSingleton {
    static Helper singleton = new Helper();
}
It can also be fixed using the volatile keyword, as described in Bill Pugh's article.
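For reference, a sketch of that volatile-based double-checked locking pattern (correct only on Java 5 and later; Helper stands in for whatever you are creating lazily):
class Helper {
    private static volatile Helper instance;

    static Helper getInstance() {
        Helper result = instance;           // first, unsynchronized read
        if (result == null) {
            synchronized (Helper.class) {
                result = instance;          // second read, under the lock
                if (result == null) {
                    instance = result = new Helper();
                }
            }
        }
        return result;
    }
}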
No, this code is not safe for use by multiple threads.
There is a race condition in the initialization of the map. For example, multiple threads could initialize the map simultaneously and clobber each other's writes.
There are no memory barriers to ensure that modifications made by a thread are visible to other threads. For example, each thread could use its own copy of the map because they never "see" the values written by another thread.
There is no atomicity to ensure that invariants are preserved as the map is accessed concurrently. For example, a thread that's performing a get() operation could get into an infinite loop because another thread rehashed the buckets during a simultaneous put() operation.
If you are using Java 6, use ConcurrentHashMap
ConcurrentHashMap JavaDoc
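If each value can be computed from its key alone (an assumption on my part; the question only says the map "gets filled"), a ConcurrentHashMap also lets you drop the lazy-map check entirely and race safely per key, roughly like this (computeValue is a hypothetical helper):
private static final ConcurrentMap<Integer, String> MAP = new ConcurrentHashMap<Integer, String>();

public static String getString(int parameter) {
    String value = MAP.get(parameter);
    if (value == null) {
        String computed = computeValue(parameter);         // hypothetical per-key loader
        String previous = MAP.putIfAbsent(parameter, computed);
        value = (previous != null) ? previous : computed;  // keep whichever thread won the race
    }
    return value;
}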

ConcurrentHashMap vs ReentrantReadWriteLock based Custom Map for Reloading

Java Gurus,
Currently we have a HashMap<String, SomeApplicationObject> which is read frequently and modified occasionally, and we are having issues where, during the modification/reloading, a read operation returns null, which is not acceptable.
To fix this I have the following options:
A. Use ConcurrentHashMap
This looks like the first choice, but the operation we are talking about is reload(), meaning clear() followed by replaceAll(). So if the Map is read after clear() and before replaceAll(), it returns null, which is not desirable. Even synchronizing this doesn't resolve the issue.
B. Create another implementation based upon ReentrantReadWriteLock
Where I would acquire the write lock before the reload() operation. This seems more appropriate, but I feel there must be something already available for this and I need not reinvent the wheel.
What is the best way out?
EDIT: Is any Collection already available with such a feature?
Since you are reloading the map, I would replace it on a reload.
You can do this by using a volatile Map, which you replace in full when it is updated.
It seems you are not sure how what Peter Lawrey suggests can be implemented. It could look like this:
class YourClass {
    private volatile Map<String, SomeApplicationObject> map;

    //constructors etc.

    public void reload() {
        Map<String, SomeApplicationObject> newMap = getNewValues();
        map = Collections.unmodifiableMap(newMap);
    }
}
There are no concurrency issues because:
The new map is created via a local variable, which by definition is not shared - getNewValues does not need to be synchronized or atomic
The assignment to map is atomic
map is volatile, which guarantees that other threads will see the change
This sounds a lot like Guava's Cache, though it really depends how you're populating the map, and how you compute the values. (Disclosure: I contribute to Guava.)
The real question is whether or not you can specify how to compute your SomeApplicationObject given the input String. Just based on what you've told us so far, it might look something like this...
LoadingCache<String, SomeApplicationObject> cache = CacheBuilder.newBuilder()
    .build(
        new CacheLoader<String, SomeApplicationObject>() {
            public SomeApplicationObject load(String key) throws AnyException {
                return computeSomeApplicationObject(key);
            }
        });
Then, whenever you wanted to rebuild the cache, you just call cache.invalidateAll(). With a LoadingCache, you can then call cache.get(key) and if it hasn't computed the value already, it'll get recomputed. Or maybe after calling cache.invalidateAll(), you can call cache.loadAll(allKeys), though you'd still need to be able to load single elements at a time in case any queries come in between the invalidateAll and loadAll.
If this isn't acceptable -- if you can't load one value individually, you have to load them all at once -- then I'd go ahead with Peter Lawrey's approach -- keep a volatile reference to a map (ideally an ImmutableMap), recompute the whole map and assign the new map to the reference when you're done.
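If you do take the volatile-reference route, a rough sketch with Guava's ImmutableMap could look like this (loadAllValues() is an assumed bulk loader, not something from your code):
private volatile ImmutableMap<String, SomeApplicationObject> map = ImmutableMap.of();

public void reload() {
    // build the replacement off to the side, then swap the reference in a single volatile write
    Map<String, SomeApplicationObject> fresh = loadAllValues();   // assumed bulk loader
    map = ImmutableMap.copyOf(fresh);
}

public SomeApplicationObject get(String key) {
    return map.get(key);   // readers always see a complete snapshot; there is no empty window mid-reload
}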

Concurrent Modification Exceptions

I am getting a java.util.ConcurrentModificationException in this method:
private AtomicReference<HashMap<String, Logger>> transactionLoggerMap = new AtomicReference<HashMap<String, Logger>>();

public void rolloutFile() {
    // Get all the loggers and fire a temp log line.
    Set<String> transactionLoggerSet = (Set<String>) transactionLoggerMap.get().keySet();
    Iterator<String> transactionLoggerSetIter = transactionLoggerSet.iterator();
    while (transactionLoggerSetIter.hasNext()) {
        String key = (String) transactionLoggerSetIter.next();
        Logger txnLogger = transactionLoggerMap.get().get(key);
        localLogger.trace("About to do timer task rollover:");
        txnLogger.info(DataTransformerConstants.IGNORE_MESSAGE);
    }
}
Please suggest: if I am using an AtomicReference, how do I get a ConcurrentModificationException?
A ConcurrentModificationException means that you have modified the collection outside of the iterator. I don't see any modifications in your loop so I assume that there is another thread that is also adding or removing from the transactionLoggerMap at the same time you are iterating across it.
Even though you have it wrapped in an AtomicReference, you still cannot have two threads making changes to the same unsynchronized collection at the same time. AtomicReference does not synchronize the object it is wrapping -- it just gives you a way to atomically get and set that reference.
You will need to make this a synchronized collection either by using the ConcurrentHashMap class or wrapping your HashMap using the Collections.synchronizedMap(map) method.
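For example, a sketch with the reference holding a ConcurrentHashMap instead (this changes the declared type of the field, so it is an assumption about the rest of your code):
private final AtomicReference<ConcurrentHashMap<String, Logger>> transactionLoggerMap =
        new AtomicReference<ConcurrentHashMap<String, Logger>>(new ConcurrentHashMap<String, Logger>());

public void rolloutFile() {
    // ConcurrentHashMap iterators are weakly consistent: concurrent puts/removes in
    // other threads no longer throw ConcurrentModificationException during this loop
    for (Logger txnLogger : transactionLoggerMap.get().values()) {
        localLogger.trace("About to do timer task rollover:");
        txnLogger.info(DataTransformerConstants.IGNORE_MESSAGE);
    }
}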
Because you are not protecting against concurrent modification during your iteration. The AtomicReference only makes sure you get your Map (and its contents).
Maybe you can consider iterating over a local copy of the collection instead of the collection itself. That is an easy way to ensure the collection is not modified while you're looping over it. It is advisable to use immutable objects in a multi-threaded environment, and you prevent this kind of problem for free.
Hope it helps.
Remove the redundant
Set<String> transactionLoggerSet = (Set<String>) transactionLoggerMap.get().keySet();
You would still have to use synchronization while you are iterating over the map. A synchronized map only guarantees consistency for its own API methods; for iteration, you need to do client-side synchronization.
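Concretely (a sketch; it assumes the map stored in the AtomicReference was created with Collections.synchronizedMap everywhere it is used):
Map<String, Logger> loggers = transactionLoggerMap.get();

// client-side synchronization: hold the synchronized map's own lock for the whole
// iteration, otherwise a concurrent put/remove can still trigger a
// ConcurrentModificationException
synchronized (loggers) {
    for (Logger txnLogger : loggers.values()) {
        localLogger.trace("About to do timer task rollover:");
        txnLogger.info(DataTransformerConstants.IGNORE_MESSAGE);
    }
}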

Volatile HashMap vs ConcurrentHashMap

I have a cache class which contains a volatile HashMap<T> to store cache items.
I'm curious what would be the consequences of changing volatile HashMap to ConcurrentHashMap?
Would I gain a performance increase? This cache is a read-only cache.
What would be the best option to use? Just a HashMap? The cache is populated on an interval.
First, it appears you don't understand what the volatile keyword does. It makes sure that if the reference held by the variable declared volatile changes, other threads will see the new value rather than a stale cached copy. It has nothing to do with thread-safety of access to the HashMap itself.
Given that, and the fact that you say the HashMap is read-only... you certainly don't need to use anything that provides thread-safety, including a ConcurrentHashMap.
Edit to add: in your last edit you now say "The cache is being populated on an interval."
That's not read-only then, is it?
If you're going to have threads reading from it while you are writing (updating the existing HashMap), then you should use a ConcurrentHashMap, yes.
If you are populating an entirely new HashMap and then assigning it to the existing variable, use volatile.
You say the cache is read-only, but also that it is being updated on an interval, which seems contradictory.
If the whole cache gets updated on an interval, I'd keep using the volatile.
The volatile will make sure that the updated map is safely published.
public final class Cache
{
    private volatile Map<?, ?> cache;

    private void mapUpdate() {
        Map<?, ?> newCache = new HashMap<>();
        // populate the map
        // update the reference with an immutable collection
        cache = Collections.unmodifiableMap(newCache);
    }
}
If the interval update is modifying the same cache, then you probably want to use a ConcurrentHashMap, or copy the map, update the copy, and update the reference.
public final class Cache
{
    private volatile Map<?, ?> cache;

    private void mapUpdate() {
        Map<?, ?> newCache = new HashMap<>(cache);
        // update the map
        // update the reference with an immutable collection
        cache = Collections.unmodifiableMap(newCache);
    }
}
I have a similar use case in my web application. I am using a HashMap for my in-memory cache. The use case is as follows:
A user request comes in and first checks the cache for the existence of a record using an input key. This is done in the add method.
If the object is not present, the add method inserts the new record into the cache.
Similarly, the remove method first checks for the presence of a record in the cache using the key and, if found, removes it.
I want to make sure that if two threads are executing concurrently, one in the add method and the other in the remove method, this approach guarantees that each sees the latest data in the cache. If I am not wrong, the synchronized methods take care of thread safety, whereas volatile takes care of visibility.
private volatile HashMap<String, String> activeRequests = new HashMap<String, String>();

public synchronized boolean add(String pageKey, String space, String pageName) {
    if (!(activeRequests.get(pageKey) == null)) {
        return false;
    }
    activeRequests.put(pageKey, space + ":" + pageName);
    return true;
}

public synchronized void remove(String pageKey) {
    if (!(activeRequests.get(pageKey) == null))
        activeRequests.remove(pageKey);
}
AFAIK, although the first answer is correct as far as it goes, depending on the use case, using volatile on a cache that is refreshed and replaced frequently is unnecessary overhead and can actually produce inconsistent reads, assuming the cache is just a static metadata snapshot and is not updated in place by other threads.
Take the example of an HTTP request that reads everything it needs from the cache: the request reads through the cache field, starts reading some keys, and halfway through the field is swapped to a new HashMap (a refresh). It is now reading a different state of the cache, which becomes inconsistent if the entries it reads do not all belong to one snapshot in time. Rereading a volatile field, you read Key1:Val1 at T1 and Key2:Val2 at T2, whereas you need both values from the same snapshot T1; because the volatile reference is always up to date, you could even read Key1:Val1 the first time and Key1:Val2 the second time, getting different data within the same request.
Without volatile, the request keeps using a reference to one snapshot of the map until it has finished processing; you will always read Key1:Val1 at T1 and the same snapshot's values at T2. Once all requests using that reference have completed, the older, dereferenced map will be GCed.
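In code, the pattern being argued about boils down to dereferencing the cache field once per request and reading only through that local variable; whether the field itself should be declared volatile is exactly what the two answers disagree on. A sketch (field and key names are mine):
private volatile Map<String, String> cacheRef = new HashMap<String, String>();  // swapped wholesale on refresh

public void handleRequest() {
    // read the field once; every lookup in this request then comes from the same
    // map instance, even if the refresh thread replaces cacheRef halfway through
    Map<String, String> snapshot = cacheRef;
    String val1 = snapshot.get("key1");
    String val2 = snapshot.get("key2");
    // ... val1 and val2 are guaranteed to belong to the same snapshot
}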
