Which Java collection should I use to implement a thread-safe cache?

Which Java collection should I use to implement a thread-safe cache? - java

I'm looking to implement a simple cache without doing too much work (naturally). It seems to me that one of the standard Java collections ought to suffice, with a little extra work. Specifically, I'm storing responses from a server, and the keys can either be the request URL string or a hash code generated from the URL.
I originally thought I'd be able to use a WeakHashMap, but it looks like that method forces me to manage which objects I want to keep around, and any objects I don't manage with strong references are immediately swept away. Should I try out a ConcurrentHashMap of SoftReference values instead? Or will those be cleaned up pretty aggressively too?
I'm now looking at the LinkedHashMap class. With some modifications it looks promising for an MRU cache. Any other suggestions?
Whichever collection I use, should I attempt to manually prune the LRU values, or can I trust the VM to bias against reclaiming recently accessed objects?
FYI, I'm developing on Android so I'd prefer not to import any third-party libraries. I'm dealing with a very small heap (16 to 24 MB) so the VM is probably pretty eager to reclaim resources. I assume the GC will be aggressive.

If you use SoftReference-based keys, the VM will bias (strongly) against recently accessed objects. However it would be quite difficult to determine the caching semantics - the only guarantee that a SoftReference gives you (over a WeakReference) is that it will be cleared before an OutOfMemoryError is thrown. It would be perfectly legal for a JVM implementation to treat them identically to WeakReferences, at which point you might end up with a cache that doesn't cache anything.
I don't know how things work on Android, but with Sun's recent JVMs one can tweak the SoftReference behaviour with the -XX:SoftRefLRUPolicyMSPerMB command-line option, which determines the number of milliseconds that a softly-reachable object will be retained for, per MB of free memory in the heap. As you can see, this is going to be exceptionally difficult to get any predictable lifespan behaviour out of, with the added pain that this setting is global for all soft references in the VM and can't be tweaked separately for individual classes' use of SoftReferences (chances are each use will want different parameters).
The simplest way to make an LRU cache is by extending LinkedHashMap as described here. Since you need thread-safety, the simplest way to extend this initially is to just use Collections.synchronizedMap on an instance of this custom class to ensure safe concurrent behaviour.
Beware premature optimisation - unless you need very high throughput, the theoretically suboptimal overhead of the coarse synchronization is not likely to be an issue. And the good news - if profiling shows that you are performing too slowly due to heavy lock contention, you'll have enough information available about the runtime use of your cache that you'll be able to come up with a suitable lockless alternative (probably based on ConcurrentHashMap with some manual LRU treatment) rather than having to guess at its load profile.

LinkedHashMap is easy to use for cache. This creates an MRU cache of size 10.
private LinkedHashMap<File, ImageIcon> cache = new LinkedHashMap<File, ImageIcon>(10, 0.7f, true) {
#Override
protected boolean removeEldestEntry(Map.Entry<File, ImageIcon> eldest) {
return size() > 10;
}
};
I guess you can make a class with synchronized delegates to this LinkedHashMap. Forgive me if my understanding of synchronization is wrong.

www.javolution.org has some interestig features - synchronized fast collections.
In your case it worth a try as it offers also some nifty enhancements for small devices as Android ones.

For synchronization, the Collections framework provides a synchronized map:
Map<V,T> myMap = Collections.synchronizedMap(new HashMap<V, T>());
You could then wrap this, or handle the LRU logic in a cache object.

I like Apache Commons Collections LRUMap

Related

Do Orphaned object in java lead to performance Issues [duplicate]

Should Java Objects be reused as often as it can be reused ? Or should we reuse it only when they are "heavyweight", ie have OS resources associated with it ?
All old articles on the internet talk about object reuse and object pooling as much as possible, but I have read recent articles that say new Object() is highly optimized now ( 10 instructions ) and Object reuse is not as big a deal as it used to be.
What is the current best practice and how are you people doing it ?

I let the garbage collector do that kind of deciding for me, the only time I've hit heap limit with freshly allocated objects was after running a buggy recursive algorithm for a couple of seconds which generated 3 * 27 * 27... new objects as fast as it could.
Do what's best for readability and encapsulation. Sometimes reusing objects may be useful, but generally you shouldn't worry about it.

If you use them very intensively and the construction is costly, you should try to reuse them as much as you can.
If your objects are very small, and cheap to create ( like Object ) you should create new ones.
For instance connections database are pooled because the cost of creating a new one is higher than those of creating .. mmhh new Integer for instance.
So the answer to your question is, reuse when they are heavy AND are used often ( it is not worth to pool a 3 mb object that is only used twice )
Edit:
Additionally, this item from Effective Java:Favor Immutability is worth reading and may apply to your situation.

Object creation is cheap, yes, but sometimes not cheap enough.
If you create a lot (and I mean A LOT) temporary objects in rapid succession, the costs for the garbage collector are considerable. However even with a good profiler you may not necessarily see the costs easily, as the garbage collector nowadays works in short intervals instead of blocking the whole application for a second or two.
Most of the performance improvements I got in my projects came from either avoiding object creation or avoiding the whole work (including the object creation) through aggressive caching. No matter how big or small the object is, it still takes time to create it and to manage the references and heap structures for it. (And of course, the cleanup and the internal heap-defrag/copying also takes time.)
I would not start to be religious about avoiding object creation at all cost, but if you see a jigsaw pattern in your memory-profiler, it means your garbage collector is on heavy duty. And if your garbage collector uses the CPU, the CPI is not available for your application.
Regarding object pooling: Doing it right and not running into either memory leaks or invalid states or spending more time on the management than you would save is difficult. So I never used that strategy.
My strategy has been to simply strive for immutable objects. Immutable things can be cached easily and therefore help to keep the system simple.
However, no matter what you do: Make sure you check your hotspots with a profiler first. Premature optimization is the root of most evilness.

Let the garbage collector do its job, it can be considered better than your code.
Unless a profiler proves it guilty. And don't even use common sense to try to figure out when it's wrong. In unusual cases even cheap objects like byte arrays are better pooled.
Rule 1 of optimization: don't do it.
Rule 2 (for experts only): don't do it yet.

The rule of thumb should be to use your common sense and reuse objects when their creation consumes significant resources such as I/O, network traffic, DB connections, etc...
If it's just creating a new String(), forget about the reuse, you'll gain nothing from it. Code readability has higher preference.

I would worry about performance issues if they arise. Do what makes sense first (would you do this with primatives), if you then run a profiling tool and find that it is new causing you problems, start to think about pre-allocation (ie. when your program isn't doing much work).
Re-using objects sounds like a disaster waiting to happen by the way:
SomeClass someObject = new SomeClass();
someObject.doSomething();
someObject.changeState();
someObject.changeOtherState();
someObject.sendSignal();
// stuff
//re-use
someObject.reset(); // urgh, had to put this in to support reuse
someObject.doSomethingElse(); // oh oh, this is wrong after calling changeOtherState, regardless of reset
someObject.changeState(); // crap, now this is wrong but it's not obvious yet
someObject.doImportantStuff(); // what's going on?

Object creation is certainly faster than it used to be. The newer generational GC in JDKs 5 and higher are improvements, too.
I don't think either of these makes excessive creation of objects cost-free, but they do reduce the importance of object pooling. I think pooling makes sense for database connections, but I don't attempt it for my own domain objects.
Reuse puts a premium on thread-safety. You need to think carefully to ensure that you can reuse objects safely.
If I decided that object reuse was important I'd do it with products like Terracotta, Tangersol, GridGain, etc. and make sure that my server had scads of memory available to it.

Second the above comments.
Don't try and second guess the GC and Hotspot. Object pooling may have been useful once but these days its not so useful unless you are talking about database connections or unique system resources.
Just try and write clean and simple code and be amazed at what Hotspot can do.
Why not use VisualVM or a profiler to take a look at your code?

When is it a good time to release cached objects in Java?

I am developing a Java desktop application where I have many caches, such as object pools, cached JPanels ...etc.
Example:
When a user switches from one Panel to another, I don't destroy the previous one in case the user switches back.
But, The application memory consumption might get high while the system is in desperate need of these memory resources that I am consuming not so justifiably...
In an iOS application, I would release these in "applicationDidReceiveMemoryWarning" method. But in Java... ?
So, When is it a good time to release cached objects in Java?

Caching often isn't a good idea in Java - creating new objects is usually much cheaper than you think, and often results in better performance than keeping objects cached "just in case". Having a lot of long-lived objects around is bad for GC performance and also can put considerable pressure on processor caches.
JPanels for example - are sufficiently lightweight that it's fine to create a completely new one whenever you need it.
If I were you, I'd cache as little as possible, and only do so when you've proved a substantial performance benefit from doing so.
Even if you do need to cache, consider using a cache that uses Soft References - this way the JVM will be able to clear items from the cache automatically if it needs to free up memory. This is simpler and safer than trying to roll your own caching strategy. You could use an existing Soft Reference cache implementation like Guava's CacheBuilder (thanks AlistairIsrael!).

how to use concurrenthashmap correctly?

say, I have a lot of read operations and a few write operations, and the object which will be placed in map is quite "heavy"-initialization of such object costs much memory/time,etc.
how should I code to both utilize the high performance of concurrenthashmap and ensure a minimum cost of unnecessary initialization of those cached objects.
sample code snippet is welcome and greatly appreciate!
Thanks!

Pretty sure guava has exactly what you are looking for, see MapMaker.makeComputingMap.

The code in ConcurrentHashMap is highly optimized - I would just use it.
The concurrency overhead is minimal if there are few updates. If a write operation occurs during a read, there is some overhead when a temporary copy of the internal state is made, but otherwise the difference in performance in negligible. I would use the provided class as is and only if you find you are getting performance problems, then look into using something else.
Note that the initialization cost is not relevant to concurrency performance, only the operation of adding it to the map is.

It depends on your requirements, but you may consider using a Pool of instances to reduce the instantiation count. That will improve your performance if you are currently dumping items from the map to garbage collect them, so instead of GC, you place them back in the pool and re-use them later.

java cache system and static HashMap storage - Performance

What are the prons and cons between using "static Hashmap store object" and apache java cache system for web based enterprise apps? Which one is best for performance and reduce heap memory problem
ex:
Map store=ApplicationCtx.getApplicationParameterMap();
UserObj obj=store.get('user');
VS
UserObj obj = JCS.getInstance("user");

If you want the best of both worlds, you might consider using a synchronized LinkedHashMap which can be used as an LRU cache (a basic eviction policy) and is thread safe.
A HashMap is simpler and faster, it doesn't do very much and is not thread safe. (Which is why it is the fastest)
Most Caching systems are more sophisticated which means there will be a small overhead. e.g. they are thread safe (which I am sure you will need) and support eviction policies (which you also indicate you will need)
In short, the first one is fastest but probably won't do what you need. The second will help you manage how much memory you use.

How to implement ConcurrentHashMap with features similar in LinkedHashMap?

I have used LinkedHashMap with accessOrder true along with allowing a maximum of 500 entries at any time as the LRU cache for data. But due to scalability issues I want to move on to some thread-safe alternative. ConcurrentHashMap seems good in that regard, but lacks the features of accessOrder and removeEldestEntry(Map.Entry e) found in LinkedHashMap. Can anyone point to some link or help me to ease the implementation.

I did something similar recently with ConcurrentHashMap<String,CacheEntry>, where CacheEntry wraps the actual item and adds cache eviction statistics: expiration time, insertion time (for FIFO/LIFO eviction), last used time (for LRU/MRU eviction), number of hits (for LFU/MFU eviction), etc. The actual eviction is synchronized and creates an ArrayList<CacheEntry> and does a Collections.sort() on it using the appropriate Comparator for the eviction strategy. Since this is expensive, each eviction then lops off the bottom 5% of the CacheEntries. I'm sure performance tuning would help though.
In your case, since you're doing FIFO, you could keep a separate ConcurrentLinkedQueue. When you add an object to the ConcurrentHashMap, do a ConcurrentLinkedQueue.add() of that object. When you want to evict an entry, do a ConcurrentLinkedQueue.poll() to remove the oldest object, then remove it from the ConcurrentHashMap as well.
Update: Other possibilities in this area include a Java Collections synchronization wrapper and the Java 1.6 ConcurrentSkipListMap.

Have you tried using one of the many caching solutions like ehcache?
You could try using LinkedHashMap with a ReadWriteLock. This would give you concurrent read access.

This might seem old now, but at least just for my own history tracking, I'm going to add my solution here: I combined ConcurrentHashMap that maps K->subclass of WeakReference, ConcurrentLinkedQueue, and an interface that defines deserialization of the value objects based on K to run LRU caching correctly. The queue holds strong refs, and the GC will evict the values from memory when appropriate. Tracking the queue size involved AtomicInteger, as you can't really inspect the queue to determine when to evict. The cache will handle eviction from/adding to the queue, as well as map management. If the GC evicted the value from memory, the implementation of the deserialization interface will handle retrieving the value back. I also had another implementation that involved spooling to disk/re-reading what was spooled, but that was a lot slower than the solution I posted here, as Ihad to synchronize spooling/reading.

You mention wanting to solve scalability problems with a "thread-safe" alternative. "Thread safety" here means that the structure is tolerant of attempts at concurrent access, in that it won't suffer corruption by concurrent use without external synchronization. However, such tolerance does not necessarily help to improve "scalability". In the simplest -- though usually misguided -- approach, you'll try to synchronize your structure internally and still leave non-atomic check-then-act operations unsafe.
LRU caches require at least some awareness of the total structure. They need something like a count of the members or the size of the members to decide when to evict, and then they need to be able to coordinate the eviction with concurrent attempts to read, add, or remove elements. Trying to reduce the synchronization necessary for concurrent access to the "main" structure fights against your eviction mechanism, and forces your eviction policy to be less precise in its guarantees.
The currently accepted answer mentions "when you want to evict an entry". Therein lies the rub. How do you know when you want to evict an entry? Which other operations do you need to pause in order to make this decision?

The moment you use another data structure along with concurrenthashmap, the atomicity of the operations sucha adding a new item in concurrenthashmap and adding in other data structure cant be guaranteed without additional synchronization such as ReadWriteLock which will degrade performance

Wrap the map in a Collections.synchronizedMap(). If you need to call additional methods, then synchronize on the map that you got back from this call, and invoke the original method on the original map (see the javadocs for an example). The same applies when you iterate over the keys, etc.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.