How to use ConcurrentHashMap correctly?

Say I have a lot of read operations and a few write operations, and the objects placed in the map are quite "heavy": initializing such an object costs a lot of memory and time.
How should I write the code to both exploit the high performance of ConcurrentHashMap and keep unnecessary initialization of those cached objects to a minimum?
A sample code snippet is welcome and greatly appreciated!
Thanks!

Pretty sure Guava has exactly what you are looking for, see MapMaker.makeComputingMap (later Guava releases replaced it with CacheBuilder and LoadingCache).

The code in ConcurrentHashMap is highly optimized - I would just use it.
The concurrency overhead is minimal if there are few updates. If a write operation occurs during a read, there is some overhead when a temporary copy of the internal state is made, but otherwise the difference in performance is negligible. I would use the provided class as is, and only look into using something else if you find you are getting performance problems.
Note that the initialization cost is not relevant to concurrency performance, only the operation of adding it to the map is.
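As a minimal sketch of the "initialize each heavy object only once" requirement, assuming Java 8 or later: ConcurrentHashMap.computeIfAbsent applies its mapping function at most once per key, so concurrent misses on the same key do not build the object twice. The LazyCache class, its loader parameter, and the loadCount counter are hypothetical names introduced for illustration.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical helper: caches values that are expensive to build.
final class LazyCache<K, V> {
    private final ConcurrentMap<K, V> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;
    // Counts how often the loader actually ran (illustration only).
    private final AtomicInteger loads = new AtomicInteger();

    LazyCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Fast path: plain read for keys that are already cached.
        V value = map.get(key);
        if (value != null) {
            return value;
        }
        // Slow path: the loader runs at most once per key, even if
        // several threads miss on the same key at the same time.
        return map.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }
}
```

The loader should not return null and should not modify the same map from inside the function; other threads asking for the same key wait while the loader runs.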

It depends on your requirements, but you may consider using a Pool of instances to reduce the instantiation count. That will improve your performance if you are currently dumping items from the map to garbage collect them, so instead of GC, you place them back in the pool and re-use them later.
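A rough sketch of the pooling idea, under the assumption that instances can be safely reset and reused by the caller. SimplePool and its factory parameter are hypothetical names; the pool is unbounded and does no state cleanup of its own.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Hypothetical minimal object pool: reuse idle instances, create on demand.
final class SimplePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<T> factory;

    SimplePool(Supplier<T> factory) {
        this.factory = factory;
    }

    // Returns an idle instance if one exists, otherwise builds a new one.
    T acquire() {
        T instance = idle.poll();
        return instance != null ? instance : factory.get();
    }

    // Hands an instance back for later reuse; the caller must reset its state.
    void release(T instance) {
        idle.offer(instance);
    }
}
```

Whether this beats simply letting the garbage collector reclaim the objects depends on how expensive construction really is, so it is worth measuring first.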

Related

Java concurrent collection for write intensive application - only a few reads

I'm looking at various alternatives for a write-intensive application when it comes to selecting a Java data structure. I know that ONE data structure cannot provide a single universal solution for a write-intensive application, but I'm surprised by the lack of discussion out there on the topic.
There are many people talking about read-intensive, rare-write or concurrent read-only applications, but I cannot find any conversation around the data structures used for a write-intensive application.
Based on the following requirements:
- key/value pairs (a Map)
- unsorted, for the sake of simplicity
- 1000+ writes per minute, negligible reads
- all data stored in memory
I am thinking of the following approaches:
1. A simple ConcurrentHashMap. However, based on this from the official Oracle docs:
"[...] even though all operations are thread-safe, retrieval operations do not entail locking"
it seems better suited to read-intensive applications.
2. A combination of a BlockingQueue and a set of ConcurrentHashMaps. In batches, the queue is drained of all its elements and the updates are then allocated to the underlying maps. In this approach, though, I would need an additional map to track which keys live in which map, acting like an orchestrator.
3. Use a HashMap and synchronize at the API level, meaning that every write-related method is synchronized:
    synchronized void aWriteMethod(Integer aKey, String aValue) {
        thisWriteIntensiveMap.put(aKey, aValue);
    }
It'd be great if this question did not just receive criticism on the aforementioned options but also suggestions about new and better solutions.
PS: Apart from the integrity of the data, the order of operations and throttling issues, what else needs to be taken into account when choosing the "best" approach for a write-intensive application?
I know that this might look a bit open ended but it'd be interesting to hear how people think on this problem.

Even if you would go for the worst possible type of map like e.g. a synchronized HashMap, I don't think you will notice any performance impact with your load. A few thousand writes per minute is nothing.
I would set up a JMH benchmark and try out various map implementations. The obvious candidate is a ConcurrentHashMap because it is designed to deal with concurrent access.
So I think this is a good example of premature optimization.
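For a first impression before setting up a proper JMH benchmark, a crude timing sketch like the following can be pointed at different Map implementations. It is not a substitute for JMH (no warm-up, no statistical treatment); WriteLoadSketch and fill are hypothetical names, and the thread and key counts are arbitrary.

```java
import java.util.Map;

// Hypothetical harness: several writer threads filling one shared map.
final class WriteLoadSketch {
    // Starts `threads` writers, each putting `perThread` distinct keys,
    // waits for all of them, and returns the elapsed time in nanoseconds.
    static long fill(Map<Integer, String> map, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        long start = System.nanoTime();
        for (int t = 0; t < threads; t++) {
            final int base = t * perThread;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    map.put(base + i, "v");
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return System.nanoTime() - start;
    }
}
```

Calling fill once with Collections.synchronizedMap(new HashMap<>()) and once with new ConcurrentHashMap<>() gives two numbers to compare; only thread-safe maps should be passed in, since a plain HashMap can be corrupted by concurrent writes.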

List of performance improvement features that we can implement in java

Maybe this is a well-known question, but I didn't find the best reference for this ques...
What is the formula to calculate and assign the default ulimit, verbose (for GC) and max heap memory values?
If there is no specific formula, what are the criteria for specifying these for a particular machine?
If possible, could anyone please explain these concepts as well?
Are there any other concepts we need to consider for performance improvement?
How do I tune the JVM for better performance?

Stop what you're doing right now.
Tuning the JVM is probably the last thing you should worry about. Until you've gone through every other performance trick in the book, the default settings should be just fine.
Firstly you need to profile your application and find out where the bottlenecks are. Specifically, you will want to know:
What functions/methods are consuming the majority of CPU time?
Where are all the memory allocations happening?
What kind of objects are taking up most space on the heap?
Then you should apply targeted optimisations to the areas that are causing problems. There are thousands of valid techniques, but here are the ones that I find are most useful:
Improve algorithms - anything that is taking up a decent chunk of CPU time and has complexity of O(n^2) or worse is probably a good candidate for improvement. Try to get it to O(n log n) or better.
Share immutable data - if you have a lot of copies of the same data then it makes sense to turn these into immutable objects and share a single instance. This can save a lot of memory (and has the nice effect of improving thread safety / concurrency)
Use primitive types - replace Integer with int etc. This saves memory and makes numerical operations faster.
Be lazy - don't compute things until they are definitely needed.
Cache things - if something is expensive to compute but frequently requested, store it in a cache after the first request. Use a cache backed by a SoftHashMap so that the memory can still be released if needed.
Offload work - Can you make use of multiple cores? Can the client application do some of the work for you?
After making any changes you then need to profile again. At the very least, you will want to confirm that your optimisations actually helped. Additionally, fixing one bottleneck will usually move the bottleneck to another part of the application. So you will need to identify the new place to focus next.
Repeat until your application is fast enough (as defined by your own or your customers' requirements).

Are there any drawbacks with ConcurrentHashMap?

I need a HashMap that is accessible from multiple threads.
There are two simple options, using a normal HashMap and synchronizing on it or using a ConcurrentHashMap.
Since ConcurrentHashMap does not block on read operations it seems much better suited for my needs (almost exclusively reads, almost never updates).
On the other hand, I expect very low concurrency anyway, so there should be no blocking (just the cost of managing the lock).
The Map will also be very small (under ten entries), if that makes a difference.
Compared to a regular HashMap, how much more costly are the read and write operations (I am assuming that they are)? Or is ConcurrentHashMap just always better when there might be even a moderate level of concurrent access, regardless of read/update ratio and size?

"On the other hand, I expect very low concurrency anyway, so there should be no blocking (just the cost of managing the lock)."
The cost of acquiring and releasing an uncontended Java mutex (primitive lock) is minuscule. So if you believe that the probability of contention is very low, then a simple synchronized HashMap is probably your best bet.
But this is all conjecture. Unless and until you have actually profiled your application, all time spent on speculative optimization is most likely (*) time wasted.
* ... unless you have a really good intuition.

CHM pays some penalty for the use of Atomic* operations under the covers, when compared to HashMap. How much? Guess what... measure it in your app... ;-)
If you find that you actually have a performance problem, there's probably a very specialized solution for <10 entries that would be cheaper than any solution assembled out of java.util land, but I'd not jump to that until you know you have a performance issue.
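The answer does not say which specialized solution it has in mind. One possibility for a tiny, almost read-only map, offered here purely as an assumption, is a copy-on-write snapshot: readers go through a volatile reference without locking, and the rare writer copies the whole map. CopyOnWriteSmallMap is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical copy-on-write map for a handful of entries and rare updates.
final class CopyOnWriteSmallMap<K, V> {
    // Readers always see a complete, never-modified snapshot.
    private volatile Map<K, V> snapshot = Collections.emptyMap();

    V get(K key) {
        return snapshot.get(key); // no lock on the read path
    }

    // Writers are serialized; each write copies the whole (tiny) map
    // and then publishes the copy through the volatile field.
    synchronized void put(K key, V value) {
        Map<K, V> copy = new HashMap<>(snapshot);
        copy.put(key, value);
        snapshot = Collections.unmodifiableMap(copy);
    }

    int size() {
        return snapshot.size();
    }
}
```

The copy on every write only makes sense because the map is small and updates are rare; as the answer says, it is not worth adopting before measurements show an actual problem.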

In terms of throughput and performance, the overhead is usually negligible.
On a different note, the memory footprint of a ConcurrentHashMap (at the instance level) is somewhat larger than a HashMap. If you have a large number of small-sized CHMs, this overhead might add up.

The main issue with CHM: it doesn't scale well unless you change the constructor call (the concurrencyLevel argument), and above all it doesn't scale automatically with the available cores.
The three links below cover a lock-free hash map:
http://www.azulsystems.com/blog/cliff-click/2007-03-26-non-blocking-hashtable
http://www.azulsystems.com/blog/cliff-click/2007-04-01-non-blocking-hashtable-part-2
http://sourceforge.net/projects/high-scale-lib/

Which Java collection should I use to implement a thread-safe cache?

I'm looking to implement a simple cache without doing too much work (naturally). It seems to me that one of the standard Java collections ought to suffice, with a little extra work. Specifically, I'm storing responses from a server, and the keys can either be the request URL string or a hash code generated from the URL.
I originally thought I'd be able to use a WeakHashMap, but it looks like that method forces me to manage which objects I want to keep around, and any objects I don't manage with strong references are immediately swept away. Should I try out a ConcurrentHashMap of SoftReference values instead? Or will those be cleaned up pretty aggressively too?
I'm now looking at the LinkedHashMap class. With some modifications it looks promising for an MRU cache. Any other suggestions?
Whichever collection I use, should I attempt to manually prune the LRU values, or can I trust the VM to bias against reclaiming recently accessed objects?
FYI, I'm developing on Android so I'd prefer not to import any third-party libraries. I'm dealing with a very small heap (16 to 24 MB) so the VM is probably pretty eager to reclaim resources. I assume the GC will be aggressive.

If you use SoftReference-based keys, the VM will bias (strongly) against reclaiming recently accessed objects. However, it would be quite difficult to determine the caching semantics: the only guarantee that a SoftReference gives you (over a WeakReference) is that it will be cleared before an OutOfMemoryError is thrown. It would be perfectly legal for a JVM implementation to treat them identically to WeakReferences, at which point you might end up with a cache that doesn't cache anything.
I don't know how things work on Android, but with Sun's recent JVMs one can tweak the SoftReference behaviour with the -XX:SoftRefLRUPolicyMSPerMB command-line option, which determines the number of milliseconds that a softly-reachable object will be retained for, per MB of free memory in the heap. As you can see, this is going to be exceptionally difficult to get any predictable lifespan behaviour out of, with the added pain that this setting is global for all soft references in the VM and can't be tweaked separately for individual classes' use of SoftReferences (chances are each use will want different parameters).
The simplest way to make an LRU cache is by extending LinkedHashMap and overriding removeEldestEntry. Since you need thread-safety, the simplest way to extend this initially is to just use Collections.synchronizedMap on an instance of this custom class to ensure safe concurrent behaviour.
Beware premature optimisation - unless you need very high throughput, the theoretically suboptimal overhead of the coarse synchronization is not likely to be an issue. And the good news - if profiling shows that you are performing too slowly due to heavy lock contention, you'll have enough information available about the runtime use of your cache that you'll be able to come up with a suitable lockless alternative (probably based on ConcurrentHashMap with some manual LRU treatment) rather than having to guess at its load profile.
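The LinkedHashMap plus Collections.synchronizedMap approach described above can be sketched as follows, using only JDK classes. LruCaches and synchronizedLru are hypothetical names, and the initial capacity and load factor are just the usual defaults.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical factory for a small, coarsely synchronized LRU cache.
final class LruCaches {
    // Returns a thread-safe map that evicts the least recently used
    // entry once it holds more than maxEntries entries.
    static <K, V> Map<K, V> synchronizedLru(final int maxEntries) {
        // Third constructor argument true = iterate in access order.
        Map<K, V> lru = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
        return Collections.synchronizedMap(lru);
    }
}
```

Every call, including get, takes the same lock, which is exactly the coarse synchronization the answer suggests starting with.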

LinkedHashMap is easy to use for a cache. This creates a cache that keeps the 10 most recently used entries (the least recently used entry is evicted first).
private LinkedHashMap<File, ImageIcon> cache = new LinkedHashMap<File, ImageIcon>(10, 0.7f, true) {
    @Override
    protected boolean removeEldestEntry(Map.Entry<File, ImageIcon> eldest) {
        return size() > 10;
    }
};
I guess you can make a class with synchronized delegates to this LinkedHashMap. Forgive me if my understanding of synchronization is wrong.

www.javolution.org has some interesting features, such as synchronized fast collections.
In your case it is worth a try, as it also offers some nifty enhancements for small devices such as Android ones.

For synchronization, the Collections framework provides a synchronized map:
Map<V,T> myMap = Collections.synchronizedMap(new HashMap<V, T>());
You could then wrap this, or handle the LRU logic in a cache object.

I like Apache Commons Collections LRUMap

How to implement ConcurrentHashMap with features similar to those of LinkedHashMap?

I have used LinkedHashMap with accessOrder true along with allowing a maximum of 500 entries at any time as the LRU cache for data. But due to scalability issues I want to move on to some thread-safe alternative. ConcurrentHashMap seems good in that regard, but lacks the features of accessOrder and removeEldestEntry(Map.Entry e) found in LinkedHashMap. Can anyone point to some link or help me to ease the implementation.

I did something similar recently with ConcurrentHashMap<String,CacheEntry>, where CacheEntry wraps the actual item and adds cache eviction statistics: expiration time, insertion time (for FIFO/LIFO eviction), last used time (for LRU/MRU eviction), number of hits (for LFU/MFU eviction), etc. The actual eviction is synchronized and creates an ArrayList<CacheEntry> and does a Collections.sort() on it using the appropriate Comparator for the eviction strategy. Since this is expensive, each eviction then lops off the bottom 5% of the CacheEntries. I'm sure performance tuning would help though.
In your case, since you're doing FIFO, you could keep a separate ConcurrentLinkedQueue. When you add an object to the ConcurrentHashMap, do a ConcurrentLinkedQueue.add() of that object. When you want to evict an entry, do a ConcurrentLinkedQueue.poll() to remove the oldest object, then remove it from the ConcurrentHashMap as well.
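A minimal sketch of the ConcurrentHashMap plus ConcurrentLinkedQueue idea for FIFO eviction. It makes two assumptions that the answer does not state: the queue holds keys rather than values, and eviction is triggered from put once the map exceeds a fixed size. FifoCache is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical FIFO-evicting cache built from two concurrent collections.
final class FifoCache<K, V> {
    private final ConcurrentHashMap<K, V> map = new ConcurrentHashMap<>();
    private final ConcurrentLinkedQueue<K> order = new ConcurrentLinkedQueue<>();
    private final int maxEntries;

    FifoCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    V get(K key) {
        return map.get(key);
    }

    void put(K key, V value) {
        // Enqueue only keys that were not present, so each key is queued once.
        if (map.put(key, value) == null) {
            order.add(key);
        }
        // Evict the oldest keys while over capacity. Map and queue are
        // updated separately, so under concurrency the size may briefly
        // overshoot maxEntries; the bound is approximate, not exact.
        while (map.size() > maxEntries) {
            K oldest = order.poll();
            if (oldest == null) {
                break;
            }
            map.remove(oldest);
        }
    }

    int size() {
        return map.size();
    }
}
```

This gives insertion-order eviction only; it does not reproduce the access-order (LRU) behaviour of LinkedHashMap.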
Update: Other possibilities in this area include a Java Collections synchronization wrapper and the Java 1.6 ConcurrentSkipListMap.

Have you tried using one of the many caching solutions like ehcache?
You could try using LinkedHashMap with a ReadWriteLock. This would give you concurrent read access.
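One caveat when sketching the ReadWriteLock suggestion: in an access-ordered LinkedHashMap, get is itself a structural modification, so concurrent readers could not safely share a read lock. The sketch below therefore assumes insertion order (accessOrder = false), which gives FIFO rather than LRU eviction. RwLockedCache is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReadWriteLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical bounded cache: shared read lock, exclusive write lock.
final class RwLockedCache<K, V> {
    private final ReadWriteLock lock = new ReentrantReadWriteLock();
    private final Map<K, V> map;

    RwLockedCache(final int maxEntries) {
        // accessOrder = false: get() does not modify the map,
        // so several readers may hold the read lock at once.
        this.map = new LinkedHashMap<K, V>(16, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    V get(K key) {
        lock.readLock().lock();
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    void put(K key, V value) {
        lock.writeLock().lock();
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Keeping true LRU ordering with this design would mean taking the write lock in get as well, which removes the benefit of concurrent reads.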

This might seem old now, but at least just for my own history tracking, I'm going to add my solution here: I combined ConcurrentHashMap that maps K->subclass of WeakReference, ConcurrentLinkedQueue, and an interface that defines deserialization of the value objects based on K to run LRU caching correctly. The queue holds strong refs, and the GC will evict the values from memory when appropriate. Tracking the queue size involved AtomicInteger, as you can't really inspect the queue to determine when to evict. The cache will handle eviction from/adding to the queue, as well as map management. If the GC evicted the value from memory, the implementation of the deserialization interface will handle retrieving the value back. I also had another implementation that involved spooling to disk/re-reading what was spooled, but that was a lot slower than the solution I posted here, as I had to synchronize spooling/reading.

You mention wanting to solve scalability problems with a "thread-safe" alternative. "Thread safety" here means that the structure is tolerant of attempts at concurrent access, in that it won't suffer corruption by concurrent use without external synchronization. However, such tolerance does not necessarily help to improve "scalability". In the simplest -- though usually misguided -- approach, you'll try to synchronize your structure internally and still leave non-atomic check-then-act operations unsafe.
LRU caches require at least some awareness of the total structure. They need something like a count of the members or the size of the members to decide when to evict, and then they need to be able to coordinate the eviction with concurrent attempts to read, add, or remove elements. Trying to reduce the synchronization necessary for concurrent access to the "main" structure fights against your eviction mechanism, and forces your eviction policy to be less precise in its guarantees.
The currently accepted answer mentions "when you want to evict an entry". Therein lies the rub. How do you know when you want to evict an entry? Which other operations do you need to pause in order to make this decision?

The moment you use another data structure along with ConcurrentHashMap, the atomicity of compound operations, such as adding a new item to the ConcurrentHashMap and adding it to the other data structure, can't be guaranteed without additional synchronization such as a ReadWriteLock, which will degrade performance.

Wrap the map in a Collections.synchronizedMap(). If you need to call additional methods, then synchronize on the map that you got back from this call, and invoke the original method on the original map (see the javadocs for an example). The same applies when you iterate over the keys, etc.
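A short sketch of what "synchronize on the map that you got back" looks like in practice, following the pattern in the Collections.synchronizedMap javadoc. SyncMapUsage, snapshotKeys, and putIfMissing are hypothetical helper names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helpers for a map returned by Collections.synchronizedMap.
final class SyncMapUsage {
    // Iteration is not atomic on its own: hold the wrapper's lock
    // while copying the keys, then work on the copy outside the lock.
    static <K, V> List<K> snapshotKeys(Map<K, V> syncMap) {
        synchronized (syncMap) {
            return new ArrayList<>(syncMap.keySet());
        }
    }

    // Check-then-act made atomic by holding the same lock across both steps.
    // Returns the value that ends up mapped to the key.
    static <K, V> V putIfMissing(Map<K, V> syncMap, K key, V value) {
        synchronized (syncMap) {
            V existing = syncMap.get(key);
            if (existing == null) {
                syncMap.put(key, value);
                return value;
            }
            return existing;
        }
    }
}
```

The lock must be the wrapper object itself, not the underlying HashMap, because that is the monitor the wrapper's own methods use.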
