Directly from the ConcurrentHashMap javadocs:
The allowed concurrency among update operations is guided by the
optional concurrencyLevel constructor argument (default 16), which is
used as a hint for internal sizing. The table is internally
partitioned to try to permit the indicated number of concurrent
updates without contention. Because placement in hash tables is
essentially random, the actual concurrency will vary.
I do not get the point when they say: "which is used as a hint for internal sizing". Shouldn't the sizing be determined by the capacity and not by the concurrencyLevel?
"The table is internally partitioned" - this means the implementation divides the table into concurrencyLevel segments, in the hope that that many concurrent updates can proceed without any collisions.
There is no guarantee, however, that the hashed data will fall neatly into those partitions.
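To make the hint concrete, here is a minimal sketch of passing concurrencyLevel explicitly; the capacity, load factor, and level values are arbitrary, chosen only for illustration:

```java
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrencyLevelDemo {
    public static void main(String[] args) {
        // initialCapacity = 64, loadFactor = 0.75, concurrencyLevel = 8:
        // pre-Java-8 implementations partition the table into ~8 segments,
        // so up to 8 threads can update disjoint segments without blocking.
        ConcurrentHashMap<String, Integer> map =
                new ConcurrentHashMap<>(64, 0.75f, 8);
        map.put("a", 1);
        map.put("b", 2);
        System.out.println(map.size()); // prints 2
    }
}
```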
Related
I see how Java's AtomicInteger works internally with the CAS (Compare And Swap) operation. Basically, when multiple threads try to update the value, the JVM internally uses the underlying CAS mechanism to try to update the value. If the update fails, it retries with the new value, but never blocks.
In Java 8 Oracle introduced a new class, LongAdder, which seems to perform better than AtomicInteger under high contention. Some blog posts claim that LongAdder performs better by maintaining internal cells - does that mean LongAdder aggregates the values internally and updates them later? Could you please help me understand how LongAdder works?
does that mean LongAdder aggregates the values internally and update it later?
Yes, if I understand your statement correctly.
Each Cell in a LongAdder is a variant of an AtomicLong. Having multiple such cells is a way of spreading out the contention and thus increasing throughput.
When the final result (sum) is to be retrieved, it just adds together the values of each cell.
Much of the logic around how the cells are organized, how they are allocated etc can be seen in the source: http://hg.openjdk.java.net/jdk9/jdk9/jdk/file/f398670f3da7/src/java.base/share/classes/java/util/concurrent/atomic/Striped64.java
In particular the number of cells is bound by the number of CPUs:
/** Number of CPUS, to place bound on table size */
static final int NCPU = Runtime.getRuntime().availableProcessors();
The primary reason it is "faster" is its contended performance. This is important because:
Under low update contention, the two classes have similar characteristics.
You'd use a LongAdder for very frequent updates, where atomic CAS and native calls to Unsafe would cause contention (see the source, and the volatile reads). Not to mention cache misses/false sharing on multiple AtomicLongs (although I have not looked at the class layout yet, there doesn't appear to be sufficient memory padding before the actual long field).
Under high contention, expected throughput of this class is significantly higher, at the expense of higher space consumption.
The implementation extends Striped64, which is a data holder for 64-bit values. The values are held in cells, which are padded (or striped), hence the name. Each operation made upon the LongAdder will modify the collection of values present in the Striped64. When contention occurs, a new cell is created and modified, so the old thread can finish concurrently with the contending one. When you need the final value, the values of the cells are simply summed.
Unfortunately, performance comes with a cost, which in this case is memory (as often is). The Striped64 can grow very large if a large load of threads and updates are being thrown at it.
Quote source:
Javadoc for LongAdder
AtomicLong uses CAS, which under heavy contention can lead to many wasted CPU cycles.
LongAdder, on the other hand, uses a very clever trick to reduce contention between threads, when these are incrementing it.
So when we call increment(), behind the scenes LongAdder maintains an array of counters that can grow on demand.
And so, when more threads are calling increment(), the array will be longer. Each record in the array can be updated separately, reducing the contention. Due to that fact, LongAdder is a very efficient way to increment a counter from multiple threads.
The result of the counter in the LongAdder is not available until we call the sum() method.
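As a quick illustration of the points above, the sketch below increments a LongAdder from several threads and reads the total with sum() once all threads are done; the thread and iteration counts are arbitrary:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    public static void main(String[] args) throws InterruptedException {
        LongAdder adder = new LongAdder();

        // Four threads each increment 10_000 times; under contention the
        // adder spreads updates over its internal cells instead of all
        // threads spinning on a single CAS as with AtomicLong.
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                for (int j = 0; j < 10_000; j++) {
                    adder.increment();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        // sum() folds the cells together. It is not an atomic snapshot if
        // updates are still in flight, but here all threads have finished.
        System.out.println(adder.sum()); // prints 40000
    }
}
```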
I want to use a Comparator-based key-value Map. This will have reads and a rare write operation (once every 3 months, through a scheduler). The initial load of the collection will be done at application startup.
Also note that the write will:
Add a single entry to the Map
Will not modify any existing entry in the Map.
Will ConcurrentSkipListMap be a good candidate for this? Does the get operation allow access to multiple threads simultaneously? I'm looking for concurrent non-blocking reads but atomic writes.
ConcurrentHashMap is exactly what you're looking for. From the Javadoc:
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.)
That sounds like it satisfies your requirement for "concurrent non blocking read but atomic write".
Since you're doing so few writes, you may want to specify a high loadFactor and appropriate initialSize when creating the ConcurrentHashMap, which will prevent table resizing as you're populating the map, though this is a modest benefit at best. (You could also set a concurrencyLevel of 1, though Java 8's Javadoc seems to imply that is no longer used as a sizing hint.)
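A sketch of the presizing advice above; the expected-entry count and load factor here are hypothetical placeholders, so adjust them to your data:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class PresizedMapDemo {
    public static void main(String[] args) {
        // Hypothetical figure: we expect roughly 1_000 entries over the
        // map's lifetime. Sizing past expectedEntries / loadFactor means
        // the table never needs to resize while we populate it.
        int expectedEntries = 1_000;
        float loadFactor = 0.9f;
        int initialCapacity = (int) (expectedEntries / loadFactor) + 1;

        Map<String, String> map =
                new ConcurrentHashMap<>(initialCapacity, loadFactor, 1);
        map.put("key", "value");
        System.out.println(map.get("key")); // prints value
    }
}
```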
If you absolutely must have a SortedMap or NavigableMap, then ConcurrentSkipListMap is the out-of-the-box way to go. But I would double-check that you actually need the functionality provided by those interfaces (getting the first/last key, submaps, finding nearby entries, etc.) before using them. You will pay a steep price (log n vs. constant time for most operations).
Since you are looking for concurrent operations you have basically 3 competitors.
Hashtable, ConcurrentHashMap, ConcurrentSkipListMap (or Collections.synchronizedMap() but that's not efficient).
Of these three, the latter two are more suitable for concurrent operation, as they lock only a portion of the map rather than locking the entire map the way Hashtable does.
Of the latter two, ConcurrentSkipListMap uses a skip-list data structure, which ensures average O(log n) performance for search and a variety of other operations.
It also offers a number of operations that ConcurrentHashMap can't, e.g. ceilingEntry/Key(), floorEntry/Key(), etc. It also maintains a sort order which would otherwise have to be calculated.
Thus, if you had asked only for faster search I'd have suggested ConcurrentHashMap, but since you have also mentioned rare write operations and a desired sort order, I think ConcurrentSkipListMap wins the race.
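To connect this back to the original question, here is a minimal sketch of a ConcurrentSkipListMap built with an explicit Comparator (the keys and the natural-order comparator are placeholders for whatever ordering you need):

```java
import java.util.Comparator;
import java.util.concurrent.ConcurrentSkipListMap;

public class SkipListDemo {
    public static void main(String[] args) {
        // A map ordered by a supplied Comparator; reads never block, and
        // adding a single entry is atomic with respect to readers.
        ConcurrentSkipListMap<String, String> map =
                new ConcurrentSkipListMap<>(Comparator.naturalOrder());
        map.put("alpha", "1");
        map.put("beta", "2");
        map.put("gamma", "3");

        // Navigation methods that ConcurrentHashMap does not offer:
        System.out.println(map.firstKey());      // prints alpha
        System.out.println(map.ceilingKey("b")); // smallest key >= "b": prints beta
    }
}
```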
If you are willing to try third party code, you could consider a copy-on-write version of maps, which are ideal for infrequent writes. Here's one that came up via Googling:
https://bitbucket.org/atlassian/atlassian-util-concurrent/wiki/CopyOnWrite%20Maps
Never tried it myself so caveat emptor.
I created a ConcurrentHashMap with the following values:
ConcurrentHashMap<String,String> concurrentHashMap = new ConcurrentHashMap<>(10,.9F,1);
The above means only one thread can update the map at a given point in time. If this is the case, then can I say that it will work like a HashMap in the case of concurrency, i.e., only one write operation will be performed at a given point in time?
Is my understanding correct or am I missing something here?
The concurrencyLevel is just a hint to help size internal data structures. There's no guarantee that 1 would be the actual value used; and rather than behaving like a regular HashMap, the map may simply be less efficient if you actually use it from more than one thread.
From the Javadoc:
Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention.
Your question is actually about the concurrencyLevel:
concurrencyLevel - the estimated number of concurrently updating threads. The implementation may use this value as a sizing hint.
Basically, a ConcurrentHashMap is chunked into segments. Each segment can only be modified by one thread at a time. Simply put, the more segments you have, the more concurrency you get. Yet you also end up using much more memory because each segment has its own memory overhead.
Therefore if you know that only one thread will access your map, setting the concurrencyLevel to 1 will only create 1 segment in the map, thus making it more memory-efficient.
If the value is too high, more memory will be used and some time will be used finding the right segment for every object you want to read/write in the map.
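One point worth demonstrating: a concurrencyLevel of 1 does not turn the map into a HashMap; it remains fully thread-safe. The sketch below (thread and iteration counts arbitrary) hammers such a map from several threads and checks that no updates are lost:

```java
import java.util.concurrent.ConcurrentHashMap;

public class LevelOneDemo {
    public static void main(String[] args) throws InterruptedException {
        // concurrencyLevel = 1 is only a sizing hint: writes may contend
        // more, but the map stays safe for concurrent use.
        ConcurrentHashMap<Integer, Integer> map =
                new ConcurrentHashMap<>(16, 0.75f, 1);

        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int offset = t * 1_000; // disjoint key ranges per thread
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1_000; i++) {
                    map.put(offset + i, i);
                }
            });
            threads[t].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        System.out.println(map.size()); // prints 4000: no lost updates
    }
}
```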
From the Javadocs of ConcurrentHashMap :
The allowed concurrency among update operations is guided by the
optional concurrencyLevel constructor argument (default 16), which is
used as a hint for internal sizing.
I do not understand the part that says "which is used as a hint for internal sizing". What does this mean? What is the best practice for setting this value, and what guarantee does it give us?
Take a look at the very next sentences in the Javadoc:
The table is internally partitioned to try to permit the indicated
number of concurrent updates without contention. Because placement
in hash tables is essentially random, the actual concurrency will
vary. Ideally, you should choose a value to accommodate as many
threads as will ever concurrently modify the table. Using a
significantly higher value than you need can waste space and time,
and a significantly lower value can lead to thread contention. But
overestimates and underestimates within an order of magnitude do
not usually have much noticeable impact. A value of one is
appropriate when it is known that only one thread will modify and
all others will only read. Also, resizing this or any other kind of
hash table is a relatively slow operation, so, when possible, it is
a good idea to provide estimates of expected table sizes in
constructors.
So in other words, a concurrencyLevel of 16 means that the ConcurrentHashMap internally creates 16 separate hashtables in which to store data. Operations that modify data in one hashtable do not require locking the other hashtables, which allows somewhat-concurrent access to the overall Map.
You might want to try reading the source of ConcurrentHashMap.
The concurrency level is roughly equal to the number of operations that can be invoked on the map concurrently without using the internal locking mechanism. As maat b says, a ConcurrentHashMap will have N internal hashtables, so operations working on different hashtables don't require additional locking; if operations are working on the same internal hashtable, however, ConcurrentHashMap uses additional internal locking on them.
Is there some optimal value for ConcurrencyLevel beyond which ConcurrentHashMap's performance starts degrading?
If yes, what's that value, and what's the reason for the performance degradation? (This question originates from trying to find out any practical limitations that a ConcurrentHashMap may have.)
The Javadoc offers pretty detailed guidance:
The allowed concurrency among update operations is guided by the optional concurrencyLevel constructor argument (default 16), which is used as a hint for internal sizing.
The table is internally partitioned to try to permit the indicated number of concurrent updates without contention. Because placement in hash tables is essentially random, the actual concurrency will vary. Ideally, you should choose a value to accommodate as many threads as will ever concurrently modify the table. Using a significantly higher value than you need can waste space and time, and a significantly lower value can lead to thread contention. But overestimates and underestimates within an order of magnitude do not usually have much noticeable impact. A value of one is appropriate when it is known that only one thread will modify and all others will only read.
To summarize: the optimal value depends on the number of expected concurrent updates. A value within an order of magnitude of that should work well. Values outside that range can be expected to lead to performance degradation.
You have to ask yourself two questions
how many cpus do I have?
what percentage of the time will a useful program be accessing the same map?
The first question tells you the maximum number of threads which can access the map at once. You can have 10000 threads, but if you have only 4 cpus, at most 4 will be running at once.
The second question tells you the most any of those threads will be accessing the map AND doing something useful. You can optimise the map to do something useless (e.g. a micro-benchmark) but there is no point tuning for this IMHO. Say you have a useful program which uses the map a lot. It might be spending 90% of the time doing something else e.g. IO, accessing other maps, building keys or values, doing something with the values it gets from the map.
Say you spend 10% of the time accessing the map on a machine with 4 CPUs. This means that, on average, 0.4 threads will be accessing the map (or one thread about 40% of the time). In this case a concurrency level of 1-4 is fine.
In any case, making the concurrency level higher than the number of cpus you have is likely to be unnecessary, even for a micro-benchmark.
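The arithmetic above can be sketched as a quick back-of-the-envelope calculation; the 10% figure is the answer's hypothetical example, not a measured value:

```java
public class ConcurrencyEstimate {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        double mapFraction = 0.10; // assumed: 10% of time spent in the map

        // Average number of threads inside the map at any instant.
        // With 4 CPUs this gives 0.4, so a concurrencyLevel of 1-4 is plenty;
        // there is little point going above the CPU count.
        double avgThreadsInMap = cpus * mapFraction;

        System.out.println("CPUs: " + cpus);
        System.out.println("Average threads in the map: " + avgThreadsInMap);
    }
}
```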
As of Java 8, ConcurrentHashMap's constructor parameter for concurrencyLevel is effectively unused, and remains primarily for backwards-compatibility. The implementation was re-written to use the first node within each hash bin as the lock for that bin, rather than a fixed number of segments/stripes as was the case in earlier versions.
In short, starting in Java 8, don't worry about setting the concurrencyLevel parameter, as long as you set a positive (non-zero, non-negative) value, per the API contract.