If I have a variable from which multiple threads read and only one thread writes, do I need to have a lock around that variable? Would it crash if one thread tries to read and the other thread tries to write at the same time?
The concurrency concern is not crashing, but what version of the data you're seeing.
If the shared variable is written atomically, it's still possible for a reader thread to see a stale value after you thought the writer thread had updated the variable. You can use the volatile keyword to prevent your reader threads from seeing a stale value in this situation.
If the write operation is not atomic (for example, if it's a composite object of some kind and you're writing bits of it at a time while other threads could theoretically be reading it), then your concern is also that some reader threads could see the variable in an inconsistent state. You'd prevent this by locking access to the variable while it is being written (slow) or by making sure you write it atomically.
Writes to some types of fields are atomic, but without the happens-before relationship that ensures correct memory ordering (unless you use volatile); see this page for details.
The simple answer is yes, you need synchronization.
If you ever write to a field in one thread and read it from another without some form of synchronization, your program can see inconsistent state and is likely wrong. Your program will not crash, but it can see either the old data, the new data, or (in the case of longs and doubles) half old and half new data.
When I say "some form of synchronization", though, I more precisely mean something that creates a "happens-before" relationship (aka memory barrier) between the write and read locations. synchronized blocks or the java.util.concurrent.locks classes are the most obvious way to create such a thing, but the concurrent collections typically also provide similar guarantees (check the javadoc to be sure). For example, doing a put and take on a concurrent queue will create a happens-before relationship.
Marking a field as volatile prevents torn reads (e.g. long-tearing of 64-bit values) and guarantees that all threads will "see" a write. But volatile field writes/reads cannot be combined with other operations into larger atomic units. The Atomic classes handle common combined operations like compare-and-set or read-and-increment. Synchronization, the other java.util.concurrent synchronizers (CyclicBarrier, etc), or locks should be used for larger areas of exclusivity.
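For instance, here is a minimal sketch of that division of labour (class and method names are illustrative):

import java.util.concurrent.atomic.AtomicInteger;

class Counters {
    private volatile int plain;                          // writes are visible, but plain++ is NOT atomic
    private final AtomicInteger atomic = new AtomicInteger();

    void lossyIncrement() { plain++; }                   // read-modify-write: concurrent updates can be lost
    void safeIncrement()  { atomic.incrementAndGet(); }  // atomic read-and-increment

    boolean claimIfZero() {
        return atomic.compareAndSet(0, 1);               // atomic compare-and-set
    }
}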
Departing from the simple yes, there are cases that are more "no, if you really know what you're doing". Two examples:
1) The special case of a field that is final and written ONLY during construction. One example is populating a pre-computed cache (think of a Map where keys are well-known values and values are pre-computed derived values). If you build that map inside the constructor, the field is final, and you never write to it later, the end of the constructor performs "final field freeze" and subsequent reads DO NOT need to synchronize (see the first sketch after point 2).
2) The case of the "racy single check" pattern, which is covered in Effective Java. The canonical example is java.lang.String.hashCode(). String has a hash field that is lazily computed the first time you call hashCode() and cached in that field, which is NOT synchronized. Basically, multiple threads may race to compute this value and overwrite one another's results, but because it is guarded by a well-known sentinel (0) and every thread always computes the identical value (so we don't care which thread "wins" or whether several do), this actually is guaranteed to be OK (see the second sketch below).
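For case 1, a minimal sketch, assuming the map is fully populated before the constructor returns and that "this" does not escape during construction (names are illustrative):

import java.util.HashMap;
import java.util.Map;

class SquareCache {
    private final Map<Integer, Integer> cache;   // final, written ONLY during construction

    SquareCache() {
        Map<Integer, Integer> m = new HashMap<>();
        for (int i = 0; i < 100; i++) {
            m.put(i, i * i);                     // pre-compute the derived values
        }
        cache = m;                               // "final field freeze" at the end of the constructor
    }

    Integer lookup(int key) {
        return cache.get(key);                   // safe unsynchronized reads from any thread
    }
}

For case 2, a simplified sketch in the spirit of String.hashCode() (not the actual JDK source):

class CachedHash {
    private final byte[] data;
    private int hash;                            // 0 is the well-known "not computed" sentinel

    CachedHash(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public int hashCode() {
        int h = hash;                            // single read of the plain (non-volatile) field
        if (h == 0) {
            for (byte b : data) {
                h = 31 * h + b;                  // deterministic: every racing thread gets the same answer
            }
            hash = h;                            // benign race: it doesn't matter whose write "wins"
        }
        return h;
    }
}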
A longer reference (written by me): http://refcardz.dzone.com/refcardz/core-java-concurrency
Be aware that volatile does NOT make compound operations atomic. Note also that NON-volatile double and long fields, which use 64 bits, can be read in an inconsistent state where 32 bits are the old value and 32 bits are the new value (declaring them volatile does guarantee atomic reads and writes). Also, a volatile array does not make its entries volatile. Using classes from java.util.concurrent is strongly recommended.
Marking a variable as volatile in Java ensures that every thread sees the value that was last written to it instead of some stale value. I was wondering how this is actually achieved. Does the JVM emit special instructions that flush the CPU caches or something?
From what I understand, it always appears as if the cache has been flushed after a write, and as if reads are conducted straight from main memory. The effect is that a thread will always see the results of writes from another thread and (according to the Java Memory Model) never a stale cached value. The actual implementation and CPU instructions vary from one architecture to another, however.
It doesn't guarantee correctness if you increment the variable in more than one thread, or check its value and then take some action based on it, since there is obviously no actual synchronization. You can generally only guarantee correct execution if there is just one thread writing to the variable and all the others are reading.
Also note that a 64-bit NON-volatile variable can be read/written as two 32-bit halves, so 32-bit variables are atomic on write but 64-bit ones aren't. One half can be written before the other, so the value read could be neither the old nor the new value.
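A hedged sketch of that tearing risk (whether a torn read ever actually shows up depends on the JVM and CPU; many 64-bit JVMs never tear in practice):

class TearingDemo {
    static long plain;                  // may legally be written as two 32-bit halves
    static volatile long safe;          // volatile long reads/writes are guaranteed atomic

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                plain = -1L;            // alternate all-ones and all-zeros bit patterns
                plain = 0L;
            }
        });
        writer.setDaemon(true);
        writer.start();
        for (int i = 0; i < 100_000_000; i++) {
            long p = plain;
            if (p != 0L && p != -1L) {  // half old, half new: a torn read
                System.out.println("Torn read: " + Long.toHexString(p));
            }
        }
        // 'safe' could only ever be observed as 0 or -1
    }
}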
This is quite a helpful page from my bookmarks:
http://www.cs.umd.edu/~pugh/java/memoryModel/
Exactly what happens is processor-specific. Generally there are some form of memory barrier instructions. Flushing the entire cache would obviously be very expensive - there are cache coherency protocols in the hardware.
Also important is that certain optimisations are not made across volatile field accesses. The compiler matters when considering multithreading; don't just think about the hardware.
To use intrinsic locking in Java you do:
Object o = new Object();
...
synchronized (o) {
    ...
}
So one monitor already requires one Object, i.e. 8 bytes, or 16 bytes on 64-bit (or 12 bytes with compressed oops on 64-bit).
Now assume you want to use lots of these monitors, e.g. for an array where one can synchronize over certain regions, giving better (entry-based) concurrency than Collections.synchronizedList. What is the most efficient way to implement this? Could I somehow use 2 nested locks for 4 entries, or 3 for 8, etc.? Or could I make use of "one lock per thread", e.g. in a ConcurrentHashMap<array_index, lock>?
Depending on access patterns, you might increase concurrency with fewer locks by segmenting your data structure and using a single intrinsic lock to guard multiple elements. This technique is used in some of the concurrent collections provided in the java.util.concurrent package.
"Could I somehow use 2 nested locks for 4 entries or 3 for 8 etc?" It sounds like you are planning to treat each lock like a bit in the entry index: if the bit is set, acquire the lock; if it's clear, skip it. This won't work. Think about index 0. No locks would be acquired and you'd have no concurrency control.
You could make it "work" by doubling the number of locks (have a "set" and "clear" lock for each bit), but it's still a bad idea because you'd be wasting locks and getting really poor concurrency. The outermost lock would guard half the entries. Any nested locks acquired subsequently would be useless, because other threads are already excluded from that segment.
That takes you back to segmenting your data, with one lock per segment, just like java.util.concurrent does; a sketch follows.
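A minimal sketch of that segmenting, with one intrinsic lock guarding each segment of an array (segment count and names are illustrative):

class StripedArray {
    private final long[] values;
    private final Object[] locks;            // one monitor object per segment

    StripedArray(int size, int stripes) {
        values = new long[size];
        locks = new Object[stripes];
        for (int i = 0; i < stripes; i++) {
            locks[i] = new Object();
        }
    }

    private Object lockFor(int index) {
        return locks[index % locks.length];  // map each entry to its segment's lock
    }

    void add(int index, long delta) {
        synchronized (lockFor(index)) {      // threads in different segments don't block each other
            values[index] += delta;
        }
    }

    long get(int index) {
        synchronized (lockFor(index)) {
            return values[index];
        }
    }
}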
To get a monitor, you need an object, so to get the functionality you're after, i.e. locking a set of primitive values, you need an object for the set.
Rather than creating an array of values and treating blocks of values as a set, with a separate Object for the monitor, just create objects for the set of values.
It's the OO way.
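A sketch of that approach, where each object owns both its values and the monitor that guards them:

class ValueBlock {
    private final long[] values = new long[4];  // the set of primitives this object guards

    synchronized void set(int i, long v) {      // the object's own monitor is the lock
        values[i] = v;
    }

    synchronized long get(int i) {
        return values[i];
    }
}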
I know that reading from a single object across multiple threads is safe in Java, as long as the object is not written to. But what are the performance implications of doing that instead of copying the data per thread?
Do threads have to wait for others to finish reading the memory? Or is the data implicitly copied (the reason volatile exists)? And what would that do to the memory usage of the entire JVM? And how does it all differ when the object being read is older than the threads that read it, rather than created during their lifetime?
If you know that an object will not change (e.g. immutable objects such as String or Integer) and have, therefore, avoided using any of the synchronization constructs (synchronized, volatile), reading that object from multiple threads does not have any impact on performance. All threads will access the memory where the object is stored in parallel.
The JVM may choose, however, to cache some values locally in each thread for performance reasons. The use of volatile forbids just that behaviour - the JVM will have to explicitly and atomically access a volatile field each and every time.
If data is only being read, there is no performance implication, because multiple threads can access the same memory concurrently. You only take a performance hit when writing occurs, because of the locking mechanisms. A note on volatile (I can't remember if it's the same in Java as in C): it's used for data that can change from underneath the program (like directly addressed data in C) or when you want visibility guarantees for your data. Copying the data would not improve performance but would use more memory.
To have state shared between multiple threads, you'll have to coordinate access to it using some synchronization mechanism: volatile, synchronized, CAS. I'm not sure what you expect to hear on "performance implications"; it will depend on the concrete scenario and context. In general you will pay some price for having to coordinate access to the shared object across multiple threads.
How is the performance of ConcurrentHashMap compared to HashMap, especially .get() operation (I'm especially interested for the case of only few items, in the range between maybe 0-5000)?
Is there any reason not to use ConcurrentHashMap instead of HashMap?
(I know that null values aren't allowed)
Update
just to clarify: obviously the performance will suffer in the case of actual concurrent access, but how does the performance compare when there is no concurrent access?
I was really surprised to find this topic so old and yet no one has provided any tests for this case. Using ScalaMeter I created tests of add, get and remove for both HashMap and ConcurrentHashMap in two scenarios:
using a single thread
using as many threads as I have cores available. Note that because HashMap is not thread-safe, I simply created a separate HashMap for each thread, but used one shared ConcurrentHashMap.
Code is available on my repo.
The results were presented as charts, with the X axis (size) showing the number of elements written to the map(s) and the Y axis (value) showing time in milliseconds.
The summary
If you want to operate on your data as fast as possible, use all the threads available. That seems obvious: each thread has 1/nth of the full work to do.
If you choose single-threaded access, use HashMap; it is simply faster. For the add method it is as much as 3x more efficient. Only get is faster on ConcurrentHashMap, and not by much.
When operating on a ConcurrentHashMap with many threads, it is similarly effective to operating on separate HashMaps for each thread. So there is no need to partition your data into different structures.
To sum up, ConcurrentHashMap performs worse when used with a single thread, but adding more threads to do the work will definitely speed up the process.
Testing platform
AMD FX6100, 16 GB RAM
Xubuntu 16.04, Oracle JDK 8 update 91, Scala 2.11.8
Thread safety is a complex question. If you want to make an object thread safe, do it consciously, and document that choice. People who use your class will thank you if it is thread safe when it simplifies their usage, but they will curse you if an object that once was thread safe becomes not so in a future version. Thread safety, while really nice, is not just for Christmas!
So now to your question:
ConcurrentHashMap (at least in Sun's current implementation) works by dividing the underlying map into a number of separate buckets. Getting an element does not require any locking per se, but it does use atomic/volatile operations, which implies a memory barrier (potentially very costly, and interfering with other possible optimisations).
Even if all the overhead of atomic operations can be eliminated by the JIT compiler in a single-threaded case, there is still the overhead of deciding which of the buckets to look in - admittedly this is a relatively quick calculation, but nevertheless, it is impossible to eliminate.
As for deciding which implementation to use, the choice is probably simple.
If this is a static field, you almost certainly want to use ConcurrentHashMap, unless testing shows it is a real performance killer. Your class (static state) has different thread-safety expectations from instances of that class.
If this is a local variable, then chances are a HashMap is sufficient - unless you know that references to the object can leak out to another thread. By coding to the Map interface, you allow yourself to change it easily later if you discover a problem.
If this is an instance field, and the class hasn't been designed to be thread safe, then document it as not thread safe, and use a HashMap.
If you know that this instance field is the only reason the class isn't thread safe, and are willing to live with the restrictions that promising thread safety implies, then use ConcurrentHashMap, unless testing shows significant performance implications. In that case, you might consider allowing a user of the class to choose a thread safe version of the object somehow, perhaps by using a different factory method.
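A hedged sketch of that factory idea (class and method names are illustrative):

import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Registry {
    private final Map<String, Object> entries;  // coded to the Map interface

    private Registry(Map<String, Object> entries) {
        this.entries = entries;
    }

    static Registry newRegistry() {             // for confined, single-threaded use
        return new Registry(new HashMap<>());
    }

    static Registry newConcurrentRegistry() {   // thread-safe variant
        return new Registry(new ConcurrentHashMap<>());
    }
}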
In either case, document the class as being thread safe (or conditionally thread safe) so people who use your class know they can use objects across multiple threads, and people who edit your class know that they must maintain thread safety in future.
I would recommend you measure it, since (for one reason) there may be some dependence on the hashing distribution of the particular objects you're storing.
The standard hashmap provides no concurrency protection whereas the concurrent hashmap does. Before it was available, you could wrap the hashmap to get thread safe access but this was coarse grain locking and meant all concurrent access got serialised which could really impact performance.
The concurrent hashmap uses lock striping and only locks the stripe affected by a particular operation. If you're running on a modern VM such as HotSpot, the VM will try to use lock biasing, coarsening and elision where possible, so you'll only pay the penalty for the locks when you actually need them.
In summary, if your map is going to be accessed by concurrent threads and you need to guarantee a consistent view of its state, use the concurrent hashmap.
In the case of a 1000-element hash table, using 10 locks for the whole table saves close to half the time when 10000 threads are inserting and 10000 threads are deleting from it.
Always use a concurrent data structure, except when the downside of striping (mentioned below) becomes a frequent operation. In that case you will have to acquire all the locks; I read that the best way to do this is by recursion.
Lock striping is useful when there is a way of breaking a highly contended lock into multiple locks without compromising data integrity. Whether this is possible should take some thought and is not always the case; the data structure is also a contributing factor in the decision. So, if we use a large array to implement a hash table, using a single lock to synchronize the entire hash table will lead to threads accessing the structure sequentially. If they keep hitting the same location on the table then serializing them is necessary, but what if they are accessing the two extremes of the table?
The downside of lock striping is that it is difficult to get the overall state of the data structure affected by striping. In the example, computing the size of the table, or trying to list/enumerate the whole table, is cumbersome, since we need to acquire all of the striped locks.
What answer are you expecting here?
It is obviously going to depend on the number of reads happening at the same time as writes and how long a normal map must be "locked" on a write operation in your app (and whether you would make use of the putIfAbsent method on ConcurrentMap). Any benchmark is going to be largely meaningless.
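For reference, putIfAbsent folds the check and the insert into one atomic step, something a plain HashMap can only match with external locking:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class PutIfAbsentExample {
    public static void main(String[] args) {
        ConcurrentMap<String, Integer> map = new ConcurrentHashMap<>();
        Integer previous = map.putIfAbsent("key", 1);  // atomic check-then-put: no lost update
        if (previous == null) {
            System.out.println("we inserted the value");
        } else {
            System.out.println("another thread got there first: " + previous);
        }
    }
}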
It's not clear what you mean. If you need thread safety, you have almost no choice: only ConcurrentHashMap. And it definitely has performance/memory penalties on a get() call: access to volatile variables, and a lock if you're unlucky.
Of course a Map without any locking wins against one with thread-safe behavior, which requires more work.
The point of the concurrent one is to be thread-safe without using synchronized, and so be faster than Hashtable.
The same graphs would be very interesting for ConcurrentHashMap vs Hashtable (which is synchronized).
The Java AtomicInteger class has a method -
boolean weakCompareAndSet(int expect, int update)
Its documentation says:
May fail spuriously.
What does 'failing spuriously' here mean?
spuriously: for no apparent reason
According to atomic package javadoc:
The atomic classes also support method weakCompareAndSet, which has limited applicability.
On some platforms, the weak version may be more efficient than compareAndSet in the normal case, but differs in that any given invocation of the weakCompareAndSet method may return false spuriously (that is, for no apparent reason).
A false return means only that the operation may be retried if desired, relying on the guarantee that repeated invocation when the variable holds expectedValue and no other thread is also attempting to set the variable will eventually succeed.
(Such spurious failures may for example be due to memory contention effects that are unrelated to whether the expected and current values are equal.)
Additionally weakCompareAndSet does not provide ordering guarantees that are usually needed for synchronization control.
According to this thread, it is not so much because of the "hardware/OS", but because of the underlying algorithm used by weakCompareAndSet:
weakCompareAndSet atomically sets the value to the given updated value if the current value == the expected value. May fail spuriously.
Unlike compareAndSet(), and other operations on an AtomicX, the weakCompareAndSet() operation does not create any happens-before orderings.
Thus, just because a thread sees an update to an AtomicX caused by a weakCompareAndSet doesn't mean it is properly synchronized with operations that occurred before the weakCompareAndSet().
You probably don't want to use this method; instead, just use compareAndSet. There are few cases where weakCompareAndSet is faster than compareAndSet, and there are a number of cases where trying to optimize your code by using weakCompareAndSet rather than compareAndSet will introduce subtle, hard-to-reproduce synchronization errors into your code.
Note regarding happens-before orderings:
The Java Memory Model (JMM) defines the conditions under which a thread reading a variable is guaranteed to see the results of a write in another thread.
The JMM defines an ordering on the operations of a program called happens-before.
Happens-before orderings across threads are only created by synchronizing on a common lock or accessing a common volatile variable.
In the absence of a happens-before ordering, the Java platform has great latitude to delay or change the order in which writes in one thread become visible to reads of that same variable in another.
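A minimal sketch of creating such an ordering with a volatile variable (the classic safe-publication idiom):

class Publication {
    private int data;                    // plain field
    private volatile boolean ready;      // the volatile access creates the happens-before edge

    void writer() {
        data = 42;                       // 1: plain write
        ready = true;                    // 2: volatile write publishes everything before it
    }

    void reader() {
        if (ready) {                     // 3: volatile read synchronizes with the write
            System.out.println(data);    // 4: guaranteed to see 42, never a stale value
        }
    }
}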
It means it might return false (and will not set the new value) even if it currently contains the expected value.
In other words, the method may do nothing and return false for no apparent reason...
There are CPU architectures where this may have a performance advantage over a strong compareAndSet().
A bit more concrete detail on why something like this might happen.
Some architectures (like newer ARMs) implement CAS operations using a Load Linked (LL)/Store Conditional (SC) pair of instructions. The LL instruction loads the value at a memory location and 'remembers' the address somewhere. The SC instruction stores a value into that memory location if the value at the remembered address has not been modified. It's possible for the hardware to believe that the location has been modified even if it apparently hasn't, for a number of possible reasons (and the reasons might vary by CPU architecture):
the location may have been written with the same value
the resolution of the addresses watched might not be exactly the one memory location of interest (think cache lines). A write to another location that's 'close-by' may cause the hardware to flag the address in question as 'dirty'
a number of other reasons that may cause the CPU to lose the saved state of the LL instruction - context switches, cache flushes, or page table changes maybe.
A good use-case for weakCompareAndSet is performance counters: no need for ordering, a high rate of updates (so ordering hurts on weakly ordered systems), but counts must not be dropped under high load (tightly contended perf counters can drop 99% of all counts, essentially leaving a contended counter's value random relative to un-contended counters). A sketch of such a counter follows.
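Because weakCompareAndSet may fail spuriously, it must sit inside a retry loop, which also guarantees no count is ever dropped:

import java.util.concurrent.atomic.AtomicLong;

class PerfCounter {
    private final AtomicLong count = new AtomicLong();

    void increment() {
        long current;
        do {
            current = count.get();
        } while (!count.weakCompareAndSet(current, current + 1));  // retry on spurious failure
    }

    long value() {
        return count.get();
    }
}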