I have been reading the book called art of multiprocessor programming and came across functions such as get(), getandset(), compareandset(), getandIncrease(), getandIncrease() etc.
In the book it says that all the above function are atomic and I agree but I had my own doubts on how some function becomes a atomic function.
Why does the function with get or compare become atomic ? - because it has to wait till it gets the value or waits till some condition becomes true which creates a barrier, hence atomic.
Am I right in thinking this way? is there any thing that I have missed ?
When I do
if (tail_index.get() == (head_index.getAndIncrement())
is this atomic?
A method is made atomic relative to some instance by adding explicit thread-safety. In many cases this is done by marking the method as synchronized. There is not magic, if you look at the source code of the thread-safe class that claims that methods are atomic, you will see the locking.
WRT to your second part, No it is not atomic. Each method call is atomic but when you put two together the combination is not atomic. get and getAndIncrement have been explicitly made atomic. Once you add other code (or a combination of the calls) it is not atomic unless you make it so.
A function is atomic if it appears to occur instantaneously.[1]
Here, "appears to" means from the point of view of the rest of the system. For instance, consider a synchronized function that reverses a linked list. To an outside observer, the operation clearly does not occur instantaneously: it takes many reads and writes to update all the list pointers. However, as a lock is held the entire time, no other part of the system may read the list during this time, so to them, the update appears instantaneous.
Equally, CAS (compare-and-set) operations do not actually occur instantly on modern computers. It takes time for one CPU core to obtain exclusive write access to the value, and then it takes more time for another core to re-obtain read access afterwards to see the new value. During this time, the CPU is executing other instructions in parallel. To ensure the illusion of instantaneous execution is preserved, the JVM issues CPU instructions before and after the CAS operation to ensure no logically subsequent reads get pulled up and executed before the CAS finishes (which would allow you to read a part of the linked list before you had actually taken the lock, for instance), and that no logically preceding writes get delayed and executed after the CAS finishes (which would allow another thread to take the lock before the linked list was completely updated).
These CPU ordering instructions are the key difference between AtomicInteger.compareAndSet and AtomicInteger.weakCompareAndSet (the "may fail spuriously" bit is easily rectified with a loop). Without the ordering guarantees, the weak CAS operation cannot be used to implement most concurrent algorithms, and "is only rarely an appropriate alternative to compareAndSet".
If this is sounding complicated...well...it is! Which is why you can still get a PhD by designing a concurrent algorithm. To show correctness for a concurrent algorithm, you have to consider what every other thread may possibly be doing to mess you around. It may help if you think of them as adversaries, trying to break the illusion of atomicity. For instance, let's consider your example:
if (tail_index.get() == (head_index.getAndIncrement()))
I assume this is part of a method to pop an item off a stack implemented as a cyclic array with index counters, and execute the body of the "if" if the stack is now empty. As head_index and tail_index are being accessed separately, your adversary can "divide" them with as many operations as he likes. (Imagine, for instance, that your thread is interrupted by the OS between the get and the getAndIncrement.) So it would be easy for him to add dozens of items to the stack, then remove all but one, leaving head_index above tail_index; your if block will then never execute, even though you are removing the last item on the stack.
So, when your book says get(), getAndSet(), etc. are atomic, it is not making a general statement about any possible implementation of those methods. It's telling you that the Java standard guarantees that they are atomic, and does so by careful use of the available CPU instructions, in a way that would be impossible to do in plain Java (synchronized lets you emulate it, but is more costly).
No, function, using get() is not atomic. But, for example, getAndIncrement or compareAndSet are atomic themselves. That means that it guaranteed, that all the logic is made atomically. For get() there is one another assurance: when you publish atomic value into one thread, it immediately becomes visible to another threads (just like volatile fields). Non-volatile and non-atomic values dont: there are cases, where value being set to non-volatile fiels is not visible to another threads; these threads get an old value reading field's value.
But you always can write atomic function using Atomic* classes and other synchonization primitives.
Related
While going through Java Concurrency in practice by Brian Goetz I encountered the following line:
A data race occurs when a variable is read by more than one thread,
and written by at least one thread, but the reads and writes are not
ordered by happens-before. A correctly synchronized program is one
with no data races; correctly synchronized programs exhibit sequential
consistency, meaning that all actions within the program appear to
happen in a fixed, global order.
My Question is that, Is Out of Order writes the only reason for Data Race condition in java or possibly in other programming languages?
UPDATE
OK, I did some more investigation about data-race and found the following from oracle official site which says that :
The Thread Analyzer detects data-races that occur during the execution
of a multi-threaded process. A data race occurs when:
two or more threads in a single process access the same memory location concurrently, and
at least one of the accesses is for writing, and
the threads are not using any exclusive locks to control their accesses to that memory.
When these three conditions hold, the order of accesses is
non-deterministic, and the computation may give different results from
run to run depending on that order. Some data-races may be benign (for
example, when the memory access is used for a busy-wait), but many
data-races are bugs in the program.
In this part, it is mentioning that : the order of accesses is non-deterministic
Is it talking about the the sequence in which Threads are accessing the memory location? If yes, then synchronization never guarantee about the order in which threads will access the block of code. So , how synchronization can resolve the issue of data race?
I would rather define data race as
Data race between writing and reading of some value or reference from a variable is a situation when the result of reading is determined by the "internal" (jvm- or os-controlled) thread scheduling.
In fact, second definition from the question says the same in more "official" words :)
In the other words, consider thread A writing some value to the variable and thread B attempting to read it. If you miss any kind of synchronization (or other mechanism that can provide happens-before guarantees between write and subsequent read), your program has a data race between threads A and B.
Now, to your question:
Is it talking about the the sequence in which Threads are accessing the memory location? If yes, then synchronization never guarantee about the order in which threads will access the block of code.
Synchronization in that particular case guarantees that you will never be able to read value that variable had before the writer thread written new value after writer thread exited synchronized block or method. Without syncronization, there is a chance to read old value even after write is actually happened.
About the order of access: it is going to be deterministic with synchronization in the following way:
Let's take a look at our threads A and B again. The operations order is now sequential - thread B will not be able to start reading until thread A finished with writing. To get this situation clear, imagine that writing and reading is really a long process. Without synchronization, these operations will be able to interlap with each other which might result in some meaningless values read.
In the API documents, we can see:
If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be
synchronized externally. (A structural modification is any operation
that adds or deletes one or more mappings; merely changing the value
associated with a key that an instance already contains is not a
structural modification.)
I'm thinking if the "put" method should be synchronized ? It said only the structural modification. Can you give some unsafe cases for the HashMap. And when I view the source code of "HashTable", the "get" method is also been synchronized, why not only the write operations be synchronized?
There is a general rule of thumb:
If you have more than one thread accessing a collection and at least one thread modifies the collection at some point, you need to synchronize all accesses to the collection.
If you think about it, its very clear: If a collection is modified while another thread reads from it (e.g. iterates), read and write operation can interfere with each other (the read seeing a partial write, e.g. entry created but value not yet set or entry not properly linked yet).
Exempt from this are collections one thread creates and modifies, then hands of to "the world" but never modifies them after publishing their reference.
why not only the write operations be synchronized?
If the reads are not synchronized as well, you might encounter visibility issues. Not only that, but it is also possible to completely thrash the object, if it performs structural changes!
The JVM specification gives a few guarantees regarding when modifications to a memory location made by one thread will be visible to other threads. One such guarantee is that modifications by a thread prior to releasing a lock are visible to threads that subsequently acquire the same lock. That's why you need to synchronized the read operations as well, even in the absence of concurrent structural modifications to the object.
Note that this releasing/acquiring locks is not the only way to guarantee visibility of memory modifications, but it's the easiest. Others include order of starting threads, class initialization, reads/writes to memory locations... more sophisticated stuff (and possibly more scalable on a highly concurrent environment, due to a reduced level of contention).
If you don't use any of those other techniques to ensure visibility, then simply locking only on write operations is wrong code. You might or might not encounter visibility issues though -- there's no guarantee that the JVM will fail, but it's possible, so... wrong code.
I'd suggest you read the book "Java Concurrency in Practice", one of the best texts on the subject I've ever read, after the JVM spec itself. Obviously, the book is way easier (still far from trivial!) and more fun to read than the spec...
One example would be:
Thread 1:
Iterator<YourType> it = yourMapInstance.entrySet().iterator();
while(it.hasNext()) {
it.next().getValue().doSth();
Thread.sleep(1000);
}
}
Thread 2:
for(int i = 0; i < 10; i++) {
if(Math.random() < 0.5) {
yourMapInstance.clear();
Thread.sleep(500);
}
}
Now, if both threads are executed concurrently, at some point there might be a situation, that you have a value in your iterator, while the other thread has already deleted everything from the map. In this case, synchronization is required.
Ok, I know this may sound quite stupid (and I'm afraid it is), but I'm not completely satisfied with the answer I gave myself so I thought it was worth it asking it here.
I'm dealing with an exercise about concurrency (in Java) which goes like this
Given a solved Sudoku chart, determine, using a fixed number of threads running at the same time, whether the chart has been correctly solved, i.e. no violation of the canonical rules occur (a number must appear within its row, its column, and its block only once).
Now my question is: since the threads only have to perform "reads", gathering infos from the chart and elaborating them somewhere else, couldn't they work without worrying about concurrency? Chart's state is always consistent since no "writes" are performed, hence it's never changed.
Aren't locks/synchronized blocks/synchronized methods necessary if and only if there's a risk for resources' consistency to be lost? In other words, did I understand concurrency the right way?
This is a fairly subtle question, not stupid at all.
Multiple threads that are reading a data structure concurrently may do so without synchronization, only if the data structure has been safely published. This is memory visibility issue, not a timing issue or race condition.
See section 3.5 of Goetz, et. al., Java Concurrency In Practice, for further discussion of the concept of safe publication. Section 3.5.4 on "Effectively Immutable Objects" seems applicable here, as the board becomes effectively immutable at a certain point, because it is never written to after it has reached the solved state.
Briefly, the writer threads and the reader threads must perform some memory-coordinating activity to ensure that the reader threads have a consistent view of what has been written. For example, the writer thread could write the sudoku board and then, while holding a lock, store a reference to the board in a static field. The reading threads could then load that reference, while holding the lock. Once they've done that, they are assured that all previous writes to the board are visible and consistent. After that, the reader threads may access the board structure freely, with no further synchronization.
There are other ways to coordinate memory visibility, such as writes/reads to a volatile variable or an AtomicReference. Use of higher-level concurrency constructs, such as latches or barriers, or submitting tasks to an ExecutorService, will also provide memory visibility guarantees.
UPDATE
Based on an exchange in the comments with Donal Fellows, I should also point out that the safe publication requirement also applies when getting results back from the reader threads. That is, once one of the reader threads has a result from its portion of the computation, it needs to publish that result somewhere so that it can be combined with the other reader threads' results. The same techniques can be used as before, such as locking/synchronization over a shared data structure, volatiles, etc. However, this is usually not necessary, since the results can be obtained from a Future returned by ExecutorService.submit or invoke. These constructs handle the safe publication requirements automatically, so the application doesn't have to deal with synchronization.
In my opinion your understanding is correct. Data corruption can only happen if any of the threads is writing on the data.
If you're 100% sure that no thread is writing, then it's safe to skip synchronization and locking...
EDIT: skipping locking in theses cases is the best practice!
:)
No need of Synchronizing the file if it is read-only.Basically lock is applied to critical section.Critical section is ,where different threads accessing the shared memory concurrently.
Since Synchronization makes program slow as no multiple threads access at same time so better not to use lock in case of read-only files.
Imagine you have a bunch of work to complete (check 9 rows, 9 columns, 9 blocks). If you want threads to complete this bunch of 27 units of work and if you want to complete the work without double work, then the threads would need to be synchronized. If on the other hand, you are happy to have threads that may perform a work unit that has been done by another thread, then you don't need to synchronize the threads.
Scenario where Thread1 writes some data and then a bunch of threads need to read this data doesn't require locking if done properly. By properly I mean that your SUDOKU board is an immutable object, and by immutable object I mean:
State cannot be modified after construction
State is not actually modified via some reflection dark magic
All the fields are final
'this' reference does not escape during construction (this could happen if during construction you do something along the lines MyClass.instnce = this).
If you pass this object to the worker threads you are good to go. If your objects don't satisfy all these conditions you still may run into concurrency problems, in most cases it is due to the fact that JVM may reorder statements at will (for performance reasons), and it might reorder these statements in such a way that worker threads are launched before sudoku board was constructed.
Here is a very nice article about immutable objects.
Abstract
For a thread to be guaranteed to observe the effects of a write to main memory, the write must happen-before the read. If write and read occur in different threads, that requires a synchronization action. The spec defines many different kinds of synchronization actions. One such action is executing a synchronized statement, but alternatives exist.
Details
The Java Language Specification writes:
Two actions can be ordered by a happens-before relationship. If one action happens-before another, then the first is visible to and ordered before the second.
and
More specifically, if two actions share a happens-before relationship, they do not necessarily have to appear to have happened in that order to any code with which they do not share a happens-before relationship. Writes in one thread that are in a data race with reads in another thread may, for example, appear to occur out of order to those reads.
In your case, you want the reading threads to solve the right sudoku. That is, the initialization of the sudoku object must be visible to the reading threads, and therefore the initialization must happen-before the reading threads read from the sudoku.
The spec defines happens-before as follows:
If we have two actions x and y, we write hb(x, y) to indicate that x happens-before y.
If x and y are actions of the same thread and x comes before y in program order, then hb(x, y).
There is a happens-before edge from the end of a constructor of an object to the start of a finalizer (§12.6) for that object.
If an action x synchronizes-with a following action y, then we also have hb(x, y).
If hb(x, y) and hb(y, z), then hb(x, z).
Since reading occurs in a different thread than writing (and not in a finalizer), we therefore need a synchronization action to establish that the write happens-before the read. The spec gives the following exhaustive list of synchronization actions:
An unlock action on monitor m synchronizes-with all subsequent lock actions on m (where "subsequent" is defined according to the synchronization order).
A write to a volatile variable v (§8.3.1.4) synchronizes-with all subsequent reads of v by any thread (where "subsequent" is defined according to the synchronization order).
An action that starts a thread synchronizes-with the first action in the thread it starts.
The write of the default value (zero, false, or null) to each variable synchronizes-with the first action in every thread. (Although it may seem a little strange to write a default value to a variable before the object containing the variable is allocated, conceptually every object is created at the start of the program with its default initialized values.)
The final action in a thread T1 synchronizes-with any action in another thread T2 that detects that T1 has terminated (T2 may accomplish this by calling T1.isAlive() or T1.join())
If thread T1 interrupts thread T2, the interrupt by T1 synchronizes-with any point where any other thread (including T2) determines that T2 has been interrupted (by having an InterruptedException thrown or by invoking Thread.interrupted or Thread.isInterrupted).
You can choose any of these methods to establish happens-before. In practice, starting the reading threads after the sudoku has been fully constructed is probably the easiest way.
From my point of view, locking is necessary if you write and this writing takes a long time to complete due to say network latency or massive processing overhead.
Otherwise it's pretty safe to leave the locking out.
I am working on a graph ( nodes and vertices ) partitioning algorithm.
I use multiple threads to try and identify certain regions inside the graph.
Once a node has been identified as a part of a region, I set a boolean marked to true for the node object.
Multiple threads can attempt to mark the same node at the same time.
Currently I use synchronization to ensure nothing bad happens.
However since I never read the value of marked till after all the threads have finished processing. Would it be possible for me to get rid of the synchronization code? In other words can anything go wrong when concurrently writing to a boolean variable?
can anything go wrong when concurrently writing to a boolean variable?
Yes and no. Certainly the resulting value won't be somehow corrupted however it is going to be non-deterministic on which of the updates gets set on the field and when these updates are seen by other threads -- if at all.
If you have multiple threads making decisions using the value of this boolean, you have to at some point provide memory synchronization. Making the field volatile costs very little and unless you have evidence that it is a performance problem, not having the field be volatile is most likely premature optimization. If you are comparing and setting then an AtomicBoolean is recommended which wraps a volatile boolean and provides higher level methods like compareAndSet(...).
In theory, no, but I wouldn't mind declaring the variable volatile. Volatile keyword ensures atomic access.
(Provided that the order of the writes do not matter and all reads occur after all writes.)
No, nothing could go wrong when multiple threads write to the same boolean value, but there can be a problem of reading the value (even a long time) later in a different thread. You should at least mark the variables as volatile to prevent the problem.
As others have said there is no risk of corruption or incorrect values to the boolean if you are simply trying to set it to the same value from multiple threads.
However, you may not even need that
I never read the value of marked till after all the threads have finished processing.
You obviously need some sort of barrier to synchronize the coordinating thread with the worker threads (such as Thread.join() or CountdownLatch or your primitive du jour) and nearly all of those already provide a happens-before relationship that will make all your marks visible to the coordinator thread.
Having that single point of synchronization also just happens to be cheaper than reading a large number of volatiles (and I wouldn't call that premature optimization, simply eliding the need for volatiles)
No. If the order of the writes to that variable do not matter.
Consider these two situations:
a map which you are going to populate once at the beginning and then will be accessed from many different threads.
a map which you are going to use as cache that will be accessed from many different threads. you would like to avoid computing the result that will be stored in the map unless it is missing, the get-computation-store block will be synchronized. (and the map will not otherwise be used)
In either of these cases, does ConcurrentHashMap offer you anything additional in terms of thread safety above an ordinary HashMap?
In the first case, it should not matter in practice, but there is no guarantee that modifications written to a regular hashmap will ever be seen by other threads. So if one thread initially creates and populates the map, and that thread never synchronized with your other threads, then those threads may never see the initial values set into the map.
The above situation is unlikely in practice, and would only take a single synchronization event or happens before guarantee between the threads (read / write to a volatile variable for instance) to ensure even theoretical correctness.
In the second case, there is a concern since access to a HashMap that modifies it structurally (adding a value) requires synchronization. Furthermore, you need some type of synchronization to establish a happens-before relationship / shared visibility with the other threads or there is no guarantee that the other threads will see the new values you put in. ConcurrentHashMap offers these guarantees and will not break when one thread modifies it structurally.
There is no difference in thread safety, no. For scenario #2 there is a difference in performance and a small difference in timing guarantees.
There will be no synchronization for your scenario #2, so threads that want to use the cache don't have to queue up and wait for others to finish. However, in order to get that benefit you don't have hard happens-before relationships at the synchronization boundaries, so it's possible two threads will compute the same cached value more or less at the same time. This is generally harmless as long as the computation is repeatable.
(There is also the slight difference that ConcurrentHashMap does not allow null to be used as a key.)