I am working on a graph (nodes and edges) partitioning algorithm.
I use multiple threads to try and identify certain regions inside the graph.
Once a node has been identified as a part of a region, I set a boolean marked to true for the node object.
Multiple threads can attempt to mark the same node at the same time.
Currently I use synchronization to ensure nothing bad happens.
However, I never read the value of marked until after all the threads have finished processing. Would it be possible for me to get rid of the synchronization code? In other words, can anything go wrong when concurrently writing to a boolean variable?
can anything go wrong when concurrently writing to a boolean variable?
Yes and no. The resulting value certainly won't be corrupted; however, it is non-deterministic which of the updates ends up in the field and when these updates are seen by other threads, if at all.
If you have multiple threads making decisions using the value of this boolean, you have to provide memory synchronization at some point. Making the field volatile costs very little and, unless you have evidence that it is a performance problem, leaving the field non-volatile is most likely premature optimization. If you are comparing and setting, then an AtomicBoolean is recommended; it wraps a volatile boolean and provides higher-level methods like compareAndSet(...).
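As a rough sketch of the compareAndSet(...) suggestion: the node class below is hypothetical (the question does not show its real node type), and countWinners is only a demo helper. It shows that when several threads race to mark the same node, exactly one of them sees the transition from false to true.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical node type; the real node class from the question is not shown.
class MarkableNode {
    private final AtomicBoolean marked = new AtomicBoolean(false);

    // Returns true only for the one thread that flips the flag from false to true.
    boolean tryMark() {
        return marked.compareAndSet(false, true);
    }

    boolean isMarked() {
        return marked.get();
    }

    // Demo helper: several threads race to mark one node; returns how many of them won.
    static int countWinners(int threadCount) {
        MarkableNode node = new MarkableNode();
        AtomicInteger winners = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                if (node.tryMark()) {
                    winners.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return winners.get();
    }
}
```

This only matters if a thread needs to know whether it was the one that marked the node; if every thread just stores true, a plain write is enough as far as the value itself goes.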
In theory, no, but I wouldn't mind declaring the variable volatile. The volatile keyword ensures atomic access.
(Provided that the order of the writes does not matter and all reads occur after all writes.)
No, nothing could go wrong when multiple threads write to the same boolean value, but there can be a problem of reading the value (even a long time) later in a different thread. You should at least mark the variables as volatile to prevent the problem.
As others have said there is no risk of corruption or incorrect values to the boolean if you are simply trying to set it to the same value from multiple threads.
However, you may not even need that, since:
I never read the value of marked till after all the threads have finished processing.
You obviously need some sort of barrier to synchronize the coordinating thread with the worker threads (such as Thread.join() or CountDownLatch or your primitive du jour), and nearly all of those already provide a happens-before relationship that will make all your marks visible to the coordinator thread.
Having that single point of synchronization also just happens to be cheaper than reading a large number of volatiles (and I wouldn't call that premature optimization, simply eliding the need for volatiles).
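To illustrate the Thread.join() point, here is a minimal sketch. The boolean array is an assumed stand-in for the per-node marked flags, and markAndCount is a made-up helper; the flags are deliberately neither volatile nor synchronized.

```java
// Hypothetical stand-in for the graph: one plain (non-volatile) flag per node.
class JoinVisibilityDemo {
    static int markAndCount(int nodeCount, int threadCount) {
        boolean[] marked = new boolean[nodeCount];
        Thread[] workers = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            workers[t] = new Thread(() -> {
                // Every worker marks every even-numbered node, so the writes overlap,
                // but all of them store the same value (true).
                for (int i = 0; i < nodeCount; i += 2) {
                    marked[i] = true;
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                // join() gives a happens-before edge: everything the worker wrote
                // is visible to this thread once join() returns.
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Only now does the coordinator read the flags.
        int count = 0;
        for (boolean m : marked) {
            if (m) {
                count++;
            }
        }
        return count;
    }
}
```

The design relies on the coordinator reading only after every worker has been joined; reading the flags while workers are still running would need volatile or some other synchronization.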
No, provided the order of the writes to that variable does not matter.
Related
I declared an instance variable as volatile. Say two threads are created on two processors of a multi-core machine, where one thread updates the variable. To ensure instantaneous visibility, I believe declaring the variable as volatile is the right choice here, so that the update done by one thread goes to main memory and is visible to the other thread.
Right?
My intention here is to understand the concept in terms of a multi-core processor.
I am assuming you are considering using volatile vs. not using any special provisions for concurrency (such as synchronized or AtomicReference).
It is irrelevant whether you are running single-core or multicore: sharing data between threads is never safe without volatile. There are many more things the runtime is allowed to do without it; basically it can pretend the accessing thread is the only thread running on the JVM. The thread can read the value once and store it on the call stack forever; a loop reading the value, but never writing it, may be transformed such that the value is read only once at the outset and never reconsidered, and so on.
So the message is simple: use volatile—but that's not necessarily all you need to take care of in concurrent code.
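The loop case mentioned above can be sketched like this. The class and method names are made up for illustration. With the flag declared volatile, the worker has to re-read it on each iteration, so the stop request is eventually seen; without volatile, the JIT would be allowed to read it once and spin forever.

```java
// Hypothetical worker that spins until another thread asks it to stop.
class StoppableWorker implements Runnable {
    // volatile: the loop must re-read the flag on every iteration.
    private volatile boolean running = true;
    private long iterations = 0; // stand-in for real work, touched only by the worker

    @Override
    public void run() {
        while (running) {
            iterations++;
        }
    }

    void stop() {
        running = false;
    }

    // Demo helper: starts a worker, requests a stop, and reports whether it terminated.
    static boolean stopsAfterRequest() {
        StoppableWorker worker = new StoppableWorker();
        Thread thread = new Thread(worker);
        thread.start();
        worker.stop();
        try {
            thread.join(5_000); // generous timeout; the worker normally exits almost at once
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return !thread.isAlive();
    }
}
```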
It doesn't matter if it's done by different processors or not. When you don't have multiple processors, you can still run into concurrency problems because context switches may happen at any time.
If a field is not volatile, it may still be in one thread's cache while its context is switched out and the other thread's context switches in. In that case, the thread that just took over the (single) processor will not see that the field has changed.
Since these things can happen even with one processor, they are bound to happen with more than one processor, so indeed, you need to protect your shared data.
Whether volatile is the right choice or not depends on what type it is and what kind of change you are trying to protect from. But again, that has nothing to do with the number of processors.
If the field is a reference type, then volatile only ensures the visibility of new assignments to the field. It doesn't protect against changes in the object it points to - for that you need to synchronize.
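One way to live with that limitation, sketched here under the assumption of a hypothetical ConfigHolder class, is to never mutate the referenced object at all: each update builds a new unmodifiable list and assigns it to the volatile field, so the only thing readers depend on is the assignment, which volatile does cover.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder; the names are illustrative only.
class ConfigHolder {
    // volatile makes each newly assigned list visible to readers.
    private volatile List<String> hosts = Collections.emptyList();

    // Build a fresh list and swap the reference instead of mutating in place.
    // synchronized keeps two concurrent writers from losing each other's update.
    synchronized void addHost(String host) {
        List<String> copy = new ArrayList<>(hosts);
        copy.add(host);
        hosts = Collections.unmodifiableList(copy);
    }

    // Readers take no lock; they get whichever complete snapshot was assigned last.
    List<String> hosts() {
        return hosts;
    }
}
```

If the object has to be mutated in place instead, the synchronization mentioned above is needed around both the mutation and the reads.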
While going through Java Concurrency in Practice by Brian Goetz, I encountered the following lines:
A data race occurs when a variable is read by more than one thread,
and written by at least one thread, but the reads and writes are not
ordered by happens-before. A correctly synchronized program is one
with no data races; correctly synchronized programs exhibit sequential
consistency, meaning that all actions within the program appear to
happen in a fixed, global order.
My question is: are out-of-order writes the only cause of a data race in Java, or possibly in other programming languages?
UPDATE
OK, I did some more investigation into data races and found the following on Oracle's official site:
The Thread Analyzer detects data-races that occur during the execution
of a multi-threaded process. A data race occurs when:
two or more threads in a single process access the same memory location concurrently, and
at least one of the accesses is for writing, and
the threads are not using any exclusive locks to control their accesses to that memory.
When these three conditions hold, the order of accesses is
non-deterministic, and the computation may give different results from
run to run depending on that order. Some data-races may be benign (for
example, when the memory access is used for a busy-wait), but many
data-races are bugs in the program.
In this part, it mentions that the order of accesses is non-deterministic.
Is it talking about the sequence in which threads access the memory location? If yes, synchronization never guarantees the order in which threads will access a block of code. So how can synchronization resolve the issue of a data race?
I would rather define data race as
A data race between writing and reading some value or reference from a variable is a situation in which the result of the read is determined by the "internal" (JVM- or OS-controlled) thread scheduling.
In fact, the second definition from the question says the same in more "official" words :)
In other words, consider thread A writing some value to the variable and thread B attempting to read it. If you lack any kind of synchronization (or other mechanism that can provide happens-before guarantees between the write and the subsequent read), your program has a data race between threads A and B.
Now, to your question:
Is it talking about the sequence in which threads access the memory location? If yes, synchronization never guarantees the order in which threads will access a block of code.
Synchronization in that particular case guarantees that, once the writer thread has exited its synchronized block or method, you will never read the value the variable had before the write. Without synchronization, there is a chance of reading the old value even after the write has actually happened.
About the order of access: it is going to be deterministic with synchronization in the following way:
Let's take a look at our threads A and B again. The order of operations is now sequential: thread B will not be able to start reading until thread A has finished writing. To make this clear, imagine that writing and reading are really long processes. Without synchronization, these operations can overlap with each other, which might result in some meaningless values being read.
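A small sketch of the "long write" scenario, using a made-up SyncedPair class: the write touches two fields, and a reader that could run in the middle of it would see a meaningless half-updated state. Because writer and reader use the same lock, the check below never observes that state.

```java
// Hypothetical two-field value whose fields must always be equal.
class SyncedPair {
    private int first;
    private int second;

    // The "long" write: two separate stores, done while holding the lock.
    synchronized void setBoth(int value) {
        first = value;
        second = value;
    }

    // The read takes the same lock, so it cannot run between the two stores.
    synchronized boolean isConsistent() {
        return first == second;
    }

    // Demo helper: one writer thread updates while the caller keeps reading.
    static int countInconsistentReads(int rounds) {
        SyncedPair pair = new SyncedPair();
        Thread writer = new Thread(() -> {
            for (int i = 1; i <= rounds; i++) {
                pair.setBoth(i);
            }
        });
        writer.start();
        int inconsistent = 0;
        for (int i = 0; i < rounds; i++) {
            if (!pair.isConsistent()) {
                inconsistent++;
            }
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return inconsistent;
    }
}
```

Note that this still says nothing about which thread gets the lock first; it only guarantees that each read sees the pair either entirely before or entirely after a given write.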
I have been reading the book The Art of Multiprocessor Programming and came across functions such as get(), getAndSet(), compareAndSet(), getAndIncrement(), etc.
The book says that all of the above functions are atomic, and I agree, but I have my own doubts about how a function becomes an atomic function.
Why does a function with get or compare become atomic? Because it has to wait until it gets the value, or wait until some condition becomes true, which creates a barrier; hence it is atomic.
Am I right in thinking this way? Is there anything that I have missed?
When I do
if (tail_index.get() == (head_index.getAndIncrement()))
is this atomic?
A method is made atomic relative to some instance by adding explicit thread-safety. In many cases this is done by marking the method as synchronized. There is no magic: if you look at the source code of a thread-safe class that claims its methods are atomic, you will see the locking (or, in the java.util.concurrent.atomic classes, the use of hardware compare-and-swap).
WRT your second part: no, it is not atomic. Each method call is atomic, but when you put two together the combination is not atomic. get and getAndIncrement have been explicitly made atomic. Once you add other code (or a combination of the calls), it is not atomic unless you make it so.
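To show what "make it so" can look like for a check followed by an update, here is a hedged sketch using a compareAndSet retry loop. CappedCounter and its methods are invented for illustration and are not the asker's head_index/tail_index code; the idea is only that the check and the increment either both take effect on an unchanged value or are retried.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical counter that must never exceed a fixed limit.
class CappedCounter {
    private final AtomicInteger value = new AtomicInteger();
    private final int limit;

    CappedCounter(int limit) {
        this.limit = limit;
    }

    // Check-then-increment as one atomic step: if another thread changed the
    // value between our get() and our compareAndSet(), the CAS fails and we retry.
    boolean incrementIfBelowLimit() {
        while (true) {
            int current = value.get();
            if (current >= limit) {
                return false;
            }
            if (value.compareAndSet(current, current + 1)) {
                return true;
            }
        }
    }

    // Demo helper: threads race to increment; returns how many attempts succeeded.
    static int race(int limit, int threadCount, int attemptsPerThread) {
        CappedCounter counter = new CappedCounter(limit);
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int a = 0; a < attemptsPerThread; a++) {
                    if (counter.incrementIfBelowLimit()) {
                        successes.incrementAndGet();
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return successes.get();
    }
}
```

A compound operation over two separate atomics, as in the question, cannot be handled by a single CAS like this; that case needs a lock around both accesses or a redesign that packs the state into one variable.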
A function is atomic if it appears to occur instantaneously.[1]
Here, "appears to" means from the point of view of the rest of the system. For instance, consider a synchronized function that reverses a linked list. To an outside observer, the operation clearly does not occur instantaneously: it takes many reads and writes to update all the list pointers. However, as a lock is held the entire time, no other part of the system may read the list during this time, so to them, the update appears instantaneous.
Equally, CAS (compare-and-set) operations do not actually occur instantly on modern computers. It takes time for one CPU core to obtain exclusive write access to the value, and then it takes more time for another core to re-obtain read access afterwards to see the new value. During this time, the CPU is executing other instructions in parallel. To ensure the illusion of instantaneous execution is preserved, the JVM issues CPU instructions before and after the CAS operation to ensure no logically subsequent reads get pulled up and executed before the CAS finishes (which would allow you to read a part of the linked list before you had actually taken the lock, for instance), and that no logically preceding writes get delayed and executed after the CAS finishes (which would allow another thread to take the lock before the linked list was completely updated).
These CPU ordering instructions are the key difference between AtomicInteger.compareAndSet and AtomicInteger.weakCompareAndSet (the "may fail spuriously" bit is easily rectified with a loop). Without the ordering guarantees, the weak CAS operation cannot be used to implement most concurrent algorithms, and "is only rarely an appropriate alternative to compareAndSet".
If this is sounding complicated...well...it is! Which is why you can still get a PhD by designing a concurrent algorithm. To show correctness for a concurrent algorithm, you have to consider what every other thread may possibly be doing to mess you around. It may help if you think of them as adversaries, trying to break the illusion of atomicity. For instance, let's consider your example:
if (tail_index.get() == (head_index.getAndIncrement()))
I assume this is part of a method to pop an item off a stack implemented as a cyclic array with index counters, and execute the body of the "if" if the stack is now empty. As head_index and tail_index are being accessed separately, your adversary can "divide" them with as many operations as he likes. (Imagine, for instance, that your thread is interrupted by the OS between the get and the getAndIncrement.) So it would be easy for him to add dozens of items to the stack, then remove all but one, leaving head_index above tail_index; your if block will then never execute, even though you are removing the last item on the stack.
So, when your book says get(), getAndSet(), etc. are atomic, it is not making a general statement about any possible implementation of those methods. It's telling you that the Java standard guarantees that they are atomic, and does so by careful use of the available CPU instructions, in a way that would be impossible to do in plain Java (synchronized lets you emulate it, but is more costly).
No, a function that merely uses get() is not automatically atomic. However, getAndIncrement and compareAndSet, for example, are atomic in themselves: it is guaranteed that all of their logic executes atomically. For get() there is another assurance: when one thread publishes a value through an atomic, it becomes visible to other threads (just as with volatile fields). Non-volatile, non-atomic values carry no such guarantee: there are cases where a value written to a non-volatile field is not visible to other threads, which then read a stale value.
But you can always write an atomic function using the Atomic* classes and other synchronization primitives.
From docs:
Using volatile variables reduces the risk of memory consistency errors
But does this mean that volatile variables sometimes don't work correctly?
It seems strange that it can be used at all; in my opinion, code that sometimes works and sometimes doesn't is very bad code. I tried Google but couldn't find an example of a memory consistency error with volatile. Could you please propose one?
The issue is not so much that volatile works unreliably. It always works the way it is supposed to work. The problem is that the way it is supposed to work is sometimes not adequate for concurrency control. If you use volatile in the wrong situation, you can still get memory consistency errors.
A volatile variable will always have any writes propagated to all threads. However, suppose you need to increment the variable among various threads. Doing this(*):
volatile int mCounter;
// later, in some code that might be executed simultaneously on multiple threads:
mCounter++;
There is a chance that counter increments will be missed. This is because the value of mCounter needs to be first read by each thread before a new value can be written. In between those two steps, another thread may have changed the value of mCounter. In situations like this, you would need to rely on synchronized blocks rather than volatile to ensure data integrity.
For more info on volatile vs. synchronized, I recommend the article Managing volatility by Brian Goetz
(*) I realize that the above would be better implemented with AtomicInteger; it's a contrived example to illustrate a point.
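For anyone who wants to try it, here is a small sketch that runs the volatile increment next to the AtomicInteger alternative mentioned in the footnote. The class name and the run helper are made up. The volatile counter may or may not lose updates on a given run, so nothing is claimed about its final value other than that it cannot exceed the number of increments; the atomic counter always ends at the exact total.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical side-by-side comparison of the two counters.
class CounterComparison {
    static volatile int volatileCounter;
    static final AtomicInteger atomicCounter = new AtomicInteger();

    static void run(int threadCount, int incrementsPerThread) {
        volatileCounter = 0;
        atomicCounter.set(0);
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < incrementsPerThread; n++) {
                    volatileCounter++;               // read, add, write: updates can be lost
                    atomicCounter.incrementAndGet(); // single atomic read-modify-write
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }
}
```

After a call such as CounterComparison.run(4, 100000), comparing volatileCounter with atomicCounter.get() shows how many increments, if any, were lost on that run.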
Volatile does the following:
- It prevents threads from caching the value of the field.
- It makes sure that threads holding copies of the field's value reconcile them with the main copy in memory.
- It makes sure the data is written directly to memory and read from memory itself.
## But the condition where volatile fails:
- Making a non-atomic statement volatile.
E.g.:
volatile int count = 0;
count++; // The increment operator is not atomic in Java, even on a volatile field
## Better option:
1. It's always better to follow Brian's Rule:
Whenever we write a variable that is next to be read by another thread, or read a variable that was last written by another thread, the access needs to be synchronized.
The shared fields must be made private, and the read and write methods (or atomic statements) synchronized.
2. Second option is using the Atomic Classes, like AtomicInteger, AtomicLong, AtomicReference, etc.
## See this link; I asked a question similar to yours:
Why Volatile is behaving weirdly
Reading a few threads (common concurrency problems, volatile keyword, memory model) I'm confused about concurrency issues in Java.
I have a lot of fields that are accessed by more than one thread. Should I go through them and mark them all as volatile?
When building a class I'm not aware of whether multiple threads will access it, so surely it is unsafe to leave any field non-volatile; by my understanding there are very few cases where you wouldn't use it. Is this correct?
For me this is specific to version 1.5 JVMs and later, but don't feel limited to answering about my specific setup.
Well, you've read those other questions and I presume you've read the answers already, so I'll just highlight some key points:
1. Are they going to change? If not, you don't need volatile.
2. If yes, is the value of a field related to another? If yes, go to point 4.
3. How many threads will change it? If only one, then volatile is all you need.
4. If the answer to number 2 is "yes" or more than one thread is going to write to it, then volatile alone is not enough; you'll probably need to synchronize the access.
Added:
If the field references an Object, then that object will have fields of its own, and all these considerations also apply to those fields.
If a field is accessed by multiple threads, it should be volatile or final, or accessed only with synchronized blocks. Otherwise, assigned values may not be visible to other threads.
A class has to be specifically designed for concurrent access by multiple threads. Simply marking fields volatile or final is not sufficient for thread-safety. There are consistency issues (atomicity of changes to multiple fields), concerns about inter-thread signaling (for example using wait and notify), etc.
So, it is safest to assume that an object should be visible to only a single thread unless it is documented otherwise. Making all of your objects thread-safe isn't necessary, and is costly—in terms of software speed, but more importantly, in terms of development expense.
Instead, software should be designed so that concurrent threads interact with each other as little as possible, preferably not at all. The points where they do interact need to be clearly identified so that the proper concurrency controls can be designed.
If you have to ask, use locks. volatile can be useful in some cases, but it's very, very difficult to get right. For example:
class Foo {
    private volatile int counter = 0;
    int Increment() {
        counter++;
        return counter;
    }
}
If two threads run Increment() at the same time, it's possible for the result to be counter = 1. This is because the computer will first retrieve counter, add one, then save it back. Volatile just forces the save and load to occur in a specific order relative to other statements.
Note that synchronized usually obviates the need for volatile - if all accesses to a given field are protected by the same monitor, volatile will never be needed.
Using volatile to make lockless algorithms is very, very difficult; stick to synchronized unless you have hard evidence that it's too slow already, and have done detailed analysis on the algorithm you plan to implement.
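For comparison, a lock-based variant of the Foo example might look like the sketch below. The class name and the demo helper are invented. The field is a plain int because every access goes through the same monitor, and since the increment and the read of the result happen under one lock, no two calls can return the same value.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical synchronized counterpart of the Foo class.
class SynchronizedFoo {
    // No volatile needed: all accesses are guarded by this object's monitor.
    private int counter = 0;

    // Increment and read-back form one critical section.
    synchronized int increment() {
        counter++;
        return counter;
    }

    // Demo helper: returns how many distinct values the concurrent calls produced.
    static int distinctResults(int threadCount, int callsPerThread) {
        SynchronizedFoo foo = new SynchronizedFoo();
        Set<Integer> seen = ConcurrentHashMap.newKeySet();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int c = 0; c < callsPerThread; c++) {
                    seen.add(foo.increment());
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return seen.size();
    }
}
```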
The short answer is no. Threading issues require more thought and planning than this. See this for some limitations on when volatile helps for threading and when it does not. The modification of the values has to be properly synchronized, but very typically modification requires the state of more than one variable at a time. Say, for example, you have an array element and you want to change it if it meets some criterion. The read from the array and the write to the array are different instructions, and need to be synchronized together. Volatile is not enough.
Consider also the case where the variable references a mutable object (say an array or a Collection), then interacting with that object will not be thread safe just because the reference is volatile.
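A sketch of that last case, with a hypothetical SharedEntries class: the field is volatile, but what makes the concurrent add calls safe is the synchronized wrapper around the list, not the volatile keyword. With a bare ArrayList behind the same volatile field, concurrent adds could lose elements or throw.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical holder of a shared, mutable collection.
class SharedEntries {
    // volatile covers only reassignment of the field itself.
    // Thread safety of add()/size() comes from the synchronized wrapper.
    private volatile List<Integer> entries =
            Collections.synchronizedList(new ArrayList<Integer>());

    void add(int value) {
        entries.add(value);
    }

    int size() {
        return entries.size();
    }

    // Demo helper: several threads append concurrently; returns the final size.
    static int fillConcurrently(int threadCount, int addsPerThread) {
        SharedEntries shared = new SharedEntries();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                for (int n = 0; n < addsPerThread; n++) {
                    shared.add(n);
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return shared.size();
    }
}
```

Compound operations on the list (for example, checking for an element and then adding it) would still need an explicit lock around both steps.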