Java multithread visibility?

I read in a book that, because of compiler optimizations, code execution might be reordered so that the ReaderThread ends up in an infinite loop. How is that possible?
public class NoVisibility {
    private static boolean ready;
    private static int number;
    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready)
                Thread.yield();
            System.out.println(number);
        }
    }
    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;
    }
}

How is that possible?
Code reordering is possible (in general) because the Java Language Specification (JLS) says it is possible. However, reordering is (probably) not going to be the problem here. Rather, an infinite loop is likely to be due to hardware memory cache behavior.
In this case, there is nothing in the JLS that requires the writes to the variables made by the main method to be visible to the child thread. The technical explanation is that there is no happens-before chain linking the writes to the (subsequent) reads. Without the crucial happens-before chain, visibility is not guaranteed.
Note that whether there is actually an infinite loop here depends on all sorts of factors. The point is that it is a possibility given the way the example code is written.
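As a rough sketch of how the missing happens-before chain could be supplied, the variant below declares ready as volatile. The class name VisibleHandoff and the helper runOnce() are made up for this illustration and do not come from the book. The write to number comes before the volatile write to ready in program order, so a reader that observes ready == true is also guaranteed to observe number == 42.

```java
// Sketch only: VisibleHandoff and runOnce() are hypothetical names for this example.
final class VisibleHandoff {
    private static volatile boolean ready; // volatile write/read supplies the happens-before edge
    private static int number;             // plain field, published by the volatile write below

    // Starts a reader thread, publishes 42, and returns the value the reader observed.
    static int runOnce() {
        ready = false;
        number = 0;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                Thread.yield();
            }
            seen[0] = number; // ordered after the volatile read that saw ready == true
        });
        reader.start();
        number = 42;  // ordinary write ...
        ready = true; // ... made visible by the volatile write that follows it
        try {
            reader.join(); // join() makes the reader's write to seen[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```

Only ready needs to be volatile here; number rides along because it is written before the volatile write and read after the volatile read.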

It may be caused by CPU caching. The ReaderThread may see an outdated value of the "ready" variable, so it may never break out of the while loop. Such an outdated value is called stale data.

Related

Using a boolean to coordinate two threads in Java

I understand the code section below is problematic because the new value of isAlive set in the kill method might not be visible in the thread.
public class MyClass extends Thread {
    private boolean isAlive;
    public void run() {
        while(isAlive) {
            ....
        }
    }
    public void kill() {
        isAlive = false;
    }
}
The typical fix is to declare the isAlive variable as volatile.
My question is: are there any other ways to achieve this without using volatile? Does Java provide other mechanisms for this?
EDIT: Synchronizing the method is also not an option.

There is no good reason to go for a different option than volatile. Volatile is needed to provide the appropriate happens-before edge between writing and reading; otherwise you have a data-race on your hands and as a consequence the write to the flag might never be seen. E.g. the compiler could hoist the read of the variable out of a loop.
There are cheaper alternatives that provide more relaxed ordering guarantees than the sequential consistency that volatile provides, e.g. acquire/release or opaque (check out the Atomic classes and VarHandle). But these should only be used in the very rare situations where volatile's ordering constraints reduce performance due to limited compiler optimizations and fences at the hardware level.
Long story short: make the variable volatile because it is a simple and very fast solution.
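For the curious, here is a minimal sketch of the acquire/release alternative mentioned above, using AtomicBoolean.getAcquire() and setRelease() (available since Java 9). The class name AcquireReleaseFlag and the demo() helper are hypothetical. Treat it as an illustration of the API rather than a recommendation; in ordinary code the plain volatile flag is still the simpler choice.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Sketch only: AcquireReleaseFlag and demo() are hypothetical names for this example.
final class AcquireReleaseFlag {
    private final AtomicBoolean alive = new AtomicBoolean(true);
    private long iterations; // touched only by the worker thread

    void kill() {
        alive.setRelease(false); // release store: weaker ordering than a volatile write
    }

    // Starts a worker that spins until kill() is observed.
    // Returns true if the worker terminated within the timeout.
    static boolean demo() {
        AcquireReleaseFlag flag = new AcquireReleaseFlag();
        Thread worker = new Thread(() -> {
            while (flag.alive.getAcquire()) { // acquire load pairs with the release store
                flag.iterations++;
            }
        });
        worker.start();
        flag.kill();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + demo());
    }
}
```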

There are three options:
1. Make the shared variable volatile. (This is the simplest way.)
2. Use synchronized, either in the form of synchronized methods or synchronized blocks. Note that you need to do both reads and writes of the shared variable while holding the (same) mutex.
3. Use one of the classes in java.util.concurrent that has a "synchronizing effect" [1]. Or more precisely, one that you can use to get a happens-before relationship between the update and the subsequent read of the isAlive variable. This will be documented in the respective classes' javadocs.
If you don't use one of those options, it is not guaranteed [2] that the thread that calls run() will see the isAlive variable change from true to false.
If you want to understand the deep technical reasons why this is so, read Chapter 17.4 of the Java Language Specification, where it specifies the Java Memory Model. (It will explain what happens-before means in this context.)
[1] One of the Lock classes would be an obvious choice.
[2] That is to say ... your code may not work 100% reliably on all platforms. This is the kind of problem where "try it and see" or even extensive testing cannot show conclusively that your code is correct.
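To make the third option a bit more concrete, here is a small sketch that guards the flag with a ReentrantLock. The names LockGuardedFlag and demo() are invented for this example. The important detail is that both the read and the write happen while holding the same lock, which is what gives the happens-before relationship between kill() and the next isAlive() call.

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Sketch only: LockGuardedFlag and demo() are hypothetical names for this example.
final class LockGuardedFlag {
    private final Lock lock = new ReentrantLock();
    private boolean alive = true; // plain field; every access holds `lock`

    boolean isAlive() {
        lock.lock();
        try {
            return alive;
        } finally {
            lock.unlock();
        }
    }

    void kill() {
        lock.lock();
        try {
            alive = false;
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the worker noticed kill() and terminated within the timeout.
    static boolean demo() {
        LockGuardedFlag flag = new LockGuardedFlag();
        Thread worker = new Thread(() -> {
            while (flag.isAlive()) {
                Thread.yield();
            }
        });
        worker.start();
        flag.kill();
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + demo());
    }
}
```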

The wait/notify mechanism is embedded deep in the heart of the Java language. Object, the superclass of all classes, has five methods that form the core of the mechanism: notify(), notifyAll(), wait(), wait(long), and wait(long, int). All classes in Java inherit from Object, and none of these methods can be overridden in a subclass because they are all declared final.
Here is an example that may help you understand the concept:
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
public class NotifyAndWait {
    private final List<String> list;
    public NotifyAndWait() {
        list = Collections.synchronizedList(new LinkedList<String>());
    }
    public String removeItem() throws InterruptedException {
        synchronized (list) {
            // Wait in a loop to guard against spurious wakeups
            while (list.isEmpty())
                list.wait();
            // Remove while still holding the lock so no other thread can take the item first
            return list.remove(0);
        }
    }
    public void addItem(String item) {
        synchronized (list) {
            list.add(item);
            // After adding, notify all waiting threads that the list has changed
            list.notifyAll();
        }
    }
    public static void main(String... args) throws Exception {
        final NotifyAndWait obj = new NotifyAndWait();
        Runnable runA = new Runnable() {
            public void run() {
                try {
                    String item = obj.removeItem();
                    System.out.println("Removed: " + item);
                } catch (InterruptedException e) {
                    // Restore the interrupt status instead of swallowing it
                    Thread.currentThread().interrupt();
                }
            }
        };
        Runnable runB = new Runnable() {
            public void run() { obj.addItem("Hello"); }
        };
        Thread t1 = new Thread(runA, "T1");
        t1.start();
        Thread.sleep(500);
        Thread t2 = new Thread(runB, "T2");
        t2.start();
        Thread.sleep(1000);
    }
}

As far as I know, polling a boolean in a while loop as a "kill" control is a perfectly reasonable thing to do. Brian Goetz, in "Java Concurrency in Practice", has a code example that is very similar to yours on page 137 (section 7.1, Task Cancellation).
He states that making the boolean volatile gives the pattern greater reliability. He has a good description of the mechanism on page 38.
When a field is declared volatile, the compiler and runtime are put on
notice that this variable is shared and that operations on it should
not be reordered with other memory operations. Volatile variables are
not cached in registers or in caches where they are hidden from other
processors, so a read of a volatile variable always returns the most
recent write by any thread.
I use volatile booleans and loose coupling as my main method of communication that must occur across threads. AFAIK, volatile has a smaller cost than synchronized and is the most recommended pattern for this situation.
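For reference, here is a small sketch of the volatile "kill" flag applied to a thread shaped like the one in the question. The class name CancellableWorker and the demo() helper are made up for this example, and the field is called alive rather than isAlive to avoid confusion with Thread.isAlive().

```java
// Sketch only: CancellableWorker and demo() are hypothetical names for this example.
final class CancellableWorker extends Thread {
    private volatile boolean alive = true; // volatile: kill() is guaranteed to become visible
    private long ticks;                    // written only by this thread

    @Override
    public void run() {
        while (alive) { // re-read on every iteration because the field is volatile
            ticks++;
        }
    }

    void kill() {
        alive = false;
    }

    // Starts a worker, lets it spin briefly, requests cancellation,
    // and reports whether the worker terminated within the timeout.
    static boolean demo() {
        CancellableWorker worker = new CancellableWorker();
        worker.start();
        try {
            Thread.sleep(50);
            worker.kill();
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + demo());
    }
}
```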

java instance variable not visible to other threads

I've encountered this code in a book. It states that NoVisibility could loop forever because the value of ready might never become visible to the reader thread.
I'm confused by this statement. For the loop to run forever, ready must always be false, which is the default value. This means it must fail at executing ready = true; because the reader thread will always read the ready variable from memory. The assignment happens in the CPU, so there must be some problem in flushing the data back to main memory. I think I need an explanation of a situation in which it can fail, or I may have missed something else.
public class NoVisibility {
    private static boolean ready;
    private static int number;
    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready)
                Thread.yield();
            System.out.println(number);
        }
    }
    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;
    }
}

Your understanding is flawed. You are assuming that Java will behave intuitively here. In fact, it may not. And, indeed, the Java Language specification allows non-intuitive behavior if you don't follow the rules.
To be more specific, in your example it is NOT GUARANTEED that the second thread will see the results of the first thread's assignment to ready [1]. This is due to such things as:
- The compiler caching the value of ready in a register in the first or second thread.
- The compiler not including instructions to force the write to be flushed from one core's memory cache to main memory, or similar.
If you want a guarantee that the second thread will see the result of the write, then either reads and writes of ready by the two threads must be (properly) synchronized, or the ready variable must be declared volatile.
So ...
This means it must fail at executing ready = true; because the reader thread will always read the ready variable from memory.
is incorrect. The "because" is not guaranteed by the Java Language Specification in this example.
Yes. It is nonintuitive. Relying on intuition based on your understanding of single-threaded programs is not reliable. If you want to understand what is and is not guaranteed, please study the specification of the "Java Memory Model" in Section 17.4 of the JLS.
In short, the book is correct.
[1] It might see the results immediately, or after a short or long delay. Or it might never see them. And the behavior is liable to vary from one system to the next, and with versions of the Java platform. So a program that (by luck) works all of the time on one system may not always work on another system.
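Here is a sketch of the "properly synchronized" alternative, assuming a small holder object instead of static fields. SynchronizedHandoff and runOnce() are hypothetical names. Every read and write of ready and number goes through a synchronized method on the same object, so the reader is guaranteed to eventually see ready == true and then number == 42.

```java
// Sketch only: SynchronizedHandoff and runOnce() are hypothetical names for this example.
final class SynchronizedHandoff {
    private boolean ready; // guarded by `this`
    private int number;    // guarded by `this`

    synchronized void publish(int value) {
        number = value;
        ready = true;
    }

    synchronized boolean isReady() {
        return ready;
    }

    synchronized int number() {
        return number;
    }

    // Starts a reader, publishes 42, and returns what the reader observed.
    static int runOnce() {
        SynchronizedHandoff handoff = new SynchronizedHandoff();
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!handoff.isReady()) { // acquires the same monitor as publish()
                Thread.yield();
            }
            seen[0] = handoff.number();
        });
        reader.start();
        handoff.publish(42);
        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println(runOnce()); // prints 42
    }
}
```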

The value of ready may be updated, but the other thread may never learn about it. That is where you need volatile variables! Without volatile, the runtime is allowed to assume that the variable is used only by the current thread, so the thread may keep working with a copy it read earlier (for example, one held in a register).
private static volatile boolean ready;
What volatile does is tell your program to read the variable from memory every time, not from such a cached copy.
In fact, the JIT compiler is allowed to translate:
while (flag) { ... }
into:
if (flag) {
    while (true) { ... }
}
The copy is taken once and then reused later.
This is what I have understood; correct me if I am wrong!

Why is this code working without volatile?

I am new to Java, I am currently learning about volatile. Say I have the following code:
public class Test
{
    private static boolean b = false;
    public static void main(String[] args) throws Exception
    {
        new Thread(new Runnable()
        {
            public void run()
            {
                while(true)
                {
                    b = true;
                }
            }
        }).start();
        // Give time for thread to start
        Thread.sleep(2000);
        System.out.println(b);
    }
}
Output:
true
This code has two threads (the main thread and another thread). Why is the other thread able to modify the value of b? Shouldn't b be volatile in order for this to happen?

The volatile keyword guarantees that changes are visible among multiple threads, but you're interpreting that to mean that the opposite is also true: that the absence of the volatile keyword guarantees isolation between threads. There's no such guarantee.
Also, while your code example is multi-threaded, it isn't necessarily concurrent. It could be that the values were cached per-thread, but there was enough time for the JVM to propagate the change before you printed the result.

You are right that with volatile, you can guarantee that your 2 threads will see the appropriate value from main memory at all times, and never a thread-specific cached version of it.
Without volatile, you lose that guarantee, and each thread may be working with its own cached version of the value.
However, there is nothing preventing the 2 threads from resynchronizing their memory if and when they feel like it, and eventually viewing the same value (maybe). It's just that you can't guarantee that it will happen, and you most certainly cannot guarantee when it will happen. But it can happen at some indeterminate point in time.
The point is that your code may work sometimes, and sometimes not. Even if it seems to read the variable properly every time you run it on your personal computer, the same code may well break on a different machine. So you are taking big risks.

java: `volatile` private fields with getters and setters

Should we declare the private fields as volatile if the instances are used in multiple threads?
In Effective Java, there is an example where the code doesn't work without volatile:
import java.util.concurrent.TimeUnit;
// Broken! - How long would you expect this program to run?
public class StopThread {
    private static boolean stopRequested; // works, if volatile is here
    public static void main(String[] args) throws InterruptedException {
        Thread backgroundThread = new Thread(new Runnable() {
            public void run() {
                int i = 0;
                while (!stopRequested)
                    i++;
            }
        });
        backgroundThread.start();
        TimeUnit.SECONDS.sleep(1);
        stopRequested = true;
    }
}
The explanation says that
while (!stopRequested)
    i++;
is optimized to something like this:
if (!stopRequested)
    while (true)
        i++;
so further modifications of stopRequested aren't seen by the background thread, so it loops forever. (BTW, that code terminates without volatile on JRE7.)
Now consider this class:
public class Bean {
    private boolean field = true;
    public boolean getField() {
        return field;
    }
    public void setField(boolean value) {
        field = value;
    }
}
and a thread as follows:
public class Worker implements Runnable {
    private Bean b;
    public Worker(Bean b) {
        this.b = b;
    }
    @Override
    public void run() {
        while(b.getField()) {
            System.err.println("Waiting...");
            try { Thread.sleep(1000); }
            catch(InterruptedException ie) { return; }
        }
    }
}
The above code works as expected without using volatiles:
public class VolatileTest {
    public static void main(String [] args) throws Exception {
        Bean b = new Bean();
        Thread t = new Thread(new Worker(b));
        t.start();
        Thread.sleep(3000);
        b.setField(false); // stops the child thread
        System.err.println("Waiting the child thread to quit");
        t.join();
        // if the code gets here, the child thread has stopped
        // and it really does get here with JRE 7, and with JRE 6 using -server or -client
    }
}
I think because of the public setter, the compiler/JVM should never optimize the code which calls getField(), but this article says that there is some "Volatile Bean" pattern (Pattern #4), which should be applied to create mutable thread-safe classes. Update: maybe that article applies for IBM JVM only?
The question is: which part of JLS explicitly or implicitly says that private primitive fields with public getters/setters must be declared as volatile (or they don't have to)?
Sorry for a long question, I tried to explain the problem in details. Let me know if something is not clear. Thanks.

The question is: which part of JLS explicitly or implicitly says that private primitive fields with public getters/setters must be declared as volatile (or they don't have to)?
The JLS memory model doesn't care about getters/setters. They're no-ops from the memory model perspective - you could as well be accessing public fields. Wrapping the boolean behind a method call doesn't affect its memory visibility. Your latter example works purely by luck.
Should we declare the private fields as volatile if the instances are used in multiple threads?
If a class (bean) is to be used in a multithreaded environment, you must somehow take that into account. Making private fields volatile is one approach: it guarantees that each thread sees the latest value of that field rather than a cached or optimized-away stale value. But it doesn't solve the problem of atomicity.
The article you linked to applies to any JVM that adheres to the JVM specification (which the JLS leans on). You will get various results depending on the JVM vendor, version, flags, computer and OS, how long and how often the code has run (HotSpot optimizations often kick in only after a method has been invoked on the order of 10,000 times), etc., so you really must understand the spec and carefully adhere to the rules in order to create reliable programs. Experimenting in this case is a poor way to find out how things work because the JVM can behave in any way it wants as long as it falls within the spec, and most JVMs do contain loads of all kinds of dynamic optimizations.
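To illustrate the atomicity remark, here is a small sketch in which several threads increment both a volatile int and an AtomicInteger. The class name VolatileIsNotAtomic and the run() helper are invented for this example. The volatile counter is always visible, but volatileCount++ is a read-modify-write sequence, so concurrent increments can be lost; the AtomicInteger total is always exact.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: VolatileIsNotAtomic and run() are hypothetical names for this example.
final class VolatileIsNotAtomic {
    private static volatile int volatileCount;
    private static final AtomicInteger atomicCount = new AtomicInteger();

    // Runs `threads` threads that each increment both counters `perThread` times.
    // Returns { final volatile total, final atomic total }.
    static int[] run(int threads, int perThread) {
        volatileCount = 0;
        atomicCount.set(0);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    volatileCount++;               // visible, but not atomic: updates can be lost
                    atomicCount.incrementAndGet(); // atomic read-modify-write
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return new int[] { volatileCount, atomicCount.get() };
    }

    public static void main(String[] args) {
        int[] totals = run(4, 100_000);
        System.out.println("volatile total: " + totals[0] + " (may be less than 400000)");
        System.out.println("atomic total:   " + totals[1]);
    }
}
```

How many updates the volatile counter loses, if any, varies from run to run and machine to machine, which is exactly why testing alone cannot show that such code is correct.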

Before I answer your question I want to address
BTW, that code terminates without volatile on JRE7
This can change if you were to deploy the same application with different runtime arguments. Hoisting isn't necessarily a default implementation for JVMs so it can work in one and not in another.
To answer your question, there is nothing preventing the Java compiler from executing your latter example like so:
@Override
public void run() {
    if(b.getField()){
        while(true) {
            System.err.println("Waiting...");
            try { Thread.sleep(1000); }
            catch(InterruptedException ie) { return; }
        }
    }
}
It is still sequentially consistent and thus maintains Java's guarantees - you can read specifically 17.4.3:
Among all the inter-thread actions performed by each thread t, the
program order of t is a total order that reflects the order in which
these actions would be performed according to the intra-thread
semantics of t.
A set of actions is sequentially consistent if all actions occur in a
total order (the execution order) that is consistent with program
order, and furthermore, each read r of a variable v sees the value
written by the write w to v such that:
- w comes before r in the execution order, and
- there is no other write w' such that w comes before w' and w' comes
before r in the execution order.
In other words: as long as a thread sees the reads and writes of a field in the same order regardless of compiler or memory reordering, it is considered sequentially consistent.

No, that code is just as incorrect. Nothing in the JLS says a field must be declared as volatile. However, if you want your code to work correctly in a multi-threaded environment, then you have to obey the visibility rules. volatile and synchronized are two of the major facilities for correctly making data visible across threads.
As for your example, the difficulty of writing multi-threaded code is that many forms of incorrect code work fine in testing. Just because a multi-threaded test "succeeds" in testing does not mean it is correct code.
For the specific JLS reference, see the Happens Before section (and the rest of the page).
Note, as a general rule of thumb, if you think you have come up with a clever new way to get around "standard" thread-safe idioms, you are most likely wrong.
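As one concrete instance of the happens-before rules, the sketch below relies on two edges that the JLS spells out in section 17.4.5: a call to Thread.start() happens-before every action in the started thread, and every action in a thread happens-before another thread successfully returns from join() on it. The class name JoinVisibility and the compute() helper are hypothetical. Because of those edges, the plain (non-volatile) field is safe to read after join().

```java
// Sketch only: JoinVisibility and compute() are hypothetical names for this example.
final class JoinVisibility {
    private static int result; // deliberately neither volatile nor synchronized

    static int compute() {
        result = 0;                                      // visible to the worker via start()
        Thread worker = new Thread(() -> result = 6 * 7); // plain write in the worker
        worker.start();
        try {
            worker.join(); // everything the worker did is visible once join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute()); // prints 42
    }
}
```

This only works because the read comes after join(). Reading result while the worker is still running would be a data race again.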

Questions on Concurrency from Java Guide

So I've been reading up on concurrency and have some questions along the way (guide I followed, though I'm not sure if it's the best source):
1. Processes vs. Threads: Is the difference basically that a process is the program as a whole while a thread can be a (small) part of a program?
2. I am not exactly sure why there is an interrupted() method and an InterruptedException. Why should the interrupted() method even be used? It seems to me that Java just adds an extra layer of indirection.
3. For synchronization (and specifically about the one in that link), how does adding the synchronized keyword even fix the problem? I mean, if Thread A gives back its incremented c and Thread B gives back the decremented c and stores it to some other variable, I am not exactly sure how the problem is solved. This may be answering my own question, but is it assumed that a thread terminates after it returns an answer? And if that is the case, why would adding synchronized make a difference?
4. I read (from some random PDF) that if you call start() on two Threads one after the other, you cannot guarantee that the first thread will run before the second thread. How would you guarantee it, though?
5. In synchronized statements, I am not completely sure what the point is of adding synchronized within the method. What is wrong with leaving it out? Is it because one expects both to mutate separately, but to be obtained together? Why not just have the two non-synchronized?
6. Is volatile just a keyword for variables, and is it synonymous with synchronized?
7. In the deadlock problem, how does synchronized even help the situation? What makes this situation different from starting two threads that change a variable?
8. Moreover, where is the "wait"/lock for the other person to bowBack? I would have thought that bow() was blocked, not bowBack().
I'll stop here because I think if I went any further without these questions answered, I will not be able to understand the later lessons.

Answers:
1. Yes, a process is an operating system process that has an address space, a thread is a unit of execution, and there can be multiple units of execution in a process.
2. The interrupt() method and InterruptedException are generally used to wake up threads that are waiting to either have them do something or terminate.
3. Synchronizing is a form of mutual exclusion or locking, something very standard and required in computer programming. Google these terms and read up on that and you will have your answer.
4. True, this cannot be guaranteed, you would have to have some mechanism, involving synchronization that the threads used to make sure they ran in the desired order. This would be specific to the code in the threads.
5. See answer to #3
6. Volatile is a way to make sure that a particular variable can be properly shared between different threads. It is necessary on multi-processor machines (which almost everyone has these days) to make sure the value of the variable is consistent between the processors. It is effectively a way to synchronize a single value.
7. Read about deadlocking in more general terms to understand this. Once you first understand mutual exclusion and locking you will be able to understand how deadlocks can happen.
8. I have not read the materials that you read, so I don't understand this one. Sorry.
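To make the answer about interruption a little more concrete, here is a small sketch showing the two ways a thread usually notices an interrupt. The class name InterruptDemo and both helper methods are made up for this example. A thread blocked in sleep() (or wait(), join(), ...) is woken with an InterruptedException, while a thread that is busy computing has to poll its interrupt status itself.

```java
// Sketch only: InterruptDemo and its helper methods are hypothetical names for this example.
final class InterruptDemo {

    // A blocked thread: interrupt() makes sleep() throw InterruptedException.
    static String interruptSleeper() {
        final String[] outcome = { "not finished" };
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000);
                outcome[0] = "woke up normally";
            } catch (InterruptedException e) {
                outcome[0] = "interrupted";
            }
        });
        sleeper.start();
        sleeper.interrupt(); // sets the interrupt status; sleep() reacts to it
        joinQuietly(sleeper);
        return outcome[0];
    }

    // A busy thread: nothing throws, so the loop polls the interrupt status.
    static boolean interruptBusyLoop() {
        Thread busy = new Thread(() -> {
            long counter = 0;
            while (!Thread.currentThread().isInterrupted()) {
                counter++;
            }
        });
        busy.start();
        busy.interrupt();
        joinQuietly(busy);
        return !busy.isAlive();
    }

    private static void joinQuietly(Thread thread) {
        try {
            thread.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(interruptSleeper());  // prints "interrupted"
        System.out.println(interruptBusyLoop()); // prints "true"
    }
}
```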

I find that the examples used to explain synchronization and volatility are contrived and difficult to understand the purpose of. Here are my preferred examples:
Synchronized:
private Value value;
public void setValue(Value v) {
    value = v;
}
public void doSomething() {
    if(value != null) {
        doFirstThing();
        int val = value.getInt(); // Will throw NullPointerException if another
                                  // thread calls setValue(null);
        doSecondThing(val);
    }
}
The above code is perfectly correct if run in a single-threaded environment. However with even 2 threads there is the possibility that value will be changed in between the check and when it is used. This is because the method doSomething() is not atomic.
To address this, use synchronization:
private Value value;
private Object lock = new Object();
public void setValue(Value v) {
    synchronized(lock) {
        value = v;
    }
}
public void doSomething() {
    synchronized(lock) { // Prevents setValue being called by another thread.
        if(value != null) {
            doFirstThing();
            int val = value.getInt(); // Cannot throw NullPointerException.
            doSecondThing(val);
        }
    }
}
Volatile:
private boolean running = true;
// Called by Thread 1.
public void run() {
    while(running) {
        doSomething();
    }
}
// Called by Thread 2.
public void stop() {
    running = false;
}
To explain this requires knowledge of the Java Memory Model. It is worth reading about in depth, but the short version for this example is that threads may work with their own copies of variables, which are only guaranteed to be synchronized with main memory at a synchronized block or when a volatile variable is accessed. The Java compiler (specifically the JIT) is allowed to optimise the code into this:
public void run() {
    while(true) { // Will never end
        doSomething();
    }
}
To prevent this optimisation you can set a variable to be volatile, which forces the thread to access main memory every time it reads the variable. Note that this is unnecessary if you are using synchronized statements as both keywords cause a sync to main memory.
I haven't addressed your questions directly as Francis did so. I hope these examples can give you an idea of the concepts in a better way than the examples you saw in the Oracle tutorial.
