Consider a situation like this.
There are two threads and a shared resource (like a HashMap). One thread creates the HashMap and initializes it with some key-value pairs; after the shared resource is initialized, it is never modified again.
Now, the second thread is created strictly after the shared resource is initialized and wants to use that resource. At this point I would like some guarantee that the second thread will see the correct version of the shared resource. I presume it is possible that the first thread didn't flush its changes to main memory before the second thread was created, so the second thread may load the old value of the shared resource into its cache.
Is this analysis correct, and how can I force a flush to main memory by hand in Java after initializing the shared resource, in this particular situation where I do not want or require volatile or synchronized?
The documentation says:
A call to start on a thread happens-before any action in the started thread.
So, if your code matches your description, it's safe.
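A minimal sketch of that guarantee, assuming the setup from the question: the map lives in a plain static field (no volatile, no synchronized), is fully populated, and only then is the reader thread started. The class and field names are hypothetical. The join() call is there only so the main thread can read the result back; join() has its own happens-before edge.

```java
import java.util.HashMap;
import java.util.Map;

class StartPublicationDemo {
    // Deliberately plain fields: no volatile, no synchronized.
    private static Map<String, Integer> shared;
    private static int seenByReader;

    static int run() throws InterruptedException {
        shared = new HashMap<>();
        shared.put("answer", 42);          // every write happens before start()

        Thread reader = new Thread(() -> {
            // start() happens-before this action, so the populated map is visible.
            seenByReader = shared.get("answer");
        });
        reader.start();
        reader.join();                     // join() makes seenByReader visible to this thread
        return seenByReader;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());         // prints 42
    }
}
```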
If you declare and initialize your HashMap as a static field, it will be initialized during class initialization, which the JVM performs in a thread-safe fashion.
If the map initialization happens before the second thread is started, then everything is correct. To keep the analysis simple, you can convert the initialized map into an immutable map implementation and pass it to the created thread explicitly. That way you don't need a shared variable at all.
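A sketch of the static-field approach, assuming the map can be built entirely inside a static initializer. Registry and lookup are hypothetical names, and wrapping the map with Collections.unmodifiableMap is an extra precaution, not something the answer requires.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class Registry {
    // Published by class initialization: every thread that uses Registry sees the full map.
    static final Map<String, Integer> CODES;

    static {
        Map<String, Integer> m = new HashMap<>();
        m.put("alpha", 1);
        m.put("beta", 2);
        // Prevent modification after publication.
        CODES = Collections.unmodifiableMap(m);
    }

    static int lookup(String key) {
        return CODES.getOrDefault(key, -1);
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t = new Thread(() -> System.out.println(lookup("beta")));  // prints 2
        t.start();
        t.join();
    }
}
```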
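A sketch of the "build, freeze, hand over" approach. It assumes Java 10+ for Map.copyOf; Worker and ImmutableHandoffDemo are hypothetical names. The frozen map travels to the thread through a constructor argument stored in a final field, so there is no shared mutable variable, and Thread.start() provides the happens-before edge.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

class ImmutableHandoffDemo {
    static final class Worker implements Runnable {
        private final Map<String, Integer> config;   // immutable, handed over explicitly
        private final AtomicInteger result;

        Worker(Map<String, Integer> config, AtomicInteger result) {
            this.config = config;
            this.result = result;
        }

        @Override
        public void run() {
            result.set(config.get("limit"));
        }
    }

    static int run() throws InterruptedException {
        Map<String, Integer> builder = new HashMap<>();
        builder.put("limit", 100);
        Map<String, Integer> frozen = Map.copyOf(builder);  // unmodifiable snapshot

        AtomicInteger result = new AtomicInteger();
        Thread t = new Thread(new Worker(frozen, result));
        t.start();
        t.join();
        return result.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // prints 100
    }
}
```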
Is this analysis correct, and how can I force a flush to main memory by hand in Java after initializing the shared resource, in this particular situation where I do not want or require volatile or synchronized?
It's not possible to avoid requiring volatile or synchronized altogether. You have to use some form of memory synchronization between threads, or things don't work.
You could use a static initializer as Andrei mentioned (*), or final, both of which imply a memory barrier. But you have to use something.
You may need a synchronized map (Collections.synchronizedMap()) or a ConcurrentHashMap, but you still need to use volatile, synchronized, final, or static to guard the field itself.
Cf. Java Concurrency in Practice by Brian Goetz, and also this related question on Stack Overflow (note that the OP gets the name of the book wrong).
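A sketch of the "guard the field, and make the map itself thread-safe" combination, assuming the map keeps being updated after publication (unlike in the original question). HitCache is a hypothetical class; the field is final, and the map is a ConcurrentHashMap, whose merge method updates a single key atomically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class HitCache {
    // final guards the field; ConcurrentHashMap guards the map's contents.
    private final Map<String, Integer> entries = new ConcurrentHashMap<>();

    void record(String key) {
        entries.merge(key, 1, Integer::sum);   // atomic per-key update
    }

    int count(String key) {
        return entries.getOrDefault(key, 0);
    }

    static int demo() throws InterruptedException {
        HitCache cache = new HitCache();
        Runnable task = () -> {
            for (int i = 0; i < 1_000; i++) {
                cache.record("hits");
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return cache.count("hits");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());   // prints 2000
    }
}
```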
(* The whole static initializer thing is kinda complicated, and you should read Mr. Goetz's book, but I'll try to describe it briefly: static fields are part of class initialization. Each static field or static initializer block is written, or executed, by a thread (which could be the thread that called new or accessed the class object for the first time, or could be a different thread). When the process of writing all static fields for the first time is done, the JVM inserts a memory barrier so that the class object, with all its static fields, is visible to all threads in the system as required by the spec.
You do NOT get a memory barrier per field write, as you do with volatile. Class initialization tries to be efficient and only inserts one barrier at the very end. Thus you should only use static initializers for what they're meant for, filling in fields for the first time, and not try to write entire programs inside a static initializer block. It's not efficient, and your options for thread safety are actually more limited.
However, the memory barrier that's part of class initialization is available to use, and that's why, as Andrei Amarfii said, the static initializer pattern is used in Java to ensure visibility of objects. It's important enough that Brian Goetz calls it out as one of his four "Safe Publication" patterns.)
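For reference, here is a rough sketch of the four safe-publication idioms as listed in Java Concurrency in Practice: a static initializer, a volatile field or AtomicReference, a final field of a properly constructed object, and a field guarded by a lock. Holder, Wrapper, and the other names are hypothetical. In this tiny demo the reading thread is started after publication, so Thread.start() alone would already make everything visible; the point is only to show what each idiom looks like in code.

```java
import java.util.concurrent.atomic.AtomicReference;

class SafePublication {
    // Hypothetical immutable payload.
    static final class Holder {
        final int value;
        Holder(int value) { this.value = value; }
    }

    // Idiom 1: initialize from a static initializer.
    static final Holder STATIC_HOLDER = new Holder(1);

    // Idiom 2: store into a volatile field or an AtomicReference.
    static volatile Holder volatileHolder;
    static final AtomicReference<Holder> ATOMIC_HOLDER = new AtomicReference<>();

    // Idiom 3: store into a final field of a properly constructed object.
    static final class Wrapper {
        final Holder holder;
        Wrapper(Holder holder) { this.holder = holder; }   // 'this' does not escape
    }

    // Idiom 4: store into a field that is properly guarded by a lock.
    private static final Object LOCK = new Object();
    private static Holder guardedHolder;

    static void publish() {
        volatileHolder = new Holder(2);
        ATOMIC_HOLDER.set(new Holder(3));
        synchronized (LOCK) {
            guardedHolder = new Holder(4);
        }
    }

    static Holder readGuarded() {
        synchronized (LOCK) {       // readers must take the same lock
            return guardedHolder;
        }
    }

    static int sumFromOtherThread() throws InterruptedException {
        publish();
        Wrapper wrapper = new Wrapper(new Holder(5));
        int[] out = new int[1];
        Thread t = new Thread(() -> out[0] = STATIC_HOLDER.value
                + volatileHolder.value
                + ATOMIC_HOLDER.get().value
                + readGuarded().value
                + wrapper.holder.value);
        t.start();
        t.join();
        return out[0];              // 1 + 2 + 3 + 4 + 5
    }
}
```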
Related
If a lock/semaphore is being used by multiple threads for synchronization, does one need to declare it as volatile? Are the member variables of locks/semaphores itself guaranteed to reflect the latest state for all threads (similar to volatile variables)?
Typically you want to make the lock final because once the lock instance is set, you don't want to change it.
But for discussion's sake, if you don't want to make it final, it is in most cases fine if the lock is NOT volatile. What typically happens is that one thread creates some data-structure holding the lock and using safe publication, the object is shared with other threads. Due to this safe-publication, the other threads will see a properly initialized lock.
Safe-publication can be done in various ways like passing the object to the constructor of a thread, using a synchronized block, or using a volatile.
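A sketch of a lock that is final (not volatile) and reaches other threads through safe publication, here by passing the owning object to each thread's constructor before start(). SharedCounter, IncrementTask, and LockPublicationDemo are hypothetical names; ReentrantLock stands in for "the lock".

```java
import java.util.concurrent.locks.ReentrantLock;

class LockPublicationDemo {
    static final class SharedCounter {
        private final ReentrantLock lock = new ReentrantLock();  // final, never reassigned
        private int value;                                       // guarded by lock

        void increment() {
            lock.lock();
            try {
                value++;
            } finally {
                lock.unlock();
            }
        }

        int get() {
            lock.lock();
            try {
                return value;
            } finally {
                lock.unlock();
            }
        }
    }

    static final class IncrementTask extends Thread {
        private final SharedCounter counter;

        // Safely published: handed over in the constructor, before start().
        IncrementTask(SharedCounter counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            for (int i = 0; i < 1_000; i++) {
                counter.increment();
            }
        }
    }

    static int run() throws InterruptedException {
        SharedCounter counter = new SharedCounter();
        Thread a = new IncrementTask(counter);
        Thread b = new IncrementTask(counter);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // prints 2000
    }
}
```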
I think this question fundamentally misses the point of how locks, semaphores, etc work.
The specific question asks whether using "volatile" is necessary.
Maybe take a read through the Java Tutorial section Atomic Access – there's some discussion about "volatile" and how to use it, with a primary point being that it's about things changing.
Here's a statement to consider using a Semaphore – Semaphore s = new Semaphore(1) – along with a few points:
an object is created: new Semaphore(1)
a reference is declared: Semaphore s
the reference is associated with the object: s points to new object
To make use of the "volatile" keyword, you would designate the reference – s – to be volatile (not the object) like this:
volatile Semaphore s = new Semaphore(1);
Why do that? That could be useful if you later allocated a new Semaphore altogether, like s = new Semaphore(5), and you had concurrency concerns about all threads seeing the newly allocated Semaphore(5) (instead of the Semaphore(1) that was allocated earlier).
But why do that? Why change a shared object at runtime?
The scenarios in which such a thing would happen seem completely unrealistic. The scenario would be: there are threads with a shared synchronization mechanism (such as a Semaphore), and they're actively running, actively using the semaphore to coordinate execution... then somehow, for some reason, the underlying coordination component (the semaphore) changes mid-runtime (?).
I cannot imagine any scenario where such a circumstance would arise, other than just flat out doing synchronization incorrectly.
All of this to say: no, do not use "volatile". Instead, use final.
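A sketch of that advice with the Semaphore from the example: the reference is final and never reassigned, and the Semaphore itself provides the memory guarantees (per its documentation, actions before release() happen-before actions after a successful acquire() in another thread). PermitCounter is a hypothetical class.

```java
import java.util.concurrent.Semaphore;

class PermitCounter {
    private final Semaphore s = new Semaphore(1);   // final: the reference never changes
    private int count;                              // guarded by s

    void increment() throws InterruptedException {
        s.acquire();
        try {
            count++;
        } finally {
            s.release();
        }
    }

    int read() throws InterruptedException {
        s.acquire();
        try {
            return count;
        } finally {
            s.release();
        }
    }

    static int demo() throws InterruptedException {
        PermitCounter counter = new PermitCounter();
        Runnable task = () -> {
            try {
                for (int i = 0; i < 1_000; i++) {
                    counter.increment();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
        return counter.read();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(demo());   // prints 2000
    }
}
```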
I was asked by an interviewer whether there is any danger in not using volatile if we know for sure that threads will never interfere.
E.g., we have:
int i = 10;
// Thread 1
i++;
// wait some time, then switch to Thread 2
getI();
I don't use any synchronization.
Is there any danger that the second thread receives an outdated value of i?
Without volatile or synchronized or a read/write barrier, there is no guarantee that another thread will ever see a change you made, no matter how long you wait. In particular, boolean fields can be inlined into the code so that no read is actually performed. In theory, int values could be inlined if the JVM detects the field is not changed by the thread (I don't believe it actually does, though).
we know for sure that threads will never interfere.
This is not something you can know, unless the reading thread is not running when you perform the update. When a thread starts, it will see any changes that occurred before it started.
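A sketch of the kind of code where the boolean-inlining point matters, assuming a worker that spins on a stop flag. StopFlagDemo is a hypothetical name. With volatile, the loop is guaranteed to observe the write and exit; without it, the JIT is allowed to read the field once and loop forever, although whether that actually happens depends on the JVM and platform.

```java
class StopFlagDemo {
    // Remove 'volatile' and the worker below is allowed to spin forever.
    private volatile boolean running = true;

    static long run() throws InterruptedException {
        StopFlagDemo demo = new StopFlagDemo();
        long[] iterations = new long[1];
        Thread worker = new Thread(() -> {
            long n = 0;
            while (demo.running) {   // volatile read on every iteration
                n++;
            }
            iterations[0] = n;
        });
        worker.start();
        Thread.sleep(50);            // let the worker spin for a while
        demo.running = false;        // volatile write: the worker is guaranteed to see it
        worker.join();               // returns because the loop terminates
        return iterations[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("worker stopped after " + run() + " iterations");
    }
}
```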
You might receive outdated values, yes.
The reason in short is:
Every thread in Java has its own little 'cache'. For performance reasons, threads keep a copy of the 'master' data in their own memory. So you basically have a main memory and a thread-local one. The volatile keyword forces the thread to access main memory instead of its local copy.
Refer to this for more information: http://www.ibm.com/developerworks/library/j-5things15/
If an interviewer asked me that question, I would answer in terms of the Java Language Specification. There is no "cache" or "main memory" in the JLS. The JLS talks about fields (a.k.a. instance variables and class variables), and it makes very specific guarantees about when an update to a field that happens in thread A will become visible in thread B. Implementation details like "cache" and "memory barriers" can vary from one platform to another, but a program that is correct with respect to the JLS should (in theory) be correct on any Java platform.
What is the use of ThreadLocal when a thread normally works on a variable by keeping it in its local cache?
That would mean thread1 does not know the value of the same variable in thread2 even if no ThreadLocal is used.
With multiple threads, although you have to do work to make sure you read the "most recent" value of a variable, you expect there to be effectively one variable per instance (assuming we're talking about instance fields here). You might read an out of date value unless you're careful, but basically you've got one variable.
With ThreadLocal, you're explicitly wanting to have one value per thread that reads the variable. That's typically for the sake of context. For example, a web server with some authentication layer might set a thread-local variable early in request handling so that any code within the execution of that request can access the authentication details, without needing any explicit reference to a context object. So long as all the handling is done on the one thread, and that's the only thing that thread does, you're fine.
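A minimal sketch of the web-server example, assuming a request is handled start to finish on one thread. RequestContext, handle, and auditLog are hypothetical names. The remove() in the finally block matters on pooled threads, so that a stale user doesn't leak into the next request.

```java
class RequestContext {
    private static final ThreadLocal<String> CURRENT_USER = new ThreadLocal<>();

    // Early in request handling: set the context for this thread only.
    static String handle(String user) {
        CURRENT_USER.set(user);
        try {
            return auditLog("viewed report");
        } finally {
            CURRENT_USER.remove();   // don't leak into the next task on a pooled thread
        }
    }

    // Deep in the call chain: no context object is passed explicitly.
    static String auditLog(String action) {
        return CURRENT_USER.get() + " " + action;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(() -> System.out.println(handle("alice")));
        Thread t2 = new Thread(() -> System.out.println(handle("bob")));
        t1.start();
        t2.start();
        t1.join();
        t2.join();
    }
}
```

Each thread prints its own user name; the order of the two lines is not deterministic.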
A thread doesn't have to keep variables in its local cache -- it's just that it's allowed to, unless you tell it otherwise.
So:
If you want to force a thread to share its state with other threads, you have to use synchronization of some sort (including synchronized blocks, volatile variables, etc).
If you want to prevent a thread from sharing its state with other threads, you have to use ThreadLocal (assuming the object that holds the variable is known to multiple threads -- if it's not, then everything is thread-local anyway!).
It's kind of a global variable for the thread itself, so that any code running in the thread can access it directly. (A "really" global variable can be accessed by any code running in the "process"; we could call it ProcessLocal:)
Are global variables bad? Maybe; they should be avoided if we can. But sometimes we have no choice: we cannot pass the object through method parameters, and ThreadLocal proves useful in many designs without causing too much trouble.
ThreadLocal is useful when an object is not thread-safe but you want to avoid synchronizing access to it. Each thread stores its own copy in its thread-local storage. By default, data is shared between threads.
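A common illustration of that use, sketched here with java.text.SimpleDateFormat, which is documented as not thread-safe. PerThreadFormatter is a hypothetical class; the pattern and the UTC time zone are arbitrary choices made so the output is predictable.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;

class PerThreadFormatter {
    // Each thread lazily gets its own SimpleDateFormat instead of sharing one.
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
            ThreadLocal.withInitial(() -> {
                SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd");
                f.setTimeZone(TimeZone.getTimeZone("UTC"));
                return f;
            });

    static String format(Date date) {
        return FORMAT.get().format(date);   // no synchronization needed
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(format(new Date(0L)));  // prints 1970-01-01
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

On Java 8 and later, java.time.format.DateTimeFormatter is immutable and thread-safe, so this particular case no longer needs ThreadLocal; the pattern still applies to other non-thread-safe objects.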
I declared an instance variable as volatile. Say two threads are running on two processors of a multicore machine, and one thread updates the variable. To ensure instantaneous visibility, I believe declaring the variable as volatile is the right choice here, so that the update done by the thread happens in main memory and is visible to the other thread.
Right?
My intention here is to understand the concept in terms of a multicore processor.
I am assuming you are considering using volatile vs. not using any special provisions for concurrency (such as synchronized or AtomicReference).
It is irrelevant whether you are running single-core or multicore: sharing data between threads is never safe without volatile. There are many more things the runtime is allowed to do without it; basically, it can pretend the accessing thread is the only thread running on the JVM. The thread can read the value once and store it on the call stack forever; a loop that reads the value but never writes it may be transformed so that the value is read only once at the outset and never reconsidered, and so on.
So the message is simple: use volatile—but that's not necessarily all you need to take care of in concurrent code.
It doesn't matter whether it's done by different processors or not. Even without multiple processors, you can still run into concurrency problems, because context switches may happen at any time.
If a field is not volatile, it may still be in one thread's cache while its context is switched out and the other thread's context switches in. In that case, the thread that just took over the (single) processor will not see that the field has changed.
Since these things can happen even with one processor, they are bound to happen with more than one processor, so indeed, you need to protect your shared data.
Whether volatile is the right choice or not depends on what type it is and what kind of change you are trying to protect from. But again, that has nothing to do with the number of processors.
If the field is a reference type, then volatile only ensures the visibility of new assignments to the field. It doesn't protect against changes in the object it points to; for that you need to synchronize.
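A sketch of the distinction, assuming a copy-on-write design: the map behind the volatile reference is never mutated; each update builds a new immutable map and swaps the reference. Settings is a hypothetical class, and Map.copyOf needs Java 10+. The writers are still synchronized, because volatile alone would not stop two concurrent updates from overwriting each other.

```java
import java.util.HashMap;
import java.util.Map;

class Settings {
    // volatile covers reassignment of this reference, not mutation of the map behind it.
    private volatile Map<String, Integer> snapshot = Map.of();

    // Writers serialize on 'this' so concurrent updates are not lost.
    synchronized void put(String key, int value) {
        Map<String, Integer> next = new HashMap<>(snapshot);
        next.put(key, value);
        snapshot = Map.copyOf(next);   // volatile write publishes a new immutable map
    }

    // Readers take no lock: a single volatile read yields a consistent snapshot.
    int get(String key) {
        return snapshot.getOrDefault(key, -1);
    }

    int size() {
        return snapshot.size();
    }

    static int demo() throws InterruptedException {
        Settings settings = new Settings();
        Thread a = new Thread(() -> {
            for (int i = 0; i < 100; i++) settings.put("a" + i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < 100; i++) settings.put("b" + i, i);
        });
        a.start();
        b.start();
        a.join();
        b.join();
        return settings.size();   // 200: no update was lost
    }
}
```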
My teacher in an upper level Java class on threading said something that I wasn't sure of.
He stated that the following code would not necessarily update the ready variable. According to him, the two threads don't necessarily share the static variable, specifically in the case when each thread (main thread versus ReaderThread) is running on its own processor and therefore doesn't share the same registers/cache/etc and one CPU won't update the other.
Essentially, he said it is possible that ready is updated in the main thread, but NOT in the ReaderThread, so that ReaderThread will loop infinitely.
He also claimed it was possible for the program to print 0 or 42. I understand how 42 could be printed, but not 0. He mentioned this would be the case when the number variable is set to the default value.
I thought perhaps it is not guaranteed that the static variable is updated between the threads, but this strikes me as very odd for Java. Does making ready volatile correct this problem?
He showed this code:
public class NoVisibility {
    private static boolean ready;
    private static int number;

    private static class ReaderThread extends Thread {
        public void run() {
            while (!ready) Thread.yield();
            System.out.println(number);
        }
    }

    public static void main(String[] args) {
        new ReaderThread().start();
        number = 42;
        ready = true;
    }
}
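A sketch of the fix asked about above, making ready volatile. Because number = 42 precedes the volatile write to ready in program order, a reader that sees ready == true is also guaranteed to see 42, so it can no longer print 0 or loop forever. The printed field, the join(), and the reset at the top of run() are additions so the result can be checked; FixedVisibility is a hypothetical name.

```java
class FixedVisibility {
    private static volatile boolean ready;   // the fix: volatile
    private static int number;               // still a plain field
    private static int printed;

    private static class ReaderThread extends Thread {
        @Override
        public void run() {
            while (!ready) {
                Thread.yield();
            }
            printed = number;                // guaranteed to be 42
            System.out.println(printed);
        }
    }

    static int run() throws InterruptedException {
        ready = false;                       // reset so run() can be called repeatedly
        number = 0;
        ReaderThread reader = new ReaderThread();
        reader.start();
        number = 42;
        ready = true;                        // volatile write also publishes number
        reader.join();                       // makes 'printed' visible to this thread
        return printed;
    }

    public static void main(String[] args) throws InterruptedException {
        run();                               // prints 42
    }
}
```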
There isn't anything special about static variables when it comes to visibility. If they are accessible any thread can get at them, so you're more likely to see concurrency problems because they're more exposed.
There is a visibility issue imposed by the JVM's memory model. Here's an article talking about the memory model and how writes become visible to threads. You can't count on changes one thread makes becoming visible to other threads in a timely manner (actually the JVM has no obligation to make those changes visible to you at all, in any time frame), unless you establish a happens-before relationship.
Here's a quote from that link (supplied in the comment by Jed Wesley-Smith):
Chapter 17 of the Java Language Specification defines the happens-before relation on memory operations such as reads and writes of shared variables. The results of a write by one thread are guaranteed to be visible to a read by another thread only if the write operation happens-before the read operation. The synchronized and volatile constructs, as well as the Thread.start() and Thread.join() methods, can form happens-before relationships. In particular:
Each action in a thread happens-before every action in that thread that comes later in the program's order.
An unlock (synchronized block or method exit) of a monitor happens-before every subsequent lock (synchronized block or method entry) of that same monitor. And because the happens-before relation is transitive, all actions of a thread prior to unlocking happen-before all actions subsequent to any thread locking that monitor.
A write to a volatile field happens-before every subsequent read of that same field. Writes and reads of volatile fields have similar memory consistency effects as entering and exiting monitors, but do not entail mutual exclusion locking.
A call to start on a thread happens-before any action in the started thread.
All actions in a thread happen-before any other thread successfully returns from a join on that thread.
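A sketch of the monitor-lock rule together with transitivity. MonitorHandoff is a hypothetical class. number is a plain field written outside the lock, but that write precedes the unlock in program order, and the reader only reads number after acquiring the same lock and seeing ready == true, so it is guaranteed to see 42.

```java
class MonitorHandoff {
    private final Object lock = new Object();
    private boolean ready;   // guarded by lock
    private int number;      // written before the unlock, read after a later lock

    void publish() {
        number = 42;              // ordinary write
        synchronized (lock) {
            ready = true;
        }                         // unlock happens-before every later lock of 'lock'
    }

    int awaitNumber() {
        while (true) {
            synchronized (lock) {
                if (ready) {
                    return number;  // sees 42 by transitivity
                }
            }
            Thread.yield();
        }
    }

    static int run() throws InterruptedException {
        MonitorHandoff h = new MonitorHandoff();
        int[] out = new int[1];
        Thread reader = new Thread(() -> out[0] = h.awaitNumber());
        reader.start();
        h.publish();
        reader.join();
        return out[0];
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());   // prints 42
    }
}
```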
He was talking about visibility, and it was not meant to be taken too literally.
Static variables are indeed shared between threads, but the changes made in one thread may not be visible to another thread immediately, making it seem like there are two copies of the variable.
This article presents a view that is consistent with how he presented the info:
http://jeremymanson.blogspot.com/2008/11/what-volatile-means-in-java.html
First, you have to understand a little something about the Java memory model. I've struggled a bit over the years to explain it briefly and well. As of today, the best way I can think of to describe it is if you imagine it this way:
Each thread in Java takes place in a separate memory space (this is clearly untrue, so bear with me on this one).
You need to use special mechanisms to guarantee that communication happens between these threads, as you would on a message passing system.
Memory writes that happen in one thread can "leak through" and be seen by another thread, but this is by no means guaranteed. Without explicit communication, you can't guarantee which writes get seen by other threads, or even the order in which they get seen.
...
But again, this is simply a mental model to think about threading and volatile, not literally how the JVM works.
Basically it's true, but actually the problem is more complex. Visibility of shared data can be affected not only by CPU caches, but also by out-of-order execution of instructions.
Therefore Java defines a Memory Model, that states under which circumstances threads can see consistent state of the shared data.
In your particular case, adding volatile guarantees visibility.
They are "shared" of course in the sense that they both refer to the same variable, but they don't necessarily see each other's updates. This is true for any variable, not just static.
And in theory, writes made by another thread can appear to be in a different order, unless the variables are declared volatile or the writes are explicitly synchronized.
Within a single classloader, static fields are always shared. To explicitly scope data to threads, you'd want to use a facility like ThreadLocal.
When you declare a static primitive variable without initializing it, Java assigns it a default value:
public static int i;
When you define the variable like this, the default value of i is 0.
That's why there is a possibility of getting 0.
Then the main thread updates the value of the boolean ready to true. Since ready is a static variable, the main thread and the other thread refer to the same memory address, so the ready variable changes, and the secondary thread exits the while loop and prints the value.
At that point, the initialized value of number is 0. If the reader thread has passed the while loop before the main thread updates the number variable, then there is a possibility of printing 0.
#dontocsata
you can go back to your teacher and school him a little :)
A few notes from the real world, regardless of what you see or are told.
Please note: the points below concern this particular case, in the exact order shown.
The following two variables will reside on the same cache line on virtually any known architecture.
private static boolean ready;
private static int number;
The main thread is guaranteed to exit, and Thread.exit is guaranteed to cause a memory fence due to the removal of the thread from its thread group (among many other things). (It's a synchronized call, and I see no way to implement it without the synchronization, since the ThreadGroup must terminate as well if no daemon threads are left, etc.)
The started thread ReaderThread is going to keep the process alive since it is not a daemon one!
Thus ready and number will be flushed together (or number first, if a context switch occurs), and there is no real reason for reordering in this case; at least, I can't think of one.
You will need something truly weird to see anything but 42. Again, I do presume both static variables will be in the same cache line. I just can't imagine a cache line 4 bytes long, OR a JVM that will not allocate them in a contiguous area (cache line).