The other day Howard Lewis Ship posted a blog entry called "Things I Learned at Hacker Bed and Breakfast". One of the bullet points is:
A Java instance field that is assigned exactly once via lazy
initialization does not have to be synchronized or volatile (as long
as you can accept race conditions across threads to assign to the
field); this is from Rich Hickey
On the face of it this seems at odds with the accepted wisdom about visibility of changes to memory across threads, and if this is covered in the Java Concurrency in Practice book or in the Java language spec then I have missed it. But this was something HLS got from Rich Hickey at an event where Brian Goetz was present, so it would seem there must be something to it. Could someone please explain the logic behind this statement?
This statement sounds a little bit cryptic. However, I guess HLS refers to the case where you lazily initialize an instance field and don't care if several threads perform this initialization more than once.
As an example, I can point to the hashCode() method of the String class:
private int hashCode;

public int hashCode() {
    int hash = hashCode;
    if (hash == 0) {
        if (count == 0) {
            return 0;
        }
        final int end = count + offset;
        final char[] chars = value;
        for (int i = offset; i < end; ++i) {
            hash = 31 * hash + chars[i];
        }
        hashCode = hash;
    }
    return hash;
}
As you can see, access to the hashCode field (which holds the cached value of the computed String hash) is not synchronized and the field isn't declared volatile. Any thread which calls the hashCode() method will still receive the same value, though the hashCode field may be written more than once by different threads.
This technique has limited usability. IMHO it's usable mostly for cases like the example: a cached primitive/immutable object which is computed from other final/immutable fields, but whose computation in the constructor would be overkill.
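For illustration, here is a minimal sketch of the same "racy single-check" idiom applied to a hypothetical immutable class (the class name and fields are mine, not from any of the posts above). It only works because the cached value is a primitive computed purely from final fields, so recomputing it in several threads is harmless.

public final class Point {
    private final int x;
    private final int y;
    private int hash; // cached lazily; benign race, like String.hash

    public Point(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public int hashCode() {
        int h = hash;        // read the field once
        if (h == 0) {        // 0 means "not computed yet" (a real hash of 0 just gets recomputed each time)
            h = 31 * x + y;  // depends only on final fields
            hash = h;        // unsynchronized write; other threads may or may not see it
        }
        return h;
    }
}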
Hrm. As I read this it is technically incorrect but okay in practice with some caveats. Only final fields can safely be initialized once and accessed in multiple threads without synchronization.
Lazily initialized fields can suffer from synchronization issues in a number of ways. For example, you can have constructor race conditions where the reference to an object has been published without the object itself being fully initialized.
I think it highly depends on whether you have a primitive field or an object. Primitive fields that can be initialized multiple times, where you don't mind that multiple threads do the initialization, would work fine. However, HashMap-style initialization in this manner may be problematic. Even long values on some architectures may be stored word by word in multiple operations, so a reader could see half of the value, although I suspect that a long would never cross a memory page, so in practice it may never happen.
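As a rough sketch of what word tearing would look like, here is a hypothetical test of my own (not from the posts above): one thread alternately writes two distinguishable long values to a shared non-volatile field, and a reader looks for a torn mix of the two. On most modern 64-bit JVMs you will likely never see it fire, but JLS 17.7 only guarantees atomic long/double writes for volatile fields.

public class LongTearingSketch {
    static long shared;                       // deliberately not volatile
    static final long A = 0x0000000000000000L;
    static final long B = 0xFFFFFFFFFFFFFFFFL;

    public static void main(String[] args) {
        Thread writer = new Thread(() -> {
            boolean flip = false;
            while (!Thread.currentThread().isInterrupted()) {
                shared = (flip = !flip) ? A : B;  // keep overwriting with A or B
            }
        });
        writer.setDaemon(true);
        writer.start();

        for (long i = 0; i < 1_000_000_000L; i++) {
            long v = shared;                  // unsynchronized read; the JIT may even hoist it, which is itself part of the problem
            if (v != A && v != B) {           // a torn value: half of A, half of B
                System.out.println("Torn read: " + Long.toHexString(v));
                return;
            }
        }
        System.out.println("No tearing observed (expected on most 64-bit JVMs).");
    }
}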
I think it depends highly on whether or not an application has any memory barriers -- any synchronized blocks or access to volatile fields. The devil is certainly in the details here and the code that does the lazy initialization may work fine on one architecture with one set of code and not in a different thread model or with an application that synchronizes rarely.
Here's a good piece on final fields as a comparison:
http://www.javamex.com/tutorials/synchronization_final.shtml
As of Java 5, one particular use of the final keyword is a very important and often overlooked weapon in your concurrency armoury. Essentially, final can be used to make sure that when you construct an object, another thread accessing that object doesn't see that object in a partially-constructed state, as could otherwise happen. This is because when used as an attribute on the variables of an object, final has the following important characteristic as part of its definition:
Now, even if the field is marked final, if it refers to an object, you can still modify the fields within that object. This is a different issue and you must still have synchronization for this.
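A minimal sketch of that distinction (the class name is mine): final freezes the reference, not the object it points to, so mutating the referenced object from several threads still needs its own synchronization.

import java.util.ArrayList;
import java.util.List;

public class Registry {
    // The reference is safely published thanks to final field semantics...
    private final List<String> names = new ArrayList<>();

    // ...but the list itself is mutable, so access must still be synchronized.
    public synchronized void add(String name) {
        names.add(name);
    }

    public synchronized List<String> snapshot() {
        return new ArrayList<>(names);
    }
}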
This works fine under some conditions.
it's okay to try and set the field more than once;
it's okay if individual threads see different values.
Often when you create an object which is not changed e.g. loading a Properties from disk, having more than one copy for a short amount of time is not an issue.
private static Properties prop = null;

public static Properties getProperties() {
    if (prop == null) {
        prop = new Properties();
        try {
            prop.load(new FileReader("my.properties"));
        } catch (IOException e) {
            throw new AssertionError(e);
        }
    }
    return prop;
}
In the short term this is less efficient than using locking, but in the long term it could be more efficient. (Although Properties has a lock of its own, you get the idea ;)
IMHO, it's not a solution which works in all cases.
Perhaps the point is that you can use more relaxed memory consistency techniques in some cases.
I think the statement is untrue. Another thread can see a partially initialized object, so the reference can be visible to another thread even though the constructor hasn't finished running. This is covered in Java Concurrency in Practice, section 3.5.1:
public class Holder {
    private int n;

    public Holder(int n) { this.n = n; }

    public void assertSanity() {
        if (n != n)
            throw new AssertionError("This statement is false.");
    }
}
This class isn't thread-safe.
If the visible object is immutable, then you are OK, because the semantics of final fields mean you won't see them until the constructor has finished running (section 3.5.2).
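For contrast, here is a minimal sketch (the class name is mine) of the immutable variant described in section 3.5.2: with the field final, even an unsynchronized, racy publication of the reference cannot expose a partially constructed object.

public class SafeHolder {
    private final int n;   // final: frozen at the end of the constructor

    public SafeHolder(int n) { this.n = n; }

    public void assertSanity() {
        if (n != n)   // can never fail, even if the reference was published via a data race
            throw new AssertionError("This statement is false.");
    }
}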
Related
While discussing this answer, I wondered why we didn't use synchronization when assigning default values.
class StateHolder {
    private int counter = 100;
    private boolean isActive = false;

    public synchronized void resetCounter() {
        counter = 0;
        isActive = true;
    }

    public synchronized void printStateWithLock() {
        System.out.println("Counter : " + counter);
        System.out.println("IsActive : " + isActive);
    }

    public void printStateWithNoLock() {
        System.out.println("Counter : " + counter);
        System.out.println("IsActive : " + isActive);
    }
}
This class looks thread-safe because access to its fields is managed by synchronized methods. That way, all we have to do is publish it safely. For instance:
public final StateHolder stateHolder = new StateHolder();
Can it be considered a safe publication? I think no, it cannot. Consulting the final field semantics (emphasis mine), I figured out that the only thing guaranteed is that the stateHolder reference is not a stale one:
A thread that can only see a reference to an object after that object
has been completely initialized is guaranteed to see the correctly
initialized values for that object's final fields.
The final field semantics are not concerned with the state of the object referenced by the final field. Thus, another thread might as well see default values of that object's fields.
QUESTION: How can we guarantee memory consistency of the fields' values assigned within a constructor or instance initializer?
I think we have to declare them either volatile or final, since there is no happens-before relationship between assigning a reference and the constructor invocation. But lots of library classes do not declare fields that way. java.lang.String is an example:
public final class String
        implements java.io.Serializable, Comparable<String>, CharSequence {
    //...
    private int hash; // neither final nor volatile
    //...
}
final can guarantee you that you'll see the assigned value to an instance variable after the instance construction without the need of any further action. You just need to make sure that you don't leak the constructed instance within the constructor.
volatile can also guarantee you that you'll see the default value that you set for some instance variable, because the instance variable initializers are guaranteed to be executed before the end of the constructor per JLS 12.5 Creation of New Class Instances.
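As a sketch of one common safe-publication route (my own example, not from the linked article): publishing the fully constructed object through a volatile reference creates a happens-before edge between the constructor's writes and any thread that reads the reference, so the fields of the published object don't themselves need to be final or volatile.

class StateHolderPublisher {
    // The volatile write below happens after the constructor completes,
    // and every reader's volatile read happens-before its use of the fields.
    private static volatile StateHolder holder;

    static void publish() {
        holder = new StateHolder();   // safe publication via a volatile reference
    }

    static void useIt() {
        StateHolder h = holder;       // volatile read
        if (h != null) {
            h.printStateWithNoLock(); // sees the initialized values, never a default 0/false from the race
        }
    }
}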
Safe publication is not entirely trivial, but if you stick with one of the popular mechanisms for achieving it, you should be perfectly fine. You can take a look at Safe Publication and Safe Initialization in Java for some more interesting details.
As for String.hash, it's a popular example of the so-called benign data races. The accesses to the hash instance variable allow both racing of a read and write and racing of two writes. To illustrate just the latter, two threads can simultaneously:
see the initial value of 0
decide that they are the first to calculate the hash
calculate the hash code and write to the same variable without any synchronization
The race was still allowed and considered to be benign because of two reasons:
Hash code calculation for the immutable String instances is an idempotent operation.
Writes of 32-bit values are guaranteed not to tear.
Even benign data races are still not recommended though. See Benign data races: what could possibly go wrong? or Nondeterminism is unavoidable, but data races are pure evil.
Section 3.2.1 of Goetz's "Java Concurrency in Practice" contains the following rule:
Do not allow the this reference to escape during construction
I understand that, in general, allowing this to escape can lead to other threads seeing incompletely constructed versions of your object and can violate the initialization safety guarantee of final fields (as discussed e.g. here).
But is it ever possible to safely leak this? In particular, if you establish a happens-before relationship prior to the leakage?
For example, the official Executor Javadoc says
Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread
My naive reading of the Java memory model is that something like the following should be safe, even though it leaks this prior to the end of the constructor:
public final class Foo {
    private final String str1;
    private String str2;

    public Foo(Executor ex) {
        str1 = "I'm final";
        str2 = "I'm not";
        ex.execute(new Runnable() {
            // Oops: Leakage!
            public void run() { System.out.println(str1 + str2); }
        });
    }
}
That is, even though we have leaked this to a potentially malicious Executor, the assignments to str1 and str2 happen-before the leakage, so the object is (for all intents and purposes) completely constructed, even though it has not been "completely initialized" per JLS 17.5.
Note that I also am requiring that the class be final, as any subclass's fields would be initialized after the leakage.
Am I missing something here? Is this actually guaranteed to be well-behaved? It looks to me like a legitimate example of "piggybacking on synchronization" (16.1.4). In general, I would greatly appreciate any pointers to additional resources where these issues are covered.
EDIT: I am aware that, as #jtahlborn noted, I can avoid the issue by using a public static factory. I'm looking for an answer to the question directly, to solidify my understanding of the Java memory model.
EDIT #2: This answer alludes to what I'm trying to get at. That is, following the rule from the JLS cited therein is sufficient for guaranteeing visibility of all final fields. But is it necessary, or can we make use of other happen-before mechanisms to ensure our own visibility guarantees?
You are correct. In general, the Java memory model does not treat constructors in any special way. Publishing an object reference before or after a constructor exits makes very little difference.
The only exception is, of course, regarding final fields. The exit of a constructor where a final field is written to defines a "freeze" action on the field; if this is published after the freeze, even without happens-before edges, other threads will read the field properly initialized; but not if this is published before the freeze.
Interestingly, if there is constructor chaining, freeze is defined on the smallest scope; e.g.
// class Bar

final int x;

Bar(int x, int ignore) {
    this.x = x; // assign to final
}   // [f] freeze action on this.x

public Bar(int x) {
    this(x, 0);
    // [f] is reached!
    leak(this);
}
Here leak(this) is safe w.r.t. this.x.
See my other answer for more details on final fields.
If final seems too complicated, it is. My advice is -- forget it! Do not ever rely on final field semantics to publish unsafely. If your program is properly synchronized, you don't need to worry about final fields or their delicate semantics. Unfortunately, the current climate is to push final fields as much as possible, creating undue pressure on programmers.
We know that making fields final is usually a good idea as we gain thread-safety and immutability which makes the code easier to reason about. I'm curious if there's an associated performance cost.
The Java Memory Model guarantees this final Field Semantics:
A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
This means that for a class like this
class X {
    X(int a) {
        this.a = a;
    }

    final int a;

    static X instance;
}
whenever Thread 1 creates an instance like this
X.instance = new X(43);
while (true) doSomethingEventuallyEvictingCache();
and Thread 2 sees it
while (X.instance == null) {
    doSomethingEventuallyEvictingCache();
}
System.out.println(X.instance.a);
it must print 43. Without the final modifier, the JIT or the CPU could reorder the stores (first store X.instance and then set a=43) and Thread 2 could see the default-initialized value and print 0 instead.
When the JIT sees final it obviously refrains from reordering. But it also has to force the CPU to obey the order. Is there an associated performance penalty?
Is there an associated performance penalty?
If you take a look at the source code of the JIT compiler, you will find the following comment regarding final member variables in the file src/share/vm/opto/parse1.cpp:
This method (which must be a constructor by the rules of Java) wrote a final. The effects of all initializations must be committed to memory before any code after the constructor publishes the reference to the newly constructed object. Rather than wait for the publication, we simply block the writes here. Rather than put a barrier on only those writes which are required to complete, we force all writes to complete.
The compiler emits additional instructions if there are final member variables. Most likely, these additional instructions cause a performance penalty. But it's unclear whether this impact is significant for any application.
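If you want to measure it yourself, a rough sketch of a JMH microbenchmark might look like this (class and method names are mine; results will depend heavily on the JVM and CPU -- on x86 the final-field barrier is typically just a compiler-level ordering constraint):

import org.openjdk.jmh.annotations.Benchmark;

public class FinalFieldBenchmark {

    static final class WithFinal {
        final int a;
        WithFinal(int a) { this.a = a; }
    }

    static final class WithoutFinal {
        int a;
        WithoutFinal(int a) { this.a = a; }
    }

    @Benchmark
    public WithFinal allocateWithFinal() {
        return new WithFinal(43);   // returning the object keeps the allocation from being optimized away
    }

    @Benchmark
    public WithoutFinal allocateWithoutFinal() {
        return new WithoutFinal(43);
    }
}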
Is it safe/acceptable practice to lock on a private field variable (instead of using a lock object)? This way, I could have different locks for different purposes. Example below:
class Test {
    private Integer x = 0;
    private Integer y = 0;

    public void incrementX() {
        synchronized (x) {
            x++;
        }
    }

    public void decrementX() {
        synchronized (x) {
            x--;
        }
    }

    public void incrementY() {
        synchronized (y) {
            y++;
        }
    }

    public void decrementY() {
        synchronized (y) {
            y--;
        }
    }
}
Or should I have a lock object for each private member I wish to lock? Example:
class Test {
    private final Object xLock = new Object();
    private final Object yLock = new Object();
    private Integer x = 0;
    private Integer y = 0;
    ...
}
Or should I just have a single general lock and use that for all private variables that require locking? Example:
class Test {
    private final Object objLock = new Object();
    private Integer x = 0;
    private Integer y = 0;
    ...
}
Beware: always use a final member variable for the lock! If you use an Integer, for example, and you plan to change it, that is very bad practice, since subsequent calls will lock on different objects and cause a data race.
Whether you use one or several locks depends on the coordination scheme you want to achieve, so it's entirely domain-specific. You must think through carefully which operations are and which aren't mutually exclusive and assign locks to them appropriately. There is no single best practice here.
If you have two orthogonal operations on your object that may happen simultaneously without causing any data race, that's a case for two locks. In your example there are two Integers, each changing independently. I see this as a case for two locks. If you had more complex code where in at least one operation you needed to access both Integers, that would tie them together and then you would need a single lock.
It's perfectly acceptable to lock on a private field, as long as this field is an object. Primitives don't have an intrinsic lock, and the first snippet is thus invalid.
I would however avoid locking on a private field if this field is accessible from the outside (using a getter, for example), as this would allow anyone to lock on the same object for different purposes. The second solution is thus the cleanest, IMHO.
Using a single lock is counter-productive, since it prevents concurrent access to methods which should be able to run concurrently. It's thus generally better to have fine-grained locks.
EDIT:
now that you have changed your question and use wrapper objects, locking on the private Integer instances is really not a good solution, as you change the values of these variables inside the methods. Use final fields as locks.
Remember that x++, if x is an Integer instance, is equivalent to:
int temp = x.intValue();
temp++;
x = Integer.valueOf(temp);
Moreover, since Integer.valueOf() caches Integer instances, you might have several classes using the same Integer instance to lock completely different things. A recipe for slow execution and deadlocks.
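A minimal sketch of the safer variant with dedicated, final lock objects (the class body is filled in by me purely for illustration):

class Test {
    private final Object xLock = new Object();
    private final Object yLock = new Object();
    private int x = 0;
    private int y = 0;

    public void incrementX() {
        synchronized (xLock) {   // always the same object, never reassigned
            x++;
        }
    }

    public void decrementX() {
        synchronized (xLock) {
            x--;
        }
    }

    public void incrementY() {
        synchronized (yLock) {
            y++;
        }
    }

    public void decrementY() {
        synchronized (yLock) {
            y--;
        }
    }
}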
I think you should have two different locks for the two fields. You lock an object to prevent two or more threads from accessing it at the same time.
You can also take a look at the Lock interface in Java: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/concurrent/locks/Lock.html
It can be more performant than synchronized, and in java.util.concurrent there are some utility classes to work with locks (also a ReadWriteLock, if you need it).
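For reference, a minimal sketch of the same kind of counter guarded by a ReentrantLock (field and method names are mine):

import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

class LockedCounter {
    private final Lock xLock = new ReentrantLock();
    private int x = 0;

    public void incrementX() {
        xLock.lock();
        try {
            x++;
        } finally {
            xLock.unlock();   // always release in a finally block
        }
    }
}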
AFAIK the lock object you use acts only as an ID. I mean, you can use whatever object you want. The only important thing is "if two things must be mutually exclusive, then they must use the same lock".
So the approach of using the variable itself seems OK.
BUT, remember!!
I don't think you can lock on a primitive; it must be an Object.
If you change the field value, the next thread will acquire a different lock!!!
So the separate lock seems safer, unless you are absolutely sure your field is not going to change (in fact, you should declare it as final).
I recently wrote a class for an assignment in which I had to store names in an ArrayList (in Java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.
I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?
PS I like the instance variables in Ruby that start with an # symbol: they are lovelier.
(meta-question: What would be a better title for this question?)
In the words of the great Knuth "Premature optimization is the root of all evil".
Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.
But to answer your question: if you initialize the field where it is declared, the memory is allocated as soon as an instance of your class is created. If you initialize it in a method, the allocation happens later, when that method is called.
So it is only a question of initializing later... this is called lazy initialization in the industry.
Initialization
As a rule of thumb, try to initialize variables when they are declared.
If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.
Of course, there are exceptions to this rule. For example, a variable may be assigned in an if-else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.
/* DON'T DO THIS! */
Color color = null;
switch (colorCode) {
    case RED:   color = new Color("crimson"); break;
    case GREEN: color = new Color("lime");    break;
    case BLUE:  color = new Color("azure");   break;
}
color.fill(widget);
Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.
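A sketch of the preferred form (reusing the hypothetical Color, colorCode, and widget from the snippet above): with a blank declaration, every path must either assign the variable or throw, and the compiler enforces it.

Color color;   // blank declaration: no dummy value
switch (colorCode) {
    case RED:   color = new Color("crimson"); break;
    case GREEN: color = new Color("lime");    break;
    case BLUE:  color = new Color("azure");   break;
    default:    throw new IllegalArgumentException("Unknown color code: " + colorCode);
}
color.fill(widget);   // compiles only because color is definitely assigned on every path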
In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.
If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.
Concurrency Issues
For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.
Returning results from a thread's run method is a bit tricky. This method comes from the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.
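A minimal sketch of returning a result through Callable and an ExecutorService (the task shown is my own placeholder):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class CallableExample {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();

        Callable<List<String>> task = new Callable<List<String>>() {
            public List<String> call() {
                return Arrays.asList("alice", "bob");   // each call builds its own result
            }
        };

        Future<List<String>> future = pool.submit(task);
        System.out.println(future.get());   // blocks until the result is available

        pool.shutdown();
    }
}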
It's a bit of a digression, but if you are curious, consider the following example:
class Oops extends Thread { /* Note that Thread implements "Runnable" */
    private int counter = 0;
    private Collection<Integer> state = ...;

    public void run() {
        state.add(counter);
        counter++;
    }

    public static void main(String... argv) throws Exception {
        Oops oops = new Oops();
        oops.start();
        Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
        t2.start(); /* Execute the "run" method of the same instance again. */
        ...
    }
}
By the end of the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the thread, at least we can say that eventually state will contain one element, but we can't say whether it's 0 or 1.
From wikibooks:
There are three basic kinds of scope for variables in Java:
local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.
instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.
static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.
So yes, memory consumption is an issue, especially if the ArrayList inside run() is local.
I am not sure I completely understand your problem.
But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour simplicity.
So do what suits you the best. Only address performance/memory optimisation when needed.
My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:
at declaration time, i.e.
private ArrayList<String> myStrings = new ArrayList<String>();
in the constructor
If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.
For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have to do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.
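A short sketch of both styles for static variables (class and field names are mine):

import java.io.FileReader;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

public class Config {
    // Simple case: initialize in the declaration.
    private static final Map<String, String> DEFAULTS = new HashMap<String, String>();

    // Needs real work (I/O, calculations): use a static initializer.
    private static final Properties SETTINGS;

    static {
        SETTINGS = new Properties();
        try {
            SETTINGS.load(new FileReader("my.properties"));   // file name reused from the Properties example above
        } catch (IOException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}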
You should avoid lazy initialization. It leads to problems later.
But if you have to do it because the initialization is too expensive, do it like this:
Static fields:
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
    static final FieldType field = computeFieldValue();
}

static FieldType getField() { return FieldHolder.field; }
Instance fields:
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;

FieldType getField() {
    FieldType result = field;
    if (result == null) {              // First check (no locking)
        synchronized (this) {
            result = field;
            if (result == null)        // Second check (with locking)
                field = result = computeFieldValue();
        }
    }
    return result;
}
According to Joshua Bloch's book "Effective Java™, Second Edition" (ISBN-13: 978-0-321-35668-0):
"Use lazy initialization judiciously"