java: why should this not be allowed to escape the constructor?

java: why should this not be allowed to escape the constructor? - java

In JCIP, section 3.2.1 "Safe Constructor Practices", there is a warning against leaking this to another thread from the constructor, "even if the publication is the last statement in the constructor." That last part seems too strong to me, and it's provided with no justification. What is it that happens after construction that I must be so careful to avoid? Are there exceptions? I'm interested because I recently submitted some code in which I did this very thing, and I want to decide whether there is justification to go back through and refactor.

As far as Java memory model is concerned, the constructor exit plays a role in final field semantics, therefore there's a difference whether a statement is before or after constructor exit.
This works This doesn't work
-------------------------------------------------------------
static Foo shared; static Foo shared;
class Foo class Foo
{ {
final int i; final int i;
Foo() Foo()
{ {
i = 1; i = 1;
shared = this;
} }
} }
shared = new Foo(); new Foo();
(Note: shared is not volatile; the publication is through data race.)
The only difference between the 2 examples is assigning shared before or after constructor exit. In the second example, i=1 is allowed to be reordered after the assignment.
However, if the publication is a synchronized action, e.g. through a volatile variable, then it's ok; other threads will observe a fully initialized object; the fields don't even have to final.
Publication through data race (or doing anything through data race) is a very tricky business that requires very careful reasoning. If you avoid data race, things are much simpler. If your code contains no data race, there's no difference between leaking this immediately before constructor exit, and publishing it immediately after constructor exit.

You should never leak this from a constructor at any point, "even [...] in the last statement." Since this isn't fully constructed some very odd things can happen. See this SO answer on a very similar question.

You should never pass this out of the constructor (known as "leaking this")
One reason you shouldn't do this, even if it's the last line of the constructor, is that the JVM is allowed to re-order statements as long as the effect on the current thread isn't affected. If this is passed to a process running in another thread, reordering can cause weird and subtle bugs.
Another reason is that subclasses may provide their own initialization, so construction may not be complete at the last line of your class' constructor.

Related

Java: Safe to "leak" this-reference in constructor for final class via _happens-before_ relation?

Section 3.2.1 of Goetz's "Java Concurrency in Practice" contains the following rule:
Do not allow the this reference to escape during construction
I understand that, in general, allowing this to escape can lead to other threads seeing incompletely constructed versions of your object and violate the initialization safety guarantee of final fields (as discussed e.g. here)
But is it ever possible to safely leak this? In particular, if you establish a happen-before relationship prior to the leakage?
For example, the official Executor Javadoc says
Actions in a thread prior to submitting a Runnable object to an Executor happen-before its execution begins, perhaps in another thread
My naive reading understanding of the Java memory model this is that something like the following should be safe, even though it's leaking this prior to the end of the constructor:
public final class Foo {
private final String str1;
private String str2;
public Foo(Executor ex) {
str1 = "I'm final";
str2 = "I'm not";
ex.execute(new Runnable() {
// Oops: Leakage!
public void run() { System.out.println(str1 + str2);}
});
}
}
That is, even though we have leaked this to a potentially malicious Executor, the assignments to str1 and str2 happen-before the leakage, so the object is (for all intents and purposes) completely constructed, even though it has not been "completely initialized" per JLS 17.5.
Note that I also am requiring that the class be final, as any subclass's fields would be initialized after the leakage.
Am I missing something here? Is this actually guaranteed to be well-behaved? It looks to me like an legitimate example of "Piggybacking on synchronization" (16.1.4) In general, I would greatly appreciate any pointers to additional resources where these issues are covered.
EDIT: I am aware that, as #jtahlborn noted, I can avoid the issue by using a public static factory. I'm looking for an answer of the question directly to solidify my understanding of the Java memory model.
EDIT #2: This answer alludes to what I'm trying to get at. That is, following the rule from the JLS cited therein is sufficient for guaranteeing visibility of all final fields. But is it necessary, or can we make use of other happen-before mechanisms to ensure our own visibility guarantees?

You are correct. In general, Java memory model does not treat constructors in any special way. Publishing an object reference before or after a constructor exit makes very little difference.
The only exception is, of course, regarding final fields. The exit of a constructor where a final field is written to defines a "freeze" action on the field; if this is published after the freeze, even without happens-before edges, other threads will read the field properly initialized; but not if this is published before the freeze.
Interestingly, if there is constructor chaining, freeze is defined on the smallest scope; e.g.
-- class Bar
final int x;
Bar(int x, int ignore)
{
this.x = x; // assign to final
} // [f] freeze action on this.x
public Bar(int x)
{
this(x, 0);
// [f] is reached!
leak(this);
}
Here leak(this) is safe w.r.t. this.x.
See my other answer for more details on final fields.
If final seems too complicated, it is. My advice is -- forget it! Do not ever rely on final field semantics to publish unsafely. If you program is properly synchronized, you don't need to worry about final fields or their delicate semantics. Unfortunately, the current climate is to push final fields as much as possible, creating an undue pressure on programmers.

Can a volatile variable that is never assigned null to ever contain null?

Can in the following conceptual Java example:
public class X implements Runnable {
public volatile Object x = new Object();
#Runnable
public void run() {
for (;;) {
Thread.sleep(1000);
x = new Object();
}
}
}
x ever be read as null from another thread?
Bonus: Do I need to declare it volatile (I do not really care about that value, it suffices that sometime in the future it will be the newly assigned value and never is null)

Technically, yes it can. That is the main reason for the original ConcurrentHashMap's readUnderLock. The javadoc even explains how:
Reads value field of an entry under lock. Called if value field ever appears to be null. This is possible only if a compiler happens to reorder a HashEntry initialization with its table assignment, which is legal under memory model but is not known to ever occur.
Since the HashEntry's value is volatile this type of reordering is legal on consturction.
Moral of the story is that all non-final initializations can race with object construction.
Edit:
#Nathan Hughes asked a valid question:
#John: in the OP's example wouldn't the construction have happened before the thread the runnable is passed into started? it would seem like that would impose a happens-before barrier subsequent to the field's initialization.
Doug Lea had a couple comments on this topic, the entire thread can be read here. He answered the comment:
But the issue is whether assignment of the new C instance to some other memory must occur after the volatile stores.
With the answer
Sorry for mis-remembering why I had treated this issue as basically settled:
Unless a JVM always pre-zeros memory (which usually not a good option), then
even if not explicitly initialized, volatile fields must be zeroed
in the constructor body, with a release fence before publication.
And so even though there are cases in which the JMM does not
strictly require mechanics preventing publication reordering
in constructors of classes with volatile fields, the only good
implementation choices for JVMs are either to use non-volatile writes
with a trailing release fence, or to perform each volatile write
with full fencing. Either way, there is no reordering with publication.
Unfortunately, programmers cannot rely on a spec to guarantee
it, at least until the JMM is revised.
And finished with:
Programmers do not expect that even though final fields are specifically
publication-safe, volatile fields are not always so.
For various implementation reasons, JVMs arrange that
volatile fields are publication safe anyway, at least in
cases we know about.
Actually updating the JMM/JLS to mandate this is not easy
(no small tweak that I know applies). But now is a good time
to be considering a full revision for JDK9.
In the mean time, it would make sense to further test
and validate JVMs as meeting this likely future spec.

This depends on how the X instance is published.
Suppose x is published unsafely, eg. through a non-volatile field
private X instance;
...
void someMethod() {
instance = new X();
}
Another thread accessing the instance field is allowed to see a reference value referring to an uninitialized X object (ie. where its constructor hasn't run yet). In such a case, its field x would have a value of null.
The above example translates to
temporaryReferenceOnStack = new memory for X // a reference to the instance
temporaryReferenceOnStack.<init> // call constructor
instance = temporaryReferenceOnStack;
But the language allows the following reordering
temporaryReferenceOnStack = new memory for X // a reference to the instance
instance = temporaryReferenceOnStack;
temporaryReferenceOnStack.<init> // call constructor
or directly
instance = new memory for X // a reference to the instance
instance.<init> // call constructor
In such a case, a thread is allowed to see the value of instance before the constructor is invoked to initialize the referenced object.
Now, how likely this is to happen in current JVMs? Eh, I couldn't come up with an MCVE.
Bonus: Do I need to declare it volatile (I do not really care about
that value, it suffices that sometime in the future it will be the
newly assigned value and never is null)
Publish the enclosing object safely. Or use a final AtomicReference field which you set.

No. The Java memory model guarantees that you will never seen x as null. x must always be the initial value it was assigned, or some subsequent value.
This actually works with any variable, not just volatile. What you are asking about is called "out of thin air values". C.f. Java Concurrency in Practice which talks about this concept in some length.
The other part of your question "Do I need to declare x as volatile:" given the context, yes, it should be either volatile or final. Either one provides safe publication for your object referenced by x. C.f. Safe Publication. Obviously, x can't be changed later if it's final.

Cost of using final fields

We know that making fields final is usually a good idea as we gain thread-safety and immutability which makes the code easier to reason about. I'm curious if there's an associated performance cost.
The Java Memory Model guarantees this final Field Semantics:
A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
This means that for a class like this
class X {
X(int a) {
this.a = a;
}
final int a;
static X instance;
}
whenever Thread 1 creates an instance like this
X.instance = new X(43);
while (true) doSomethingEventuallyEvictingCache();
and Thread 2 sees it
while (X.instance == null) {
doSomethingEventuallyEvictingCache();
}
System.out.println(X.instance.a);
it must print 43. Without the final modifier, the JIT or the CPU could reorder the stores (first store X.instance and then set a=43) and Thread 2 could see the default-initialized value and print 0 instead.
When JIT sees final it obviously refrains from reordering. But it also has to force the CPU to obey the order. Is there an associated performance penalty?

Is there an associated performance penalty?
If you take a look at the source code of the JIT compiler, you will find the following comment regarding final member variables in the file src/share/vm/opto/parse1.cpp:
This method (which must be a constructor by the rules of Java) wrote a final. The effects of all initializations must be committed to memory before any code after the constructor publishes the reference to the newly constructor object. Rather than wait for the publication, we simply block the writes here. Rather than put a barrier on only those writes which are required to complete, we force all writes to complete.
The compiler emits additional instructions if there are final member variables. Most likely, these additional instructions cause a performance penalty. But it's unclear, if this impact is significant for any application.

Is this technically thread safe despite being mutable?

Yes, the private member variable bar should be final right? But actually, in this instance, it is an atomic operation to simply read the value of an int. So is this technically thread safe?
class Foo {
private int bar;
public Foo(int bar) {
this.bar = bar;
}
public int getBar() {
return bar;
}
}
// assume infinite number of threads repeatedly calling getBar on the same instance of Foo.
EDIT:
Assume that this is all of the code for the Foo class; any threads with a reference to a Foo instance will not be able to change the value of bar (without going to such lengths as using reflection etc.)

Final update: so my first conclusion happened to be right, just my reasoning was faulty :-( I re-edited my answer to make it somewhat coherent, not to hide the traces of my earlier blunder.
Conclusion
As #Wyzard pointed out, even though there is no way to change bar after construction, Foo is still not thread safe. The problem is not atomicity but visibility. If thread 1 is changing the value of bar in the constructor (from its default value of 0), there is no guarantee when other threads will get to see the new value (or whether they see it at all).
So foo looks like an immutable object. Quoting from Java Concurrency in Practice, section 3.4:
An object is immutable if:
Its state cannot be modified after construction;
All its fields are final; and
It is properly constructed (the this reference does not escape during construction).
Foo looks OK on 1) and 3), but not 2). And that is a crucial point, due to the reasoning above. Declaring a variable final is one way of ensuring its visibility between different threads. The other means are declaring bar volatile, or synchronizing its access method(s). But of course, in case of an immutable object, neither of these would make much sense.
Final Fields
So why do finalfields guarantee visibility? Answer from Java Concurrency in Practice, section 3.5.2:
Because immutable objects are so important, the JavaMemory Model offers a special guarantee of initialization safety for sharing immutable objects. As we've seen, that an object reference becomes visible to another thread does not necessarily mean that the state of that object is visible to the consuming thread. In order to guarantee a consistent view of the object's state, synchronization is needed.
Immutable objects, on the other hand, can be safely accessed even when synchronization is not used to publish the object reference. For this guarantee of initialization safety to hold, all of the requirements for immutability must be met: unmodi-fiable state, all fields are final, and proper construction. [...]
Immutable objects can be used safely by any thread without additional synchronization, even when synchronization is not used to publish them.
This guarantee extends to the values of all final fields of properly constructed objects; final fields can be safely accessed without additional synchronization. However, if final fields refer to mutable objects, synchronization is still required to access the state of the objects they refer to.
And what happens if the field is not final? Other threads may silently see a stale value of the field. There is no exception or any kind of warning - that is one reason why these kinds of bugs are so difficult to trace.

Choosing when to instantiate classes

I recently wrote a class for an assignment in which I had to store names in an ArrayList (in java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.
I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?
PS I like the instance variables in Ruby that start with an # symbol: they are lovelier.
(meta-question: What would be a better title for this question?)

In the words of the great Knuth "Premature optimization is the root of all evil".
Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.
But to answer your question - if you initialize in the class member, the memory will be allocated the first time a mention of your class is done in the code (i.e. when you call a method from it). If you initialize in a method, the memory allocation occurs later, when you call this specific method.
So it is only a question of initializing later... this is called lazy initialization in the industry.

Initialization
As a rule of thumb, try to initialize variables when they are declared.
If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.
Of course, there are exceptions to this rule. For example, a variable may by be assigned in an if–else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.
/* DON'T DO THIS! */
Color color = null;
switch(colorCode) {
case RED: color = new Color("crimson"); break;
case GREEN: color = new Color("lime"); break;
case BLUE: color = new Color("azure"); break;
}
color.fill(widget);
Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.
In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.
If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.
Concurrency Issues
For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.
Returning results from a thread's run method is a bit tricky. This method is the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.
It's a bit of a digression, but if you are curious, consider the following example:
class Oops extends Thread { /* Note that thread implements "Runnable" */
private int counter = 0;
private Collection<Integer> state = ...;
public void run() {
state.add(counter);
counter++;
}
public static void main(String... argv) throws Exception {
Oops oops = new Oops();
oops.start();
Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
t2.start(); /* Execute the "run" method of the same instance again. */
...
}
}
By the end of the the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the thread, at least we can say that eventually, state will contain one element, but we can't say whether it's 0 or 1.

From wikibooks:
There are three basic kinds of scope for variables in Java:
local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.
instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.
static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.
So yes, memory consumption is an issue, especially if the ArrayList inside run() is local.

I am not completely I understand your complete problem.
But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour the easibility side.
So do what suits you the best. Only address performance/memory optimisation when needed.

My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:
at delcaration time, i.e.
private ArrayList<String> myStrings = new ArrayList<String>();
in the constructor
If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.
For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.

You have to avoid Lazy initialization. It leads to problems later.
But if you have to do it because the initialization is too heavy you have to do it like this:
Static fields:
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
static final FieldType field = computeFieldValue();
}
static FieldType getField() { return FieldHolder.field; }
Instance fields:
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;
FieldType getField() {
FieldType result = field;
if (result == null) { // First check (no locking)
synchronized(this) {
result = field;
if (result == null) // Second check (with locking)
field = result = computeFieldValue();
}
}
return result;
}
Acording to Joshua Bolch book's "Effective Java™
Second Edition" (ISBN-13: 978-0-321-35668-0):
"Use lazy initialization judiciously"

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.