Choosing when to instantiate classes - java

I recently wrote a class for an assignment in which I had to store names in an ArrayList (in java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.
I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?
PS I like the instance variables in Ruby that start with an # symbol: they are lovelier.
(meta-question: What would be a better title for this question?)

In the words of the great Knuth "Premature optimization is the root of all evil".
Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.
But to answer your question - if you initialize in the class member, the memory will be allocated the first time a mention of your class is done in the code (i.e. when you call a method from it). If you initialize in a method, the memory allocation occurs later, when you call this specific method.
So it is only a question of initializing later... this is called lazy initialization in the industry.

Initialization
As a rule of thumb, try to initialize variables when they are declared.
If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.
Of course, there are exceptions to this rule. For example, a variable may by be assigned in an if–else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.
/* DON'T DO THIS! */
Color color = null;
switch(colorCode) {
case RED: color = new Color("crimson"); break;
case GREEN: color = new Color("lime"); break;
case BLUE: color = new Color("azure"); break;
}
color.fill(widget);
Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.
In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.
If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.
Concurrency Issues
For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.
Returning results from a thread's run method is a bit tricky. This method is the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.
It's a bit of a digression, but if you are curious, consider the following example:
class Oops extends Thread { /* Note that thread implements "Runnable" */
private int counter = 0;
private Collection<Integer> state = ...;
public void run() {
state.add(counter);
counter++;
}
public static void main(String... argv) throws Exception {
Oops oops = new Oops();
oops.start();
Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
t2.start(); /* Execute the "run" method of the same instance again. */
...
}
}
By the end of the the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the thread, at least we can say that eventually, state will contain one element, but we can't say whether it's 0 or 1.

From wikibooks:
There are three basic kinds of scope for variables in Java:
local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.
instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.
static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.
So yes, memory consumption is an issue, especially if the ArrayList inside run() is local.

I am not completely I understand your complete problem.
But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour the easibility side.
So do what suits you the best. Only address performance/memory optimisation when needed.

My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:
at delcaration time, i.e.
private ArrayList<String> myStrings = new ArrayList<String>();
in the constructor
If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.
For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.

You have to avoid Lazy initialization. It leads to problems later.
But if you have to do it because the initialization is too heavy you have to do it like this:
Static fields:
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
static final FieldType field = computeFieldValue();
}
static FieldType getField() { return FieldHolder.field; }
Instance fields:
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;
FieldType getField() {
FieldType result = field;
if (result == null) { // First check (no locking)
synchronized(this) {
result = field;
if (result == null) // Second check (with locking)
field = result = computeFieldValue();
}
}
return result;
}
Acording to Joshua Bolch book's "Effective Java™
Second Edition" (ISBN-13: 978-0-321-35668-0):
"Use lazy initialization judiciously"

Related

How to make my code thread-safe when my shared variable can change anytime?

Here is a question that has been asked many times, I have double-checked numerous issues that have been raised formerly but none gave me an answer element so I thought I would put it here.
The question is about making my code thread-safe in java knowing that there is only one shared variable but it can change anytime and actually I have the feeling that the code I am optimizing has not been thought for a multi-threading environment, so I might have to think it over...
Basically, I have one class which can be shared between, say, 5 threads. This class has a private property 'myProperty' which can take 5 different values (one for each thread). The problem is that, once it's instantiated by the constructor, that value should not be changed anymore for the rest of the thread's life.
I am pretty well aware of some techniques used to turn most of pieces of code "thead-safe" including locks, the "synchronized" keyword, volatile variables and atomic types but I have the feeling that these won't help in the current situation as they do not prevent the variable from being modified.
Here is the code :
// The thread that calls for the class containing the shared variable //
public class myThread implements Runnable {
#Autowired
private Shared myProperty;
//some code
}
// The class containing the shared variable //
public class Shared {
private String operator;
private Lock lock = new ReentrantLock();
public void inititiate(){
this.lock.lock()
try{
this.operator.initiate() // Gets a different value depending on the calling thread
} finally {
this.lock.unlock();
}
}
// some code
}
As it happens, the above code only guarantees that two threads won't change the variable at the same time, but the latter will still change. A "naive" workaround would consist in creating a table (operatorList) for instance (or a list, a map, etc. ) associating an operator with its calling thread's ID, this way each thread would just have to access its operator using its id in the table but doing this would make us change all the thread classes which access the shared variable and there are many. Any idea as to how I could store the different operator string values in an exclusive manner for each calling thread with minimal changes (without using magic) ?
I'm not 100% sure I understood your question correctly, but I'll give it a shot anyway. Correct me if I'm wrong.
A "naive" workaround would consist in creating a table (operatorList)
for instance (or a list, a map, etc. ) associating an operator with
its calling thread's ID, this way each thread would just have to
access its operator using its id in the table but doing this would
make us change all the thread classes which access the shared variable
and there are many.
There's already something similar in Java - the ThreadLocal class?
You can create a thread-local copy of any object:
private static final ThreadLocal<MyObject> operator =
new ThreadLocal<MyObject>() {
#Override
protected MyObject initialValue() {
// return thread-local copy of the "MyObject"
}
};
Later in your code, when a specific thread needs to get its own local copy, all it needs to do is: operator.get(). In reality, the implementation of ThreadLocal is similar to what you've described - a Map of ThreadLocal values for each Thread. Only the Map is not static, and is actually tied to the specific thread. This way, when a thread dies, it takes its ThreadLocal variables with it.
I'm not sure if I totally understand the situation, but if you want to ensure that each thread uses a thread-specific instance for a variable, the solution is use a variable of type ThreadLocal<T>.

How to achieve synchronisation in a better way with singleton approach?

Class which is implemented with Singleton Pattern is as follows, when multiple threads access this method only one thread has to create the instance so all I am doing is synchronising the method
private static synchronized FactoryAPI getIOInstance(){
if(factoryAPI == null){
FileUtils.initWrapperProp();
factoryAPI = new FactoryAPIImpl();
}
return factoryAPI;
}
which I feel is unnecessary because only for the first time the instance would be created and for the rest of the time the instance created already would be returned. When adding synchronised to block allows only one thread to access the method at a time.
The getIOInstance does two jobs
i) Initialising properties and
ii) Creating a new instance for the first time
So, I'm trying to do block level synchronisation here like the following
private static FactoryAPI getIOInstance(){
if(factoryAPI == null){
synchronised {
if(factoryAPI == null){
FileUtils.initWrapperProp();
factoryAPI = new FactoryAPIImpl();
}
}
}
return factoryAPI;
}
I prefer the second one to be the right one. Am I using it in a right way? Any suggestions are welcome.
Use the first method because the second one is not thread-safe.
When you say,
factoryAPI = new FactoryAPIImpl();
The compiler is free to execute the code in the following order:
1) Allocate some memory on the heap
2) Initialize factoryAPI to the address of that allocated space
3) Call the constructor of FactoryAPIImpl
The problem is when another thread calls getIOInstance() after step 2 and before step 3. It may see a non-null factoryAPI variable that points to an uninitialized FactoryAPI instance.
There are many different answers to this problem, you can find an extensive discussion at SEI for example.
The modern Java solution is simple: use an enum - as the JLS guarantees you that the compiler / JVM will create exactly one thing.
Found Initialization-on-demand holder method of initialising as an interesting one like the following,
public class FactoryAPI {
private FactoryAPI() {}
private static class LazyHolder {
static final Something INSTANCE = new Something();
}
public static Something getInstance() {
return FactoryAPI.INSTANCE;
}
}
Since the class initialization phase is guaranteed by the JLS to be serial, i.e., non-concurrent, no further synchronization is required in the static getInstance method during loading and initialization.
And since the initialization phase writes the static variable INSTANCE in a serial operation, all subsequent concurrent invocations of the getInstance will return the same correctly initialized INSTANCE without incurring any additional synchronization overhead.

Cost of using final fields

We know that making fields final is usually a good idea as we gain thread-safety and immutability which makes the code easier to reason about. I'm curious if there's an associated performance cost.
The Java Memory Model guarantees this final Field Semantics:
A thread that can only see a reference to an object after that object has been completely initialized is guaranteed to see the correctly initialized values for that object's final fields.
This means that for a class like this
class X {
X(int a) {
this.a = a;
}
final int a;
static X instance;
}
whenever Thread 1 creates an instance like this
X.instance = new X(43);
while (true) doSomethingEventuallyEvictingCache();
and Thread 2 sees it
while (X.instance == null) {
doSomethingEventuallyEvictingCache();
}
System.out.println(X.instance.a);
it must print 43. Without the final modifier, the JIT or the CPU could reorder the stores (first store X.instance and then set a=43) and Thread 2 could see the default-initialized value and print 0 instead.
When JIT sees final it obviously refrains from reordering. But it also has to force the CPU to obey the order. Is there an associated performance penalty?
Is there an associated performance penalty?
If you take a look at the source code of the JIT compiler, you will find the following comment regarding final member variables in the file src/share/vm/opto/parse1.cpp:
This method (which must be a constructor by the rules of Java) wrote a final. The effects of all initializations must be committed to memory before any code after the constructor publishes the reference to the newly constructor object. Rather than wait for the publication, we simply block the writes here. Rather than put a barrier on only those writes which are required to complete, we force all writes to complete.
The compiler emits additional instructions if there are final member variables. Most likely, these additional instructions cause a performance penalty. But it's unclear, if this impact is significant for any application.

java: why should this not be allowed to escape the constructor?

In JCIP, section 3.2.1 "Safe Constructor Practices", there is a warning against leaking this to another thread from the constructor, "even if the publication is the last statement in the constructor." That last part seems too strong to me, and it's provided with no justification. What is it that happens after construction that I must be so careful to avoid? Are there exceptions? I'm interested because I recently submitted some code in which I did this very thing, and I want to decide whether there is justification to go back through and refactor.
As far as Java memory model is concerned, the constructor exit plays a role in final field semantics, therefore there's a difference whether a statement is before or after constructor exit.
This works This doesn't work
-------------------------------------------------------------
static Foo shared; static Foo shared;
class Foo class Foo
{ {
final int i; final int i;
Foo() Foo()
{ {
i = 1; i = 1;
shared = this;
} }
} }
shared = new Foo(); new Foo();
(Note: shared is not volatile; the publication is through data race.)
The only difference between the 2 examples is assigning shared before or after constructor exit. In the second example, i=1 is allowed to be reordered after the assignment.
However, if the publication is a synchronized action, e.g. through a volatile variable, then it's ok; other threads will observe a fully initialized object; the fields don't even have to final.
Publication through data race (or doing anything through data race) is a very tricky business that requires very careful reasoning. If you avoid data race, things are much simpler. If your code contains no data race, there's no difference between leaking this immediately before constructor exit, and publishing it immediately after constructor exit.
You should never leak this from a constructor at any point, "even [...] in the last statement." Since this isn't fully constructed some very odd things can happen. See this SO answer on a very similar question.
You should never pass this out of the constructor (known as "leaking this")
One reason you shouldn't do this, even if it's the last line of the constructor, is that the JVM is allowed to re-order statements as long as the effect on the current thread isn't affected. If this is passed to a process running in another thread, reordering can cause weird and subtle bugs.
Another reason is that subclasses may provide their own initialization, so construction may not be complete at the last line of your class' constructor.

Java reusing (static?) objects as temporary objects for performance

I need to call methods of a class with multiple methods very often in a simulation loop.
Some of these methods need to access temporary objects for storing information in them. After leaving the method the stored data is not needed anymore.
For example:
Class class {
method1() {
...
SomeObject temp = new SomeObject();
...
}
method2() {
...
SomeObject temp = new SomeObject();
SomeObject temp2 = new SomeObject();
...
}
}
I need to optimize as much as possible. The most expensive (removable) problem is that too many allocations happen.
I assume it would be better not to allocate the space needed for those objects every time so I want to keep them.
Would it be more efficient to store them in a static way or not?
Like:
Class class {
private (static?) SomeObject temp;
private (static?) SomeObject temp2;
methods...
}
Or is there even a better way? Thank you for your help!
Edit based on answers:
Not the memory footprint is the actual problem but the garbage collection cleaning up the mess.
SomeObject is a Point2D-like class, nothing memory expensive (in my opinion).
I am not sure whether it is better to use (eventually static) class level objects as placeholder or some more advanced method which I am not aware of.
I would be wary in this example of pre-mature optimization. There are downsides, typically, that it makes the code more complex (and complexity makes bugs more likely), harder to read, could introduce bugs, may not offer the speedup you expected, etc. For a simple object such as representing a 2D point coordinate, I wouldn't worry about re-use. Typically re-use gains the most benefit if you are either working with a large amount of memory, avoid lengthy expensive constructors, or are pulling object construction out of a tight loop that is frequently executed.
Some different strategies you could try:
Push responsiblity to caller One way would be to to have the caller pass in an object pre-initialized, making the method parameter final. However, whether this will work depends on what you need to do with the object.
Pointer to temporary object as method parameter Another way would be to have the caller pass as an object as a parameter that's purpose is essentially to be a pointer to an object where the method should do its temporary storage. I think this technique is more commonly used in C++, but works similarly, though sometimes shows up in places like graphics programming.
Object Pool One common way to reuse temporary objects is to use an object pool where objects are allocated from a fixed bank of "available" objects. This has some overhead, but if the objects are large, and frequently used for only short periods of time, such that memory fragmentation might be a concern, the overhead may be enough less to be worth considering.
Member Variable If you are not concerned about concurrent calls to the method (or have used synchronization to prevent such), you could emulate the C++ism of a "local static" variable, by creating a member variable of the class for your storage. It makes the code less readable and slightly more room to introduce accidental interference with other parts of your code using the variable, but lower overhead than an object pool, and does not require changes to your method signature. If you do this, you may optionally also wish to use the transient keyword on the variable as well to indicate the variable does not need to be serialized.
I would shy away from a static variable for the temporary unless the method is also static, because this may have a memory overhead for the entire time your program runs that is undesirable, and the same downsides as a member variable for this purpose x2 (multiple instances of the same class)
Keep in mind that temp and temp2 are not themselves objects, but variables pointing to an object of type SomeObject. The way you are planning to do it, the only difference would be that temp and temp2 would be instance variables instead of local variables. Calling
temp = new SomeObject();
Would still allocate a new SomeObject onto the heap.
Additionally, making them static or instance variables instead of local would cause the last assigned SomeObjects to be kept strongly reachable (as long as your class instance is in scope for instance variables), preventing them from being garbage collected until the variables are reassigned.
Optimizing in this way probably isn't effective. Currently, once temp and temp2 are out of scope, the SomeObjects they point to will be eligible for garbage collection.
If you're still interested in memory optimization, you will need to show what the SomeObject is in order to get advice as to how you could cache the information it's holding.
How large are these objects. It seems to me that you could have class level objects (not necessarily static. I'll come back to that). For SomeObject, you could have a method that purges its contents. When you are done using it in one place, call the method to purge its contents.
As far as static, will multiple callers use this class and have different values? If so, don't use static.
First, you need to make sure that you are really have this problem. The benefit of a Garbage Collector is that it takes care of all temporary objects automatically.
Anyways, suppose you run a single threaded application and you use at most MAX_OBJECTS at any giving time. One solution could be like this:
public class ObjectPool {
private final int MAX_OBJECTS = 5;
private final Object [] pool = new Object [MAX_OBJECTS];
private int position = 0;
public Object getObject() {
// advance to the next object
position = (position + 1) % MAX_OBJECTS;
// check and create new object if needed
if(pool[position] == null) {
pool[position] = new Object();
}
// return next object
return pool[position];
}
// make it a singleton
private ObjectPool() {}
private static final ObjectPool instance = new ObjectPool();
public static ObjectPool getInstance() { return instance;}
}
And here is the usage example:
public class ObjectPoolTest {
public static void main(String[] args) {
for(int n = 0; n < 6; n++) {
Object o = ObjectPool.getInstance().getObject();
System.out.println(o.hashCode());
}
}
}
Here is the output:
0) 1660364311
1) 1340465859
2) 2106235183
3) 374283533
4) 603737068
5) 1660364311
You can notice that the first and the last numbers are the same - the MAX_OBJECTS + 1 iterations returns the same temporary object.

Categories

Resources