Java reusing (static?) objects as temporary objects for performance

Java reusing (static?) objects as temporary objects for performance - java

I need to call methods of a class with multiple methods very often in a simulation loop.
Some of these methods need to access temporary objects for storing information in them. After leaving the method the stored data is not needed anymore.
For example:
Class class {
method1() {
...
SomeObject temp = new SomeObject();
...
}
method2() {
...
SomeObject temp = new SomeObject();
SomeObject temp2 = new SomeObject();
...
}
}
I need to optimize as much as possible. The most expensive (removable) problem is that too many allocations happen.
I assume it would be better not to allocate the space needed for those objects every time so I want to keep them.
Would it be more efficient to store them in a static way or not?
Like:
Class class {
private (static?) SomeObject temp;
private (static?) SomeObject temp2;
methods...
}
Or is there even a better way? Thank you for your help!
Edit based on answers:
Not the memory footprint is the actual problem but the garbage collection cleaning up the mess.
SomeObject is a Point2D-like class, nothing memory expensive (in my opinion).
I am not sure whether it is better to use (eventually static) class level objects as placeholder or some more advanced method which I am not aware of.

I would be wary in this example of pre-mature optimization. There are downsides, typically, that it makes the code more complex (and complexity makes bugs more likely), harder to read, could introduce bugs, may not offer the speedup you expected, etc. For a simple object such as representing a 2D point coordinate, I wouldn't worry about re-use. Typically re-use gains the most benefit if you are either working with a large amount of memory, avoid lengthy expensive constructors, or are pulling object construction out of a tight loop that is frequently executed.
Some different strategies you could try:
Push responsiblity to caller One way would be to to have the caller pass in an object pre-initialized, making the method parameter final. However, whether this will work depends on what you need to do with the object.
Pointer to temporary object as method parameter Another way would be to have the caller pass as an object as a parameter that's purpose is essentially to be a pointer to an object where the method should do its temporary storage. I think this technique is more commonly used in C++, but works similarly, though sometimes shows up in places like graphics programming.
Object Pool One common way to reuse temporary objects is to use an object pool where objects are allocated from a fixed bank of "available" objects. This has some overhead, but if the objects are large, and frequently used for only short periods of time, such that memory fragmentation might be a concern, the overhead may be enough less to be worth considering.
Member Variable If you are not concerned about concurrent calls to the method (or have used synchronization to prevent such), you could emulate the C++ism of a "local static" variable, by creating a member variable of the class for your storage. It makes the code less readable and slightly more room to introduce accidental interference with other parts of your code using the variable, but lower overhead than an object pool, and does not require changes to your method signature. If you do this, you may optionally also wish to use the transient keyword on the variable as well to indicate the variable does not need to be serialized.
I would shy away from a static variable for the temporary unless the method is also static, because this may have a memory overhead for the entire time your program runs that is undesirable, and the same downsides as a member variable for this purpose x2 (multiple instances of the same class)

Keep in mind that temp and temp2 are not themselves objects, but variables pointing to an object of type SomeObject. The way you are planning to do it, the only difference would be that temp and temp2 would be instance variables instead of local variables. Calling
temp = new SomeObject();
Would still allocate a new SomeObject onto the heap.
Additionally, making them static or instance variables instead of local would cause the last assigned SomeObjects to be kept strongly reachable (as long as your class instance is in scope for instance variables), preventing them from being garbage collected until the variables are reassigned.
Optimizing in this way probably isn't effective. Currently, once temp and temp2 are out of scope, the SomeObjects they point to will be eligible for garbage collection.
If you're still interested in memory optimization, you will need to show what the SomeObject is in order to get advice as to how you could cache the information it's holding.

How large are these objects. It seems to me that you could have class level objects (not necessarily static. I'll come back to that). For SomeObject, you could have a method that purges its contents. When you are done using it in one place, call the method to purge its contents.
As far as static, will multiple callers use this class and have different values? If so, don't use static.

First, you need to make sure that you are really have this problem. The benefit of a Garbage Collector is that it takes care of all temporary objects automatically.
Anyways, suppose you run a single threaded application and you use at most MAX_OBJECTS at any giving time. One solution could be like this:
public class ObjectPool {
private final int MAX_OBJECTS = 5;
private final Object [] pool = new Object [MAX_OBJECTS];
private int position = 0;
public Object getObject() {
// advance to the next object
position = (position + 1) % MAX_OBJECTS;
// check and create new object if needed
if(pool[position] == null) {
pool[position] = new Object();
}
// return next object
return pool[position];
}
// make it a singleton
private ObjectPool() {}
private static final ObjectPool instance = new ObjectPool();
public static ObjectPool getInstance() { return instance;}
}
And here is the usage example:
public class ObjectPoolTest {
public static void main(String[] args) {
for(int n = 0; n < 6; n++) {
Object o = ObjectPool.getInstance().getObject();
System.out.println(o.hashCode());
}
}
}
Here is the output:
0) 1660364311
1) 1340465859
2) 2106235183
3) 374283533
4) 603737068
5) 1660364311
You can notice that the first and the last numbers are the same - the MAX_OBJECTS + 1 iterations returns the same temporary object.

Related

Why does a non-static field not act as a GC root?

As I know static fields (along with Threads, local variables and method arguments, JNI references) act as GC roots.
I cannot provide a link that would confirm this, but I have read a lot of articles on it.
Why can't a non-static field act as a GC root?

First off, we need to be sure we're on the same page as to what a tracing garbage collection algorithm does in its mark phase.
At any given moment, a tracing GC has a number of objects that are known to be alive, in the sense that they are reachable by the running program as it stands right now. The main step of mark phrase involves following the non-static fields of those objects to find more objects, and those new objects will now also be known to be alive. This step is repeated recursively until no new alive objects are found by traversing the existing live objects. All objects in memory not proved live are considered dead. (The GC then moves to the next phase, which is called the sweep phase. We don't care about that phase for this answer.)
Now this alone is not enough to execute the algorithm. In the beginning, the algorithm has no objects that it knows to be alive, so it can't start following anyone's non-static fields. We need to specify a set of objects that are considered known to be alive from the start. We choose those objects axiomatically, in the sense that they don't come from a previous step of the algorithm -- they come from outside. Specifically, they come from the semantics of the language. Those objects are called roots.
In a language like Java, there are two sets of objects that are definite GC roots. Anything that is accessible by a local variable that's still in scope is obviously reachable (within its method, which still hasn't returned), therefore it's alive, therefore it's a root. Anything that is accessible through a static field of a class is also obviously reachable (from anywhere), therefore it's alive, therefore it's a root.
But if non-static fields were considered roots as well, what would happen?
Say you instantiate an ArrayList<E>. Inside, that object has a non-static field that points to an Object[] (the backing array that represents the storage of the list). At some point, a GC cycle starts. In the mark phase, the Object[] is marked as alive because it is pointed to by the ArrayList<E> private non-static field. The ArrayList<E> is not pointed to by anything, so it fails to be considered alive. Thus, in this cycle, the ArrayList<E> is destroyed while the backing Object[] survives. Of course, at the next cycle, the Object[] also dies, because it is not reachable by any root. But why do this in two cycles? If the ArrayList<E> was dead in the first cycle and if Object[] is used only by a dead object, shouldn't the Object[] also be considered dead in the same move, to save time and space?
That's the point here. If we want to be maximally efficient (in the context of a tracing GC), we need to get rid of as many dead objects as possible in a single GC.
To do that, a non-static field should keep an object alive only if the enclosing object (the object that contains the field) has been proved to be alive. By contrast, roots are objects we call alive axiomatically (without proof) in order to kick-start the algorithm's marking phase. It is in our best interest to limit the latter category to the bare minimum that doesn't break the running program.
For example, say you have this code:
class Foo {
Bar bar = new Bar();
public static void main(String[] args) {
Foo foo = new Foo();
System.gc();
}
public void test() {
Integer a = 1;
bar.counter++; //access to the non-static field
}
}
class Bar {
int counter = 0;
}
When the garbage collection starts, we get one root that's the local variable Foo foo. That's it, that's our only root.
We follow the root to find the instance of Foo, which is marked as alive and then we attempt to find its non-static fields. We find one of them, the Bar bar field.
We follow the fields to find the instance of Bar, which is marked as alive and then we attempt to find its non-static fields. We find that it contains no more fields that are reference types, so the GC doesn't need to bother for that object anymore.
Since we can't get find new alive objects in this round of recursion, the mark phase can end.
Alternatively:
class Foo {
Bar bar = new Bar();
public static void main(String[] args) {
Foo foo = new Foo();
foo.test();
}
public void test() {
Integer a = 1;
bar.counter++; //access to the non-static field
System.gc();
}
}
class Bar {
int counter = 0;
}
When the garbage collection starts, the local variable Integer a is a root and the Foo this reference (the implicit reference that all non-static methods get) is also a root. The local variable Foo foo from main is also a root because main hasn't gone out of scope yet.
We follow the root to find the instance of Integer and instance of Foo (we find one of these objects twice, but this doesn't matter for the algorithm), which are marked as alive and then we attempt to follow their non-static fields. Let's say the instance of Integer has no more fields to class instances. The instance of Foo gives us one Bar field.
We follow the field to find the instance of Bar, which is marked as alive and then we attempt to find its non-static fields. We find that it contains no more fields that are reference types, so the GC doesn't need to bother for that object anymore.
Since we can't get find new alive objects in this round of recursion, the mark phase can end.

A non static field has a reference held by the instance that contains it, so it cannot be a GC root on its own right.

Usefulness of immutable objects when the state of a program constantly changes

I know that immutable objects always have the same state, the state in which they are actually created. Their invariants are establised by the constructor and since their state does not change after construction , those invariants always hold good and this is why they are safe to publish in a multi threaded environment. This is all fine but since we live in a dynamic world , where the state of program changes continuously , what benefits do such objects give us if we construct the state of our program through immutable objects?

"what benefits do such objects give us" you already answered that.
Regarding the "dynamic" part of your question, if you need to "change" an immutable object, you can create a new one from the old one:
Immutable oldObj = new Immutable(...);
Immutable newObj = new Immutable(oldObj.property1, "a new value for property 2");
If you find that you keep doing that repeatedly, then maybe you need to make the object mutable and add the relevant tread-safety features that are needed to be able to use that object in a concurrent environment.

Immutable objects allow you to cleanly communicate changes of state to different threads.
It is a good practice to use immutable objects to represent messages exchanged between threads. Once such message is sent, its payload can not be altered, which prevents many concurrency related bugs. If a thread needs to communicate some further changes, it just sends next messages.

Immutable objects are very helpful when you need some static object whose state never changes .Greatest advantage are immutability , object semantics and smart pointers renders object ownership a moot point. Implicitly this also means deterministic behaviour in the presence of concurrency.
Java has already defined some Immutable classes like String Integer.
Other benefit is they always have "failure atomicity" (a term used by Joshua Bloch) : if an immutable object throws an exception, it's never left in an undesirable or indeterminate state .
Let say if you have a global cache of static objects like country codes , here you can apply Immutability.
Why do we need immutable class?

Immutable objects are really useful in cases like this, with a String object:
public class A {
private volatile String currentName = "The First Name";
public String getCurrentName() {
// Fast: no synching or blocking! Can be called billions of times by
// billions of threads with no trouble.
// (Does need to be read from memory always because it's volatile.)
return currentName;
}
public whatever someMethod() {
... code ...
// Simple assignment in this case. Could involve synchronization
// and lots of calculations, but it's called a lot less than
// getCurrentName().
currentName = newName;
... code ...
}
}
public class B {
... in some method ...
A objA = something;
// Gets "name" fast despite a billion other threads doing the same thing.
String name = objA.getCurrentName();
// From this point on, String referenced by "name" won't change
// regardless of how many times A.currentName changes.
... code with frequent references to objA
}
This allows complex data (or even simple data, this case) that must be consistent (if not precisely up-to-date) to be updated and delivered to anybody who wants it very quickly and in a thread-safe manner. The data delivered will soon become outdated, perhaps, but it will keep its value during the calling method and remain consistent.

Can I override object with sun.misc.Unsafe?

Can I override one obejct with another if they are instance of same class, their size is the same, using sun.misc.Unsafe?
edit:
By "override" I mean to "delete" first object, ant to fill the memory with the second one. Is it possible?

By "override" I mean to "delete" first object, ant to fill the memory
with the second one. Is it possible?
Yes and no.
Yes - If you allocate some memory with Unsafe and write a long, then write another long in it (for example), then yes, you have deleted the first object and filled the memory with a second object. This is similar to what you can do with ByteBuffer. Of course, long is a primitive type, so it is probably not what you mean by 'object'.
Java allows this, because it has the control on allocated memory.
No - Java works with references to objects and only provides references to these objects. Moreover, it tends to move objects around in memory (i.e., for garbage collection).
There is no way to get the 'physical address' and move memory content from one object address to another, if that's what you are trying. Moreover, you can't actually 'delete' the object, because it may be referenced from elsewhere in the code.
However, there is always the possibility of having reference A point to another objectB instead of objectA with A = objectB; You can even make this atomic with Unsafe.compareAndSwapObject(...).
Workaround - Now, let's imagine that reference A1, A2, A3 point to the same objectA. If you want all of them to suddently point to objectB, you can't use Unsafe.compareAndSwapObject(...), because only A1 would point to objectB, while A2 and A3 would still point to objectA. It would not be atomic.
There is a workaround:
public class AtomicReferenceChange {
public static Object myReference = new Object();
public static void changeObject(Object newObject) {
myReference = newObject;
}
public static void main(String[] args) {
System.out.println(AtomicReferenceChange.myReference);
AtomicReferenceChange.changeObject("333");
System.out.println(AtomicReferenceChange.myReference);
}
}
Instead of having multiple references to the same object, you could define a public static reference and have your code use AtomicReferenceChange.myReference everywhere. If you want to change the referenced object atomically, use the static method changeObject(...).

Does variable = null set it for garbage collection

Help me settle a dispute with a coworker:
Does setting a variable or collection to null in Java aid in garbage collection and reducing memory usage? If I have a long running program and each function may be iteratively called (potentially thousands of times): Does setting all the variables in it to null before returning a value to the parent function help reduce heap size/memory usage?

That's old performance lore. It was true back in 1.0 days, but the compiler and the JVM have been improved to eliminate the need (if ever there was one). This excellent IBM article gets into the details if you're interested: Java theory and practice: Garbage collection and performance

From the article:
There is one case where the use of explicit nulling is not only helpful, but virtually required, and that is where a reference to an object is scoped more broadly than it is used or considered valid by the program's specification. This includes cases such as using a static or instance field to store a reference to a temporary buffer, rather than a local variable, or using an array to store references that may remain reachable by the runtime but not by the implied semantics of the program.
Translation: "explicitly null" persistent objects that are no longer needed. (If you want. "Virtually required" too strong a statement?)

The Java VM Spec
12.6.1 Implementing Finalization
Every object can be characterized by two attributes: it may be reachable, finalizer-reachable, or unreachable, and it may also be unfinalized, finalizable, or finalized.
A reachable object is any object that can be accessed in any potential continuing computation from any live thread. Optimizing transformations of a program can be designed that reduce the number of objects that are reachable to be less than those which would naively be considered reachable. For example, a compiler or code generator may choose to set a variable or parameter that will no longer be used to null to cause the storage for such an object to be potentially reclaimable sooner.
Discussion
Another example of this occurs if the values in an object's fields are stored in registers. The program may then access the registers instead of the object, and never access the object again. This would imply that the object is garbage.
The object is reachable if it can be involved in any potential continuing computation. So if your code refers to a local variable, and nothing else refers to it, then you might cause the object to be collected by setting it to null. This would either give a null pointer exception, or change the behaviour of your program, or if it does neither you didn't need the variable in the first place.
If you are nulling out a field or an array element, then that can possibly make sense for some applications, and it will cause the memory to be reclaimed faster. Once case is creating a large array to replace an existing array referenced by a field in a class - if the field in nulled before the replacement is created, then it may relieve pressure on the memory.
Another interesting feature of Java is that scope doesn't appear in class files, so scope is not relevant to reachability; these two methods create the same bytecode, and hence the VM does not see the scope of the created object at all:
static void withBlock () {
int x = 1;
{
Object a = new Object();
}
System.out.println(x+1);
}
static void withoutBlock () {
int x = 1;
Object a = new Object();
System.out.println(x+1);
}

Not necessarily. An object becomes eligible for garbage collection when there are no live threads anymore that hold a reference to the object.
Local variables go out of scope when the method returns and it makes no sense at all to set local variables to null - the variables disappear anyway, and if there's nothing else that holds a reference the objects that the variables referred to, then those objects become eligible for garbage collection.
The key is not to look at just variables, but look at the objects that those variables refer to, and find out where those objects are referenced by your program.

It is useless on local variables, but it can be useful/needed to clear up instance variables that are not required anymore (e.g. post-initialization).
(Yeah yeah, I know how to apply the Builder pattern...)

That could only make some sense in some scenario like this:
public void myHeavyMethod() {
List hugeList = loadHugeListOfStuff(); // lots of memory used
ResultX res = processHugeList(hugeList); // compute some result or summary
// hugeList = null; // we are done with hugeList
...
// do a lot of other things that takes a LOT of time (seconds?)
// and which do not require hugeList
...
}
Here it could make some benefit to uncomment the hugeList = null line, I guess.
But it would certainly make more sense to rewrite the method (perhaps refactoring into two,
or specifying an inner scope).

Setting an object reference to null only makes it eligible for garbage collection.
It does not necessarily free up the memory,which depends on when the garbage collector runs(which depends on JVM).
When the garbage collector runs,it frees up the heap by deleting only the objects which are eligible for garbage collection.

It is a good to have. When you set objects to null, there is a possibility that the object can be garbage collected faster, in the immediate GC cycle. But there is no guaranteed mechanism to make an object garbage collected at a given time.

Choosing when to instantiate classes

I recently wrote a class for an assignment in which I had to store names in an ArrayList (in java). I initialized the ArrayList as an instance variable private ArrayList<String> names. Later when I checked my work against the solution, I noticed that they had initialized their ArrayList in the run() method instead.
I thought about this for a bit and I kind of feel it might be a matter of taste, but in general how does one choose in situations like this? Does one take up less memory or something?
PS I like the instance variables in Ruby that start with an # symbol: they are lovelier.
(meta-question: What would be a better title for this question?)

In the words of the great Knuth "Premature optimization is the root of all evil".
Just worry that your program functions correctly and that it does not have bugs. This is far more important than an obscure optimization that will be hard to debug later on.
But to answer your question - if you initialize in the class member, the memory will be allocated the first time a mention of your class is done in the code (i.e. when you call a method from it). If you initialize in a method, the memory allocation occurs later, when you call this specific method.
So it is only a question of initializing later... this is called lazy initialization in the industry.

Initialization
As a rule of thumb, try to initialize variables when they are declared.
If the value of a variable is intended never to change, make that explicit with use of the final keyword. This helps you reason about the correctness of your code, and while I'm not aware of compiler or JVM optimizations that recognize the final keyword, they would certainly be possible.
Of course, there are exceptions to this rule. For example, a variable may by be assigned in an if–else or a switch. In a case like that, a "blank" declaration (one with no initialization) is preferable to an initialization that is guaranteed to be overwritten before the dummy value is read.
/* DON'T DO THIS! */
Color color = null;
switch(colorCode) {
case RED: color = new Color("crimson"); break;
case GREEN: color = new Color("lime"); break;
case BLUE: color = new Color("azure"); break;
}
color.fill(widget);
Now you have a NullPointerException if an unrecognized color code is presented. It would be better not to assign the meaningless null. The compiler would produce an error at the color.fill() call, because it would detect that you might not have initialized color.
In order to answer your question in this case, I'd have to see the code in question. If the solution initialized it inside the run() method, it must have been used either as temporary storage, or as a way to "return" the results of the task.
If the collection is used as temporary storage, and isn't accessible outside of the method, it should be declared as a local variable, not an instance variable, and most likely, should be initialized where it's declared in the method.
Concurrency Issues
For a beginning programming course, your instructor probably wasn't trying to confront you with the complexities of concurrent programming—although if that's the case, I'm not sure why you were using a Thread. But, with current trends in CPU design, anyone who is learning to program needs to have a firm grasp on concurrency. I'll try to delve a little deeper here.
Returning results from a thread's run method is a bit tricky. This method is the Runnable interface, and there's nothing stopping multiple threads from executing the run method of a single instance. The resulting concurrency issues are part of the motivation behind the Callable interface introduced in Java 5. It's much like Runnable, but can return a result in a thread-safe manner, and throw an Exception if the task can't be executed.
It's a bit of a digression, but if you are curious, consider the following example:
class Oops extends Thread { /* Note that thread implements "Runnable" */
private int counter = 0;
private Collection<Integer> state = ...;
public void run() {
state.add(counter);
counter++;
}
public static void main(String... argv) throws Exception {
Oops oops = new Oops();
oops.start();
Thread t2 = new Thread(oops); /* Now pass the same Runnable to a new Thread. */
t2.start(); /* Execute the "run" method of the same instance again. */
...
}
}
By the end of the the main method you pretty much have no idea what the "state" of the Collection is. Two threads are working on it concurrently, and we haven't specified whether the collection is safe for concurrent use. If we initialize it inside the thread, at least we can say that eventually, state will contain one element, but we can't say whether it's 0 or 1.

From wikibooks:
There are three basic kinds of scope for variables in Java:
local variable, declared within a method in a class, valid for (and occupying storage only for) the time that method is executing. Every time the method is called, a new copy of the variable is used.
instance variable, declared within a class but outside any method. It is valid for and occupies storage for as long as the corresponding object is in memory; a program can instantiate multiple objects of the class, and each one gets its own copy of all instance variables. This is the basic data structure rule of Object-Oriented programming; classes are defined to hold data specific to a "class of objects" in a given system, and each instance holds its own data.
static variable, declared within a class as static, outside any method. There is only one copy of such a variable no matter how many objects are instantiated from that class.
So yes, memory consumption is an issue, especially if the ArrayList inside run() is local.

I am not completely I understand your complete problem.
But as far as I understand it right now, the performance/memory benefit will be rather minor. Therefore I would definitely favour the easibility side.
So do what suits you the best. Only address performance/memory optimisation when needed.

My personal rule of thumb for instance variables is to initialize them, at least with a default value, either:
at delcaration time, i.e.
private ArrayList<String> myStrings = new ArrayList<String>();
in the constructor
If it's something that really is an instance variable, and represents state of the object, it is then completely initialized by the time the constructor exits. Otherwise, you open yourself to the possibility of trying to access the variable before it has a value. Of course, that doesn't apply to primitives where you will get a default value automatically.
For static (class-level) variables, initialize them in the declaration or in a static initializer. I use a static initializer if I have do calculations or other work to get a value. Initialize in the declaration if you're just calling new Foo() or setting the variable to a known value.

You have to avoid Lazy initialization. It leads to problems later.
But if you have to do it because the initialization is too heavy you have to do it like this:
Static fields:
// Lazy initialization holder class idiom for static fields
private static class FieldHolder {
static final FieldType field = computeFieldValue();
}
static FieldType getField() { return FieldHolder.field; }
Instance fields:
// Double-check idiom for lazy initialization of instance fields
private volatile FieldType field;
FieldType getField() {
FieldType result = field;
if (result == null) { // First check (no locking)
synchronized(this) {
result = field;
if (result == null) // Second check (with locking)
field = result = computeFieldValue();
}
}
return result;
}
Acording to Joshua Bolch book's "Effective Java™
Second Edition" (ISBN-13: 978-0-321-35668-0):
"Use lazy initialization judiciously"

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.