Comparator as an anonymous sorter in a stream - java

Let's say I am trying to sort a collection with a specific Comparator. Does it matter from a performance point of view whether the comparator is defined in the sorted() clause as an anonymous instance, or is it better to create an instance once and just pass its compare method in the sorted() clause?
In essence, what is better:
// Case 1
myCollection.stream().sorted(
    new Comparator<String>() {
        public int compare(String a, String b) {
            //code
        }
    })
// Case 2
Comparator<String> comp = new MyCustomComparator<>();
myCollection.stream().sorted(comp::compare)
Note: neither the syntax nor the values being compared matter. I want to know conceptually whether the JVM is smart enough to initialize my anonymous comparator only once (case 1) and keep reusing just one method, or whether it will keep creating new instances (in which case I would choose case 2).

A new instance of an anonymous class will be created every time the expression using new is evaluated.
In your first example, a new one is created every time the statement runs where you are passing it to sorted.
In your second example, a new one is created wherever the comp variable is being initialized. If comp is an instance member, then it gets created whenever the object that owns it is created. If comp is a local variable in a method, then it gets created every time the method is called.
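To make the allocation behaviour concrete, here is a minimal sketch. The class and method names are made up for illustration and do not come from the question. Because a new expression always produces a fresh object, two calls to fresh() yield two distinct comparators, while the comparator stored in a static field is the same object every time it is read.

```java
import java.util.Comparator;

class AnonymousComparatorDemo {
    // Created once, when the class is initialized.
    static final Comparator<String> SHARED = new Comparator<String>() {
        @Override
        public int compare(String a, String b) {
            return a.compareTo(b);
        }
    };

    // Hands out the single shared instance.
    static Comparator<String> shared() {
        return SHARED;
    }

    // Evaluates a "new" expression on every call, so every call allocates.
    static Comparator<String> fresh() {
        return new Comparator<String>() {
            @Override
            public int compare(String a, String b) {
                return a.compareTo(b);
            }
        };
    }

    public static void main(String[] args) {
        System.out.println(fresh() == fresh());    // false: two separate objects
        System.out.println(shared() == shared());  // true: one object, reused
    }
}
```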
A static, stateless and non-capturing Comparator is always going to be the most efficient way, because you can create it once and keep it forever. (See for example String.CASE_INSENSITIVE_ORDER.)
That's not to say you shouldn't use another way.
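As a rough illustration of the "create it once and keep it forever" approach, the sketch below stores a comparator in a static final field and reuses it in a stream. The ordering itself (length first, then alphabetical) and all names are assumptions chosen only for the example.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class StaticComparatorDemo {
    // Hypothetical ordering: shorter strings first, ties broken alphabetically.
    static final Comparator<String> BY_LENGTH_THEN_ALPHA =
            Comparator.comparingInt(String::length).thenComparing(Comparator.naturalOrder());

    // Every call reuses the same comparator object.
    static List<String> sort(List<String> input) {
        return input.stream().sorted(BY_LENGTH_THEN_ALPHA).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sort(Arrays.asList("pear", "fig", "apple", "kiwi")));
        // prints [fig, kiwi, pear, apple]
    }
}
```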
In Java 8, you should prefer lambdas over anonymous classes. Non-capturing lambdas can be cached and only created once. For example, this program outputs true:
import java.util.Comparator;
class Example {
    public static void main(String[] args) {
        // compares the references returned by two separate calls
        System.out.println(comparator() == comparator());
    }
    static Comparator<String> comparator() {
        return (lhs, rhs) -> lhs.compareTo(rhs);
    }
}
(Example on Ideone.)
All that said, you shouldn't worry about creating a few small objects here and there in Java, because it's unavoidable and the garbage collector is optimized for it. The vast majority of the time, the "best" way to do something is also the most readable.
Note that you do not have to use a method reference in your second example. You can pass it to the method directly:
Comparator<String> comp = new MyCustomComparator<>();
myCollection.stream().sorted(comp)...

The runtime of both approaches will be the same. It can be explained as follows:
In the first scenario, the JVM creates an instance of the comparator with your custom compare code and allocates space for this object anonymously. The object is created and given memory without any reference held by the user, and once the call is over the object becomes eligible for GC.
In the second scenario, the JVM again creates a new instance of the comparator with the custom code and allocates space for it, but it also stores a reference in a separate variable so that the object can be used again. Here the object will not be collected by the GC while it is still used anywhere else in the code, so the next time the GC runs it has to check whether the variable still references the object before it can decide whether the object can be collected.

Related

Using new keyword without defining variable in java

The following code is used to create an object of a class within a variable:
ClassName obj = new ClassName();
But what if we do not create any variable and just type the following?
new ClassName();
When I used it, there was no error. But what does it actually do when no variable is created?
Both methods create an object, but the second method has no name for it.
When you create an instance of the ClassName, the difference between your first method and second method is that for your first method, you can actually access variables and methods within the class.
For example,
ClassName obj = new ClassName();
System.out.println(obj.x);
Where ClassName
class ClassName {
public static int x = 2;
}
The benefit of this approach is you can access the variables within the class. If you used the second approach, then the instance would have no name, so you wouldn't be able to access the variables.
new ClassName() calls a special method called a constructor and, like any method, it returns a value [1]. The value that it returns is an instance of ClassName. You can choose to assign that value to a variable, or not, just as you can do with the return value of any method. So you simply need to decide whether you need to assign the returned value to a variable, or not. If you do, then you need the first statement, namely:
ClassName obj = new ClassName();
And if you don't need to assign the returned value to a variable, then you can use:
new ClassName();
[1] A method can return void, which means it doesn't actually return a value.
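A small sketch can show that the constructor runs in both forms, whether or not the result is stored. The Counted class and its created counter are hypothetical names introduced only for this illustration.

```java
class Counted {
    static int created = 0;   // how many instances have been constructed so far

    Counted() {
        created++;            // runs whether or not the caller keeps the reference
    }
}

class NewWithoutVariableDemo {
    public static void main(String[] args) {
        Counted kept = new Counted();          // reference stored in a variable
        new Counted();                         // reference discarded immediately
        System.out.println(Counted.created);   // prints 2
    }
}
```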
To answer your question, I think it is useful to see how objects are managed by the JVM during execution.
All Java objects reside in a JVM area called the heap and can (note that this is not mandatory) be pointed to by one or more variables.
Local variables that hold references to objects reside in another JVM area called the stack.
Every time an instruction like new SomeClass(); is executed, a new object is allocated on the heap. When the heap becomes full, garbage is collected, and during garbage collection objects that are no longer reachable through any variable are cleared in order to free heap space for new objects.
What is the difference between your two instructions?
1- ClassName obj = new ClassName();
It allocates a new object on the heap that is pointed to by the obj variable on the stack.
2- new ClassName();
It just allocates a new object on the heap, with no variable on the stack pointing to it.
When can the second approach be useful?
Given the above, the second approach could be useful if you just want to allocate a new object on the heap without using it through a variable, for example to test memory management in your program. More commonly, the second approach is used when you want to create a "one-shot" object and use a method or a field of it in the same statement. Because no variable keeps the object reachable afterwards, it becomes eligible for garbage collection right away, for example when:
reading a field of the object once:
System.out.println(new ClassName().x);
or
calling a method of the object once:
new ClassName().evaluateResult();
NOTE: In this specific case you could probably consider making the method static instead.
Another use is to put new elements into a collection, as follows:
final List<ClassName> objectList = new ArrayList<>();
for(int i=0; i<10; i++){
objectList.add(new ClassName());
}
You may ask: what happens to objects that are pointed to by at least one variable, like the ones created with the first approach?
A reference held in a local variable is dropped when the variable goes out of scope, so the object becomes eligible for garbage collection automatically once no other reference to it remains (depending on how scopes are managed in the code).

Java lambdas heap dump - Instance of lambda not getting garbage collected

I am facing some problems with garbage collection while developing an application in Java, where I use Stream.map to trim all the elements in a list. Instances of the anonymous lambda class exist in the heap dump even though the instance count of the enclosing class is 0, as shown in the VisualVM snapshot.
The LambdaTesting class:
class LambdaTesting {
protected List<String> values;
protected LambdaTesting(List<String> values) {
this.values = values;
}
public List<String> modify() {
return this.values.stream().map(x -> x.trim()).collect(Collectors.toList());
}
public List<String> modifyLocal() {
List<String> localValue = new ArrayList<>();
localValue.add("Local FOO ");
localValue.add("Local BAR ");
return localValue.stream().map(x -> x.trim()).collect(Collectors.toList());
}
}
The method which creates the instance of LambdaTesting and invokes these methods:
public List<String> testMethods() {
    List<String> test = new ArrayList<>();
    test.add("Global FOO ");
    test.add(" GLOBAL BAR");
    LambdaTesting lambdaTesting = new LambdaTesting(test);
    lambdaTesting.modifyLocal();
    // return the trimmed list so the method matches its declared return type
    return lambdaTesting.modify();
}
The heap dump was taken after putting a breakpoint on the line following the call to testMethods.
Why are the references to Lambda still present in the heap dump?
As elaborated in Does a lambda expression create an object on the heap every time it's executed?, a non-capturing lambda expression will be remembered and reused, which implies that it is permanently associated with the code that created it. That’s not different to, e.g. string literals whose object representation stays in memory as long as the code containing the literal is alive.
This is an implementation detail. It doesn’t have to be that way, but the reference implementation and hence, all commonly used JREs do it that way.
A non-capturing lambda expression is a lambda expression that uses no (non-constant) variables of the surrounding context and does not use this, neither implicitly nor explicitly. So it bears no state, hence, consumes a tiny amount of memory. There is also no possibility to create a leak regarding other objects, as having references to other objects is what makes the difference between non-capturing and capturing lambda expressions and likely is the main reason why capturing lambda expressions are not remembered that way.
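The distinction can be sketched in code. The class below is purely illustrative (its names are not from the question): the first lambda uses only its own parameter, the second reads a local variable of the enclosing method, and the third reads an instance field and therefore implicitly uses this.

```java
import java.util.function.Function;

class CaptureDemo {
    private final String suffix;

    CaptureDemo(String suffix) {
        this.suffix = suffix;
    }

    // Non-capturing: uses nothing but its own parameter.
    static Function<String, String> trimmer() {
        return s -> s.trim();
    }

    // Capturing: reads the variable "prefix" from the enclosing method.
    static Function<String, String> prefixer(String prefix) {
        return s -> prefix + s;
    }

    // Capturing: reads the field "suffix", which implicitly goes through "this".
    Function<String, String> suffixer() {
        return s -> s + suffix;
    }

    public static void main(String[] args) {
        System.out.println(trimmer().apply("  a "));                  // prints a
        System.out.println(prefixer(">").apply("a"));                 // prints >a
        System.out.println(new CaptureDemo("!").suffixer().apply("a")); // prints a!
    }
}
```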
So the maximum number of such never-collected instances is equal to the total number of lambda expressions in your application, which might be a few hundred or even thousands, but still small compared to the total number of objects the application will ever create. As explained in Function.identity() or t->t, putting a lambda expression into a factory method instead of repeating it in the source code can reduce the number of instances. But given the rather small total number of objects, that’s rarely a concern. Compare with the number of the already mentioned string literals or the Class objects which already exist in the runtime…
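The factory-method idea mentioned above might look like the following sketch. The identity() method here is a hand-written stand-in, not the JDK's Function.identity(). Whether both calls return the same object is an implementation detail, so the example prints the comparison without claiming a result.

```java
import java.util.function.Function;

class LambdaFactoryDemo {
    // A single lambda expression in the source; every caller goes through this method.
    static <T> Function<T, T> identity() {
        return t -> t;
    }

    public static void main(String[] args) {
        Function<String, String> a = identity();
        Function<Integer, Integer> b = identity();
        System.out.println(a.apply("x"));
        System.out.println(b.apply(42));
        // The reference implementation typically hands back the same object for a
        // non-capturing lambda, but the language specification does not guarantee it.
        System.out.println((Object) a == (Object) b);
    }
}
```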

Java reusing (static?) objects as temporary objects for performance

I need to call methods of a class with multiple methods very often in a simulation loop.
Some of these methods need to access temporary objects for storing information in them. After leaving the method the stored data is not needed anymore.
For example:
class MyClass {
    void method1() {
        ...
        SomeObject temp = new SomeObject();
        ...
    }
    void method2() {
        ...
        SomeObject temp = new SomeObject();
        SomeObject temp2 = new SomeObject();
        ...
    }
}
I need to optimize as much as possible. The most expensive (removable) problem is that too many allocations happen.
I assume it would be better not to allocate the space needed for those objects every time so I want to keep them.
Would it be more efficient to store them in a static way or not?
Like:
Class class {
private (static?) SomeObject temp;
private (static?) SomeObject temp2;
methods...
}
Or is there even a better way? Thank you for your help!
Edit based on answers:
The actual problem is not the memory footprint but the garbage collector cleaning up the mess.
SomeObject is a Point2D-like class, nothing memory expensive (in my opinion).
I am not sure whether it is better to use (possibly static) class-level objects as placeholders or some more advanced method which I am not aware of.
I would be wary of premature optimization in this example. There are typically downsides: it makes the code more complex and harder to read (and complexity makes bugs more likely), and it may not offer the speedup you expected. For a simple object such as one representing a 2D point coordinate, I wouldn't worry about reuse. Typically reuse gains the most benefit if you are working with a large amount of memory, avoiding lengthy, expensive constructors, or pulling object construction out of a tight loop that is executed frequently.
Some different strategies you could try:
Push responsibility to the caller: One way would be to have the caller pass in a pre-initialized object, making the method parameter final. However, whether this will work depends on what you need to do with the object.
Pointer to a temporary object as method parameter: Another way would be to have the caller pass in an object whose purpose is essentially to be a pointer to where the method should do its temporary storage. I think this technique is more commonly used in C++, but it works similarly in Java and sometimes shows up in places like graphics programming.
Object pool: One common way to reuse temporary objects is to use an object pool, where objects are allocated from a fixed bank of "available" objects. This has some overhead, but if the objects are large and frequently used for only short periods of time, such that memory fragmentation might be a concern, the overhead may be low enough to make a pool worth considering.
Member variable: If you are not concerned about concurrent calls to the method (or have used synchronization to prevent them), you could emulate the C++ism of a "local static" variable by creating a member variable of the class for your storage. It makes the code less readable and leaves slightly more room for accidental interference with other parts of your code that use the variable, but it has lower overhead than an object pool and does not require changes to your method signature. If you do this, you may also wish to use the transient keyword on the variable to indicate that it does not need to be serialized.
I would shy away from a static variable for the temporary unless the method is also static, because it may carry an undesirable memory overhead for the entire time your program runs, and it has the same downsides as a member variable for this purpose, doubled, because multiple instances of the same class would share it.
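The "temporary object as method parameter" strategy from the list above can be sketched as follows. Vec2 is a hypothetical Point2D-like class standing in for SomeObject, and midpoint is an invented example method; the only point is that the caller allocates the scratch object once and the method writes into it.

```java
class Vec2 {
    double x, y;

    void set(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class ScratchParameterDemo {
    // Writes the midpoint into "out" instead of allocating a new Vec2.
    static void midpoint(Vec2 a, Vec2 b, Vec2 out) {
        out.set((a.x + b.x) / 2, (a.y + b.y) / 2);
    }

    public static void main(String[] args) {
        Vec2 a = new Vec2();
        Vec2 b = new Vec2();
        Vec2 scratch = new Vec2();       // allocated once, outside the loop
        for (int i = 0; i < 3; i++) {
            b.set(4 + i, 2 + i);
            midpoint(a, b, scratch);     // no allocation inside the loop
            System.out.println(scratch.x + ", " + scratch.y);
        }
    }
}
```

The caller owns the scratch object, so the method signature makes the reuse explicit, at the cost of a slightly noisier API.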
Keep in mind that temp and temp2 are not themselves objects, but variables pointing to an object of type SomeObject. The way you are planning to do it, the only difference would be that temp and temp2 would be instance variables instead of local variables. Calling
temp = new SomeObject();
would still allocate a new SomeObject on the heap.
Additionally, making them static or instance variables instead of local would cause the last assigned SomeObjects to be kept strongly reachable (as long as your class instance is in scope for instance variables), preventing them from being garbage collected until the variables are reassigned.
Optimizing in this way probably isn't effective. Currently, once temp and temp2 are out of scope, the SomeObjects they point to will be eligible for garbage collection.
If you're still interested in memory optimization, you will need to show what the SomeObject is in order to get advice as to how you could cache the information it's holding.
How large are these objects? It seems to me that you could have class-level objects (not necessarily static; I'll come back to that). For SomeObject, you could have a method that purges its contents. When you are done using it in one place, call that method to purge its contents.
As far as static, will multiple callers use this class and have different values? If so, don't use static.
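A minimal sketch of the class-level object with a purge method could look like this. Scratch, Simulation, and step are hypothetical names; the field is deliberately an instance field rather than static, so separate Simulation instances do not share it. The sketch assumes single-threaded use of each instance.

```java
class Scratch {
    double x, y;

    // Returns the object to a known empty state so stale data cannot leak between uses.
    void clear() {
        x = 0;
        y = 0;
    }
}

class Simulation {
    // One scratch object per Simulation instance, reused by every call to step().
    private final Scratch temp = new Scratch();

    double step(double dx, double dy) {
        temp.clear();                 // purge whatever the previous call left behind
        temp.x += dx;
        temp.y += dy;
        return Math.sqrt(temp.x * temp.x + temp.y * temp.y);
    }
}

class ReusableScratchDemo {
    public static void main(String[] args) {
        Simulation sim = new Simulation();
        System.out.println(sim.step(3, 4));   // prints 5.0
        System.out.println(sim.step(6, 8));   // prints 10.0
    }
}
```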
First, you need to make sure that you really have this problem. The benefit of a garbage collector is that it takes care of all temporary objects automatically.
Anyway, suppose you run a single-threaded application and you use at most MAX_OBJECTS at any given time. One solution could look like this:
public class ObjectPool {
private final int MAX_OBJECTS = 5;
private final Object [] pool = new Object [MAX_OBJECTS];
private int position = 0;
public Object getObject() {
// advance to the next object
position = (position + 1) % MAX_OBJECTS;
// check and create new object if needed
if(pool[position] == null) {
pool[position] = new Object();
}
// return next object
return pool[position];
}
// make it a singleton
private ObjectPool() {}
private static final ObjectPool instance = new ObjectPool();
public static ObjectPool getInstance() { return instance;}
}
And here is the usage example:
public class ObjectPoolTest {
public static void main(String[] args) {
for(int n = 0; n < 6; n++) {
Object o = ObjectPool.getInstance().getObject();
System.out.println(n + ") " + o.hashCode());
}
}
}
Here is the output:
0) 1660364311
1) 1340465859
2) 2106235183
3) 374283533
4) 603737068
5) 1660364311
Notice that the first and the last numbers are the same: iteration MAX_OBJECTS + 1 returns the same temporary object as the first one.
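The same round-robin idea can be written generically so that the pool hands out typed objects. This is only a sketch under the same single-threaded assumption; RingPool and its acquire method are invented names, and the caller must be finished with an object before the pool wraps around to it again.

```java
import java.util.function.Supplier;

class RingPool<T> {
    private final Object[] slots;
    private final Supplier<T> factory;
    private int next = 0;

    RingPool(int size, Supplier<T> factory) {
        this.slots = new Object[size];
        this.factory = factory;
    }

    @SuppressWarnings("unchecked")
    T acquire() {
        if (slots[next] == null) {
            slots[next] = factory.get();     // lazily fill each slot once
        }
        T result = (T) slots[next];
        next = (next + 1) % slots.length;    // wrap around after the last slot
        return result;
    }
}

class RingPoolDemo {
    public static void main(String[] args) {
        RingPool<StringBuilder> pool = new RingPool<>(5, StringBuilder::new);
        StringBuilder first = pool.acquire();
        for (int i = 0; i < 4; i++) {
            pool.acquire();                  // slots 1 to 4
        }
        System.out.println(first == pool.acquire());   // true: the sixth call reuses slot 0
    }
}
```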

Guava MapMaker optionally set maximumSize(0) for factory method?

I'm using the MapMaker to implement caching of data objects in my application:
public class DataObjectCache<DO extends MyDataObject> {
private final ConcurrentMap<String, DO> innerCache;
public DataObjectCache(Class<DO> doClass) {
Function<String, DO> loadFunction = new Function<String, DO>() {
@Override
public DO apply(String id) {
//load and return DO instance
}
};
innerCache = new MapMaker()
.softValues()
.makeComputingMap(loadFunction);
}
private DO getDataObject(String id) {
return innerCache.get(id);
}
private void putDataObject(DO dataObject) {
innerCache.putIfAbsent(dataObject.getID(), dataObject);
}
}
One of these DataObjectCaches would be instantiated for each data object class, and they would be kept in a master Map, using the Class objects as keys.
There's a minority of data object classes whose instances I don't want cached. However I would still like them to be instantiated by the same code, which the Function is calling, and would still need concurrency in regard to loading them distinctly.
In these cases, I'm wondering if I can just set the maximum size of the map to 0, so that entries are evicted immediately, but still take advantage of the atomic computing aspects of the map. Is this a good idea? Inefficient?
EDIT:
I realized that if I evicted entries immediately after loading them, there's no way to guarantee they are distinctly loaded - if the Map isn't keeping track of them, multiple instances of an object with the same ID could be floating around the environment. So instead of doing this, I think I'll use weak values instead of soft values for the types of objects I don't want taking up cache - let me know if anyone has an opinion on this.
In light of your edit, it sounds like what you're looking for is an interner. An interner returns a representative instance; the same object will be returned by Interner.intern for all objects that are equal according to your equals method. From the Javadoc:
Chooses and returns the representative instance for any of a collection of instances that are equal to each other. If two equal inputs are given to this method, both calls will return the same instance. That is, intern(a).equals(a) always holds, and intern(a) == intern(b) if and only if a.equals(b). Note that intern(a) is permitted to return one instance now and a different instance later if the original interned instance was garbage-collected.
See http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Interner.html and http://guava-libraries.googlecode.com/svn/trunk/javadoc/com/google/common/collect/Interners.html
That said, it depends what you mean when you say you don't want it cached. If you truly want to return a fresh instance every time, then you'd have to have multiple instances of equivalent objects "floating around".
Interning is holding on to an instance (so it can return the same one), so it is still sort of a cache. I would want to know why you want to avoid caching. If it is because of the size of the objects, you can use a weak interner; the instance will be available for GC when it's no longer referenced. Then again, simply using a MapMaker map with weak values would accomplish that as well.
If, on the other hand, the reason you don't want to cache is because your data is liable to change, interning could be your answer. I would imagine what you'd want is to retrieve the object every time, and then intern it. If the object is equal to the cached one, the interner would simply return the existing instance. If it is different, the interner would cache the new one. Your responsibility then would be to write an equals method on your object that meets the requirements for using a new vs interned instance.
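To illustrate what interning does, here is a simplified stand-in written with only the JDK. It is not Guava's implementation: SimpleStrongInterner is a hypothetical class that holds strong references, so interned instances are never released, unlike a weak interner.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class SimpleStrongInterner<E> {
    // Maps each value to the first instance seen that is equal to it.
    private final ConcurrentMap<E, E> canonical = new ConcurrentHashMap<>();

    E intern(E sample) {
        E existing = canonical.putIfAbsent(sample, sample);
        return existing != null ? existing : sample;
    }
}

class InternerDemo {
    public static void main(String[] args) {
        SimpleStrongInterner<String> interner = new SimpleStrongInterner<>();
        String a = new String("id-42");
        String b = new String("id-42");
        System.out.println(a == b);                                     // false
        System.out.println(interner.intern(a) == interner.intern(b));   // true
    }
}
```

As with any interner, this relies on equals and hashCode being implemented consistently on the interned type.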
Well, MapMaker.maximumSize has this line: checkArgument(size > 0, "maximum size must be positive"), which will make this impossible. The expireAfter methods also require positive arguments. It's pretty clear that the API designers didn't want you to use their MapMaker-made maps this way.
That said, I suppose, if you really want to use the cache as a passthrough, you could use an expireAfterWrite of 1 nanosecond. It's a hack, but it would practically have the same effect.

Can I override object with sun.misc.Unsafe?

Can I override one object with another using sun.misc.Unsafe, if they are instances of the same class and their size is the same?
edit:
By "override" I mean to "delete" the first object and to fill the memory with the second one. Is it possible?
By "override" I mean to "delete" the first object and to fill the memory with the second one. Is it possible?
Yes and no.
Yes - If you allocate some memory with Unsafe and write a long, then write another long in it (for example), then yes, you have deleted the first object and filled the memory with a second object. This is similar to what you can do with ByteBuffer. Of course, long is a primitive type, so it is probably not what you mean by 'object'.
Java allows this, because it has the control on allocated memory.
No - Java works with references to objects and only provides references to these objects. Moreover, it tends to move objects around in memory (e.g., during garbage collection).
There is no way to get the 'physical address' and move memory content from one object address to another, if that's what you are trying. Moreover, you can't actually 'delete' the object, because it may be referenced from elsewhere in the code.
However, there is always the possibility of having reference A point to another objectB instead of objectA with A = objectB; You can even make this atomic with Unsafe.compareAndSwapObject(...).
Workaround - Now, let's imagine that references A1, A2, and A3 point to the same objectA. If you want all of them to suddenly point to objectB, you can't use Unsafe.compareAndSwapObject(...), because only A1 would point to objectB, while A2 and A3 would still point to objectA. It would not be atomic.
There is a workaround:
public class AtomicReferenceChange {
public static Object myReference = new Object();
public static void changeObject(Object newObject) {
myReference = newObject;
}
public static void main(String[] args) {
System.out.println(AtomicReferenceChange.myReference);
AtomicReferenceChange.changeObject("333");
System.out.println(AtomicReferenceChange.myReference);
}
}
Instead of having multiple references to the same object, you could define a public static reference and have your code use AtomicReferenceChange.myReference everywhere. If you want to change the referenced object atomically, use the static method changeObject(...).
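A variation on this single-shared-reference idea, sketched with the standard java.util.concurrent.atomic.AtomicReference rather than Unsafe, is shown below. SharedHolderDemo and CURRENT are names invented for the example; the point is that all code reads through one holder, and the swap itself is an atomic compare-and-set.

```java
import java.util.concurrent.atomic.AtomicReference;

class SharedHolderDemo {
    // The one place every part of the program goes through to reach the object.
    static final AtomicReference<Object> CURRENT = new AtomicReference<>(new Object());

    public static void main(String[] args) {
        Object before = CURRENT.get();
        boolean swapped = CURRENT.compareAndSet(before, "333");
        System.out.println(swapped);          // true: the holder still contained "before"
        System.out.println(CURRENT.get());    // prints 333
        // A second attempt with the stale expected value fails and changes nothing.
        System.out.println(CURRENT.compareAndSet(before, "444"));   // false
    }
}
```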
