State of Lambda and Imperfections in Anonymous Classes - java

I was rereading Brian Goetz's document on the State of Lambda, where he details many of the reasons why Java needed lambda expressions.
In one of the paragraphs he wrote:
Given the increasing relevance of callbacks and other functional-style
idioms, it is important that modeling code as data in Java be as
lightweight as possible. In this respect, anonymous inner classes are
imperfect for a number of reasons, primarily:
1. Bulky syntax
2. Confusion surrounding the meaning of names and this
3. Inflexible class-loading and instance-creation semantics
4. Inability to capture non-final local variables
5. Inability to abstract over control flow
From this list of imperfections I believe I understand reasonably well the items (1), (2) and (4).
But I have no clue of what exactly the problems are in (3) and (5).
Can anybody out there provide any examples of how these two could be an issue when using anonymous classes?
Not all the projects I work on are on Java 8 yet, so I think it is important to understand these shortcomings and, above all, to see clearly how things are better now with Java 8 lambdas. Also, since Brian was one of the leaders of Project Lambda, I thought it was worth my time to think about what he meant by this; it could lead me to an epiphany :-)

Well 5. Inability to abstract over control flow is easy.
Lambdas are great for iterating over all the elements in a collection.
aCollection.forEach( myLambda)
The old way you would have to use for loops or Iterators or something similar.
for( ....){
//same code as what's in the lambda
}
This is called external iteration. We have to tell the collection not only what to do with each element BUT ALSO HOW TO GET EACH ELEMENT. This code iterates through all the objects sequentially, in order. Sometimes that isn't the best for performance reasons.
Lambdas allow us to do internal iteration. We only tell the collection what to do with each element. How each element is accessed, and in what order, is up to the Collection implementation, which can do it in the most efficient way it knows using internal implementation knowledge. It may even be parallel rather than sequential.
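To make the difference concrete, here is a minimal sketch of the same task written both ways. The class and method names (LoopVersusForEach, upperWithLoop, upperWithForEach) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class LoopVersusForEach {
    // The caller drives the traversal: it asks for each element, one by one, in order.
    static List<String> upperWithLoop(List<String> words) {
        List<String> out = new ArrayList<>();
        for (String w : words) {
            out.add(w.toUpperCase());
        }
        return out;
    }

    // The caller only supplies the action; the collection decides how to traverse itself.
    static List<String> upperWithForEach(List<String> words) {
        List<String> out = new ArrayList<>();
        words.forEach(w -> out.add(w.toUpperCase()));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("a", "b", "c");
        System.out.println(upperWithLoop(words));
        System.out.println(upperWithForEach(words));
    }
}
```

Both methods produce the same list here; the point is only who owns the loop. In the second version the traversal strategy can change without touching the calling code.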
3. Inflexible class-loading and instance-creation semantics
This is a lower-level issue with how anonymous classes are loaded and instantiated. I will point you to this article: http://www.infoq.com/articles/Java-8-Lambdas-A-Peek-Under-the-Hood
But basically:
Anonymous classes require a new class file for each one (MyClass$1 etc.), and this extra class has to be loaded. Lambdas don't produce extra class files: the lambda body is compiled into a method of the enclosing class, and the object that implements the functional interface is created dynamically at runtime via invokedynamic.
Future versions of Java may be able to implement lambdas differently under the hood. Because the lambda object is generated at runtime, future versions can safely change how lambdas get created without breaking existing compiled code.
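If you want to see the class-file difference for yourself, a small sketch like the following may help. ClassNameDemo is a hypothetical name; the anonymous class gets a compile-time name ending in $1, while whatever is printed for the lambda is an implementation detail of the JVM you run it on.

```java
class ClassNameDemo {
    // Compiled to its own class file, ClassNameDemo$1.class.
    static Runnable anonymous() {
        return new Runnable() {
            @Override
            public void run() { }
        };
    }

    // No extra class file; the implementing class is produced at runtime.
    static Runnable lambda() {
        return () -> { };
    }

    public static void main(String[] args) {
        System.out.println(anonymous().getClass().getName());
        // Implementation-specific name, do not rely on its format.
        System.out.println(lambda().getClass().getName());
    }
}
```

After compiling, listing the output directory should show ClassNameDemo.class and ClassNameDemo$1.class, but nothing for the lambda.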

I also want to add another thing about (3). "Instance-creation" might refer to the fact that when you create an instance of an anonymous class (new ...), just like when you create an instance of any class, you are guaranteed to get a new object. So the reference is guaranteed to compare unequal (!=) to the reference to any other object.
On the other hand, for lambdas, there is no guarantee that evaluating a lambda expression twice will produce two different objects. In particular, if the lambda doesn't capture any variables, then all instances of the lambda are functionally identical. In this case, the runtime could just allocate one object statically and use it for the duration of the program. Allocating lots of objects is not cheap, so in the cases where it can avoid creating more objects, it makes the program more efficient.
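A quick way to observe this is to compare references. This is only a sketch with made-up names (IdentityDemo, newAnonymous, newLambda); the first comparison is guaranteed by the language, the second is deliberately left unspecified, so treat its result as an observation about your JVM and nothing more.

```java
class IdentityDemo {
    static Runnable newAnonymous() {
        return new Runnable() {
            @Override
            public void run() { }
        };
    }

    static Runnable newLambda() {
        return () -> { };   // captures nothing
    }

    public static void main(String[] args) {
        // Always two distinct objects: "new" guarantees a fresh instance.
        System.out.println("anonymous distinct: " + (newAnonymous() != newAnonymous()));
        // May or may not be the same object; the specification allows either.
        System.out.println("lambda reused:      " + (newLambda() == newLambda()));
    }
}
```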

Related

How to decide between lambda iteration and normal loop?

Since the introduction of Java 8 I got really hooked on lambdas and started using them whenever possible, mostly to get accustomed to them. One of the most common uses is when we want to iterate and act upon a collection of objects, in which case I resort to either forEach or stream(). I rarely write the old for(T t : Ts) loop and I almost forgot about the for(int i = 0.....).
However, we were discussing this with my supervisor the other day and he told me that lambdas aren't always the best choice and can sometimes hinder performance. From a lecture I had seen on this new feature I got the feeling that lambda iterations are always fully optimized by the compiler and will (always?) be better than bare iterations, but he begs to differ. Is this true? If yes how do I distinguish between the best solution in each scenario?
P.S: I'm not talking about cases where it is recommended to apply parallelStream. Obviously those will be faster.
Performance depends on so many factors, that it’s hard to predict. Normally, we would say, if your supervisor claims that there was a problem with performance, your supervisor is in charge of explaining what problem.
One thing someone might be afraid of is that behind the scenes, a class is generated for each lambda creation site (with the current implementation), so if the code in question is executed only once, this might be considered a waste of resources. This harmonizes with the fact that lambda expressions have a higher initialization overhead than ordinary imperative code (we are not comparing to inner classes here), so inside class initializers, which only run once, you might consider avoiding them. This is also in line with the fact that you should never use parallel streams in class initializers, so this potential advantage isn’t available there anyway.
For ordinary, frequently executed code that is likely to be optimized by the JVM, these problems do not arise. As you supposed correctly, classes generated for lambda expressions get the same treatment (optimizations) as other classes. At these places, calling forEach on collections bears the potential of being more efficient than a for loop.
The temporary object instances created for an Iterator or the lambda expression are negligible; however, it might be worth noting that a foreach loop over a collection will always create an Iterator instance, whereas forEach with a lambda expression does not always do so. While the default implementation of Iterable.forEach will create an Iterator as well, some of the most often used collections take the opportunity to provide a specialized implementation, most notably ArrayList.
The ArrayList’s forEach is basically a for loop over an array, without any Iterator. It will then invoke the accept method of the Consumer, which is implemented by a generated class containing a trivial delegation to the synthetic method containing the code of your lambda expression. To optimize the entire loop, the horizon of the optimizer has to span the ArrayList’s loop over an array (a common idiom recognizable for an optimizer), the synthetic accept method containing a trivial delegation and the method containing your actual code.
In contrast, when iterating over the same list using a foreach loop, an Iterator implementation is created containing the ArrayList iteration logic, spread over two methods, hasNext() and next(), and instance variables of the Iterator. The loop will repeatedly invoke the hasNext() method to check the end condition (index<size) and next(), which will recheck the condition before returning the element, as there is no guarantee that the caller properly invokes hasNext() before next(). Of course, an optimizer is capable of removing this duplication, but that requires more effort than not having it in the first place. So to get the same performance as the forEach method, the optimizer’s horizon has to span your loop code, the nontrivial hasNext() implementation and the nontrivial next() implementation.
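The two loop shapes described above can be sketched by hand. This is a simplified picture, not the actual JDK source: forEachOverArray is a hypothetical stand-in for an array-backed forEach and leaves out details such as the concurrent-modification check.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

class LoopShapes {
    // Roughly what "for (Integer i : list)" expands to: two calls per element.
    static int sumWithIterator(List<Integer> list) {
        int sum = 0;
        for (Iterator<Integer> it = list.iterator(); it.hasNext(); ) {
            sum += it.next();
        }
        return sum;
    }

    // Simplified array-backed forEach: one plain index loop, one call per element.
    static <T> void forEachOverArray(T[] elements, int size, Consumer<? super T> action) {
        for (int i = 0; i < size; i++) {
            action.accept(elements[i]);
        }
    }

    static int sumWithForEach(Integer[] elements) {
        int[] sum = {0};   // one-element array so the lambda can update it
        forEachOverArray(elements, elements.length, e -> sum[0] += e);
        return sum[0];
    }

    public static void main(String[] args) {
        System.out.println(sumWithIterator(Arrays.asList(1, 2, 3)));
        System.out.println(sumWithForEach(new Integer[] {1, 2, 3}));
    }
}
```

Neither shape is inherently faster once the JIT has inlined everything; the sketch only shows how much code the optimizer has to see through in each case.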
Similar things may apply to other collections having a specialized forEach implementation as well. This also applies to Stream operations, if the source provides a specialized Spliterator implementation, which does not spread the iteration logic over two methods like an Iterator.
So if you want to discuss the technical aspects of foreach vs. forEach(…), you may use this information.
But as said, these aspects describe only potential performance aspects as the work of the optimizer and other runtime environmental aspects may change the outcome completely. I think, as a rule of thumb, the smaller the loop body/action is, the more appropriate is the forEach method. This harmonizes perfectly with the guideline of avoiding overly long lambda expressions anyway.
It depends on the specific implementation.
In general, the forEach method and a foreach loop over an Iterator have pretty similar performance, as they use a similar level of abstraction. stream() is usually slower (often by 50-70%) as it adds another layer between your code and the underlying collection.
The advantages of stream() are generally the possible parallelism and the easy chaining of operations, with a lot of reusable ones provided by the JDK.
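As a small illustration of the chaining mentioned above, here is a sketch with a hypothetical helper, evenSquares. Swapping stream() for parallelStream() would give the same result for this pipeline, since the list source is ordered and the collector preserves encounter order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class StreamChaining {
    // filter, map and collect are reusable building blocks provided by the JDK.
    static List<Integer> evenSquares(List<Integer> numbers) {
        return numbers.stream()
                .filter(n -> n % 2 == 0)
                .map(n -> n * n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(evenSquares(Arrays.asList(1, 2, 3, 4)));
    }
}
```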

How do Java 8 lambdas differ from anonymous classes as objects (under the hood)? [duplicate]

When I iterate over a collection using the new syntactic sugar of Java 8, such as
myStream.forEach(item -> {
// do something useful
});
Isn't this equivalent to the 'old syntax' snippet below?
myStream.forEach(new Consumer<Item>() {
@Override
public void accept(Item item) {
// do something useful
}
});
Does this mean a new anonymous Consumer object is created on the heap every time I iterate over a collection? How much heap space does this take? What performance implications does it have? Does it mean I should rather use the old style for loops when iterating over large multi-level data structures?
It is equivalent but not identical. Simply said, if a lambda expression does not capture values, it will be a singleton that is re-used on every invocation.
The behavior is not exactly specified. The JVM is given considerable freedom in how to implement it. Currently, Oracle’s JVM creates (at least) one instance per lambda expression (i.e. it doesn’t share an instance between different identical expressions) but creates singletons for all expressions which don’t capture values.
You may read this answer for more details. There, I not only gave a more detailed description but also testing code to observe the current behavior.
This is covered by The Java® Language Specification, chapter “15.27.4. Run-time Evaluation of Lambda Expressions”
Summarized:
These rules are meant to offer flexibility to implementations of the Java programming language, in that:
A new object need not be allocated on every evaluation.
Objects produced by different lambda expressions need not belong to different classes (if the bodies are identical, for example).
Every object produced by evaluation need not belong to the same class (captured local variables might be inlined, for example).
If an "existing instance" is available, it need not have been created at a previous lambda evaluation (it might have been allocated during the enclosing class's initialization, for example).
When an instance representing the lambda is created sensitively depends on the exact contents of your lambda's body. Namely, the key factor is what the lambda captures from the lexical environment. If it doesn't capture any state which is variable from creation to creation, then an instance will not be created each time the for-each loop is entered. Instead a synthetic method will be generated at compile time and the lambda use site will just receive a singleton object that delegates to that method.
Further note that this aspect is implementation-dependent and you can expect future refinements and advancements on HotSpot towards greater efficiency. There are general plans to e.g. make a lightweight object without a full corresponding class, which has just enough information to forward to a single method.
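The compile-time synthetic method can be made visible with reflection. This is a sketch under assumptions: SyntheticMethodDemo and its methods are invented for the example, and the exact name printed (something like lambda$greeter$0 with javac) is a compiler detail, not something the specification promises.

```java
import java.lang.reflect.Method;
import java.util.function.Supplier;

class SyntheticMethodDemo {
    static Supplier<String> greeter() {
        return () -> "hello";   // non-capturing lambda
    }

    // Counts compiler-generated methods declared in this class.
    static long syntheticMethodCount() {
        long count = 0;
        for (Method m : SyntheticMethodDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        for (Method m : SyntheticMethodDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic()) {
                System.out.println("synthetic: " + m.getName());
            }
        }
        System.out.println(greeter().get());
    }
}
```

The lambda body lives in that synthetic method of the enclosing class; the object handed to the caller merely forwards to it.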
Here is a good, accessible in-depth article on the topic:
http://www.infoq.com/articles/Java-8-Lambdas-A-Peek-Under-the-Hood
You are passing a new instance to the forEach method. Every time you do that you create a new object but not one for every loop iteration. Iteration is done inside forEach method using the same 'callback' object instance until it is done with the loop.
So the memory used by the loop does not depend on the size of the collection.
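To check that a single callback object serves the whole loop, one can record the identity of the Consumer on every call. This is a minimal sketch with a made-up helper, distinctCallbackInstances; it uses an anonymous class because a lambda body cannot refer to its own instance through this.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

class CallbackReuse {
    // Returns how many distinct Consumer objects were seen during one forEach call.
    static int distinctCallbackInstances(List<Integer> items) {
        // Identity-based set: elements are compared with ==, not equals().
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        items.forEach(new Consumer<Integer>() {
            @Override
            public void accept(Integer item) {
                seen.add(this);   // "this" is the callback object itself
            }
        });
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctCallbackInstances(Arrays.asList(1, 2, 3, 4, 5)));
    }
}
```

However long the list is, the count stays at one for a non-empty list, which is why the memory cost does not grow with the collection.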
Isn't this equivalent to the 'old syntax' snippet?
Yes. It has slight differences at a very low level but I don't think you should care about them. Lambda expressions use the invokedynamic feature instead of anonymous classes.

Why Wrapper class like Boolean in java is immutable?

I can't see the reason why the Boolean wrapper class was made immutable.
Why wasn't the Boolean wrapper implemented like MutableBoolean in Commons Lang, which can actually be reset?
Does anyone have any idea/understanding about this? Thanks.
Because 2 is 2. It won't be 3 tomorrow.
Immutable is always preferred as the default, especially in multithreaded situations, and it makes for easier to read and more maintainable code. Case in point: the Java Date API, which is riddled with design flaws. If Date were immutable the API would be very streamlined. I would know Date operations would create new dates and would never have to look for APIs that modify them.
Read Java Concurrency in Practice to understand the true importance of immutable types.
But also note that if for some reason you want mutable types, use AtomicInteger, AtomicBoolean, etc. Why Atomic? Because by introducing mutability you introduced a need for thread safety, which you wouldn't have needed if your types stayed immutable. So in using mutable types you also must pay the price of thinking about thread safety and using types from the concurrent package. Welcome to the wonderful world of concurrent programming.
Also, for Boolean - I challenge you to name a single operation that you might want to perform that cares whether Boolean is mutable. set to true? Use myBool = true. That is a re-assignment, not a mutation. Negate? myBool = !myBool. Same rule. Note that immutability is a feature, not a constraint, so if you can offer it, you should - and in these cases, of course you can.
Note this applies to other types as well. The most subtle thing with integers is count++, but that is just count = count + 1, unless you care about getting the value atomically... in which case use the mutable AtomicInteger.
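The reassignment-versus-mutation distinction can be sketched like this. ReassignVsMutate and its method names are hypothetical; the point is what a second reference to the same object sees afterwards.

```java
import java.util.concurrent.atomic.AtomicInteger;

class ReassignVsMutate {
    // Negating a Boolean rebinds the variable; the original object is untouched.
    static boolean aliasAfterNegation() {
        Boolean flag = Boolean.TRUE;
        Boolean alias = flag;
        flag = !flag;      // unbox, negate, box again, reassign
        return alias;      // the alias still refers to the old value
    }

    // Incrementing an AtomicInteger changes the shared object in place.
    static int aliasAfterIncrement() {
        AtomicInteger counter = new AtomicInteger(0);
        AtomicInteger alias = counter;
        counter.incrementAndGet();
        return alias.get(); // the alias observes the change
    }

    public static void main(String[] args) {
        System.out.println(aliasAfterNegation());
        System.out.println(aliasAfterIncrement());
    }
}
```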
Wrapper classes in Java are immutable so the runtime can share just two Boolean objects - Boolean.TRUE and Boolean.FALSE - and every autoboxed or Boolean.valueOf value is a reference to one of those two (only an explicit new Boolean(...) creates extra instances). And since they can never be changed, you know they'll never be pulled out from under you. Not only does this save memory, it makes your code easier to reason about: since you know the wrapper objects you're passing around will never have their value change, they won't suddenly jump to a new value because they're accidentally a reference to the same object elsewhere.
Similarly, Integer has a cache of all signed byte values - -128 to 127 - so the runtime doesn't have to have extra instances of those common Integer values.
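The sharing can be observed with reference comparisons. This is a sketch with invented names (WrapperCacheDemo and its methods); it only checks values inside the -128 to 127 range, because outside that range whether two boxed values are the same object depends on the JVM and its settings.

```java
class WrapperCacheDemo {
    // Boolean.valueOf hands back one of the two shared constants.
    static boolean sameBooleanInstance() {
        return Boolean.valueOf(true) == Boolean.TRUE;
    }

    // Autoboxing a value in the cached range yields the same Integer object.
    static boolean sameSmallInteger() {
        Integer a = 127;
        Integer b = 127;
        return a == b;   // reference comparison, on purpose
    }

    public static void main(String[] args) {
        System.out.println(sameBooleanInstance());
        System.out.println(sameSmallInteger());
    }
}
```

In ordinary code you would still compare wrappers with equals(); the == here is used only to reveal the shared instances.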
Patashu is the closest. Many of the goofy design choices in Java were because of the limitations of how they implemented a VM. I think originally they tried to make a VM for C or C++ but it was too hard (impossible?) so they made this other, similar language. Write once, run anywhere!
Any computer sciency justification like those other dudes spout is just after-the-fact folderol. As you now know, Java and C# are evolving to be as powerful as C. Sure, they were cleaner. Ought to be for languages designed decade(s) later!
Simple trick is to make a "holder" class. Or use a closure nowadays! Maybe Java is evolving into JavaScript. LOL.
Boolean or any other wrapper class is immutable in java. Since wrapper classes are used as variables for storing simple data, those should be safe and data integrity must be maintained to avoid inconsistent or unwanted results. Also, immutability saves lots of memory by avoiding duplicate objects. More can be found in article Why Strings & Wrapper classes are designed immutable in java?

what are the OOP features which are not in java but c++ has those features?

Respected Sir!
I have not learnt Java yet, but most people say that C++ has more OOP features than Java. I would like to know which features C++ has that Java doesn't. Please explain.
From java.sun.com
Java omits many rarely used, poorly understood, confusing features of C++ that in our experience bring more grief than benefit. These omitted features primarily consist of operator overloading (although the Java language does have method overloading), multiple inheritance, and extensive automatic coercions.
For a more detailed comparison check out this Wikipedia page.
This might be controversial, but some authors say that using free functions might be more object oriented than writing methods for everything. So from those authors' point of view, free functions in C++ make it more OO than Java (which does not have them).
The explanation is that there are some operations that are not really performed on an instance of an object, but rather externally, and that having externally defined operations for those cases improves the OO design. Some of the cases are operations on two objects that are not naturally an operation of either one. Incrementing a value is clearly an operation on the value, but creating a new value with the sum of two others (or concatenating) are not really operations on the instance. When you write:
String a = "Hello";
String b = " World";
String c = a.concat( b ); // the "append": returns a new String, a is unchanged
The append operation is not performed on a: after the operation a is still "Hello". The operation is not performed on b either, it is an external operation that is performed on both a and b. In this particular example, the most OO way of implementing the operation would be providing a new constructor that takes two arguments (after all, the operation is performed on the new string), but another solution would be providing an external function append that takes two strings and returns a third one.
In this case, where both instances are of the same type, the operation can naturally be performed as a static method of the type, but when you mix different types the operation is not really part of either one, and in some cases it might end up being of a completely different type. In some cases free functions are faked in Java, as in the Collections class: it does not represent any OO element, but is rather simple glue to express free functions as static methods because the language does not have support for the former. Note that all those algorithms are performed neither on the collection nor on an instance of the contained type.
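Here is what that workaround looks like in practice, as a minimal sketch. TextOps is a hypothetical utility class; Collections.max is the real JDK method and stands in for the kind of algorithm the Collections class collects.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// A class used purely as a namespace: it is never instantiated.
final class TextOps {
    private TextOps() { }

    // A "free function" in disguise: it belongs to neither argument.
    static String append(String left, String right) {
        return left + right;
    }

    // Collections.max is the same idea in the JDK: an algorithm outside the collection type.
    static int largest(List<Integer> values) {
        return Collections.max(values);
    }

    public static void main(String[] args) {
        System.out.println(append("Hello", " World"));
        System.out.println(largest(Arrays.asList(3, 9, 4)));
    }
}
```

In C++ both operations could simply be namespace-level functions; in Java the class exists only to give the static methods somewhere to live.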
Multiple inheritance
Template Metaprogramming
C++ is a huge language and it is common for C++ developers to only use a small subset during development. These language features are often cited as being the most dangerous/difficult part of C++ to master and are often avoided.
In C++ you can bypass the OO model and make up your own stuff, whereas in Java, the VM decides that you cannot. Very simplified, but you know... who has the time.
I suppose some would consider operator overloading an object-oriented feature (if you view binary operators as not much different than class methods).
Some links, that give some good answers:
Java is not pure a OOP language (... but I don't care ;) )
Comparing C++ and Java (Java Coffee Break article)
Comparing Java and C++ (Wikipedia comprehensive comparison)
Be careful. There are multiple definitions of OOP out there. For example, the definitions in Wegner 87 and Booch et al 91 are different to what people say in Java is not pure a OOP language.
All this "my language is more OO than your language" stuff is a bit pointless, IMO.
