This Wikipedia article states:
Since the specific type of a polymorphic object is not known before
runtime (in general), the executed function is dynamically bound.
Take, for example, the following Java code:
public void foo(java.util.List<String> list) {
list.add("bar");
}
List is an interface, so list must refer to a subtype of it. Is it a
reference to a LinkedList, an ArrayList, or some other subtype of
List? The actual method referenced by add is not known until runtime.
Consider this example:
List<String> list;
list = new LinkedList<String>();
foo(list);
list = new ArrayList<String>();
foo(list);
Why is the actual method referenced here not known until runtime? Couldn't the compiler just check, for each call of foo, which type the object list is assigned to? Of course this would only be possible if the program is deterministic and no randomness is involved (e.g. user interaction).
Is this what the "(in general)" in the quoted statement is about, or is my understanding wrong?
In the special case when the program is deterministic, is static binding used, or does Java always use dynamic binding, regardless of what is possible? If so, why?
The statement speaks about the general case. Given only the code of the Wikipedia example it is not possible to tell the concrete type of the list parameter. In your example it is possible to tell the concrete types.
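For contrast, here is a variant of your example where even a whole-program analysis cannot pin down the concrete type at compile time (a minimal sketch reusing the foo from the question):
List<String> list = Math.random() < 0.5 ? new LinkedList<String>() : new ArrayList<String>();
foo(list); // which add() implementation runs is only decided at runtime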
The Java runtime is allowed and does in fact devirtualize method calls if it can detect the concrete type of a variable.
If you are interested in the topic: here is a link to a paper which discusses devirtualization techniques.
Devirtualization is not performed during the compilation of Java source to Java bytecode; otherwise it would be quite fragile. Note that compiled Java classes usually preserve binary compatibility (with some known exceptions). Thus if your foo is located in a separate class and you recompile just that class, the class calling foo should work with the new code without recompilation.
However, devirtualization is possible at runtime and is actually performed by most modern JVMs (including the Oracle HotSpot JVM, of course). This method is likely to be fully inlined during JIT compilation: both foo calls and the LinkedList.add and ArrayList.add methods will be merged into the body of the caller method.
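If you want to observe this yourself, HotSpot can print its inlining decisions; a rough sketch (these are real diagnostic flags, but the output format varies between JVM versions):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining Main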
So in general the Wikipedia quote is correct: the actual method referenced by add is not known until runtime. However, this does not mean that the call remains polymorphic, as the JVM runtime is quite a complex thing which includes an interpreter, JIT compilation, and execution of JIT-compiled code.
Java introduced type erasure with generics in Java 5 so they would work on old versions of Java. It was a tradeoff for compatibility. We've since lost that compatibility[1] [2] [3]--bytecode can be run on later versions of the JVM but not earlier ones. This looks like the worst possible choice: we've lost type information and we still can't run bytecode compiled for newer versions of the JVM on older versions. What happened?
Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM (assuming, like previous releases, its bytecode won't be able to run on older versions anyway).
[3]: Type erasure could be backported in a manner similar to retrolambda for those who really like it.
Edit: I think the discussion of the definition of backwards vs. forwards compatibility is obscuring the question.
Type erasure is more than just a byte code feature that you can turn on or off.
It affects the way the entire runtime environment works. If you want to be able to query the generic type of every instance of a generic class, it implies that meta information, comparable to a runtime Class representation, is created for each object instantiation of a generic class.
If you write new ArrayList<String>(); new ArrayList<Number>(); new ArrayList<Object>() you are not only creating three objects, you are potentially creating three additional meta objects reflecting the types, ArrayList<String>, ArrayList<Number>, and ArrayList<Object>, if they didn’t exist before.
Consider that there are thousands of different List signatures in use in a typical application, most of them never used in a place where the availability of such reflection is required (due to the absence of this feature, we can conclude that currently all of them work without such reflection).
This, of course, multiplies: thousands of different generic list types imply thousands of different generic iterator types, thousands of Spliterator and Stream incarnations, not even counting the internal classes of the implementation.
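For contrast, under today's erasure all of these instantiations share a single runtime class; a quick check:
List<String> a = new ArrayList<>();
List<Number> b = new ArrayList<>();
System.out.println(a.getClass() == b.getClass()); // true: one ArrayList class, no per-instantiation meta objects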
And it even affects places without an object allocation which are currently exploiting type erasure under the hood: e.g. Collections.emptyList(), Function.identity(), Comparator.naturalOrder(), etc. return the same instance each time they are invoked. If you insist on having the particular captured generic type reflectively inspectable, this won’t work anymore. So if you write
List<String> strings = Collections.emptyList();
List<Number> numbers = Collections.emptyList();
you would have to receive two distinct instances, each of them reporting a different type on getClass() or the future equivalent.
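Today, by contrast, both calls hand back one shared object; a small check (this relies on the current OpenJDK implementation reusing a single empty-list instance):
List<String> strings = Collections.emptyList();
List<Number> numbers = Collections.emptyList();
System.out.println((Object) strings == (Object) numbers); // true with the current implementation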
It seems people wishing for this ability have a narrow view of their particular method, where it would be great if they could reflectively find out whether one particular parameter is actually one out of two or three types, but never think about the weight of carrying meta information about potentially hundreds or thousands of generic instantiations of thousands of generic classes.
This is the place where we have to ask what we gain in return: the ability to support a questionable coding style (this is what altering the code’s behavior due to information found via Reflection is all about).
The answer so far only addressed the easy aspect of removing type erasure, the desire to introspect the type of an actual instance. An actual instance has a concrete type, which could be reported. As mentioned in this comment by the user the8472, the demand for removal of type erasure often also implies the wish to be able to cast to (T), create an array via new T[], or access the type of a type variable via T.class.
This would raise the true nightmare. A type variable is a different beast than the actual type of a concrete instance. A type variable could resolve to, e.g., ? extends Comparator<? super Number>, to name one (rather simple) example. Providing the necessary meta information would imply that not only does object allocation become much more expensive, but every single method invocation could impose these additional costs, to an even bigger extent, as we are now not only talking about combinations of generic classes with actual classes, but also every possible wildcarded combination, even of nested generic types.
Keep in mind that the actual type of a type parameter could also refer to other type parameters, turning the type checking into a very complex process, which you not only have to repeat for every type cast; if you allow arrays to be created out of it, every storage operation has to repeat it.
Besides the heavy performance issue, the complexity raises another problem. If you look at the bug tracking list of javac or related questions on Stack Overflow, you may notice that the process is not only complex but also error prone. Currently, every minor version of javac contains changes and fixes regarding generic type signature matching, affecting what will be accepted or rejected. I’m quite sure you don’t want intrinsic JVM operations like type casts, variable assignments, or array stores to become victims of this complexity, having a different idea of what is legal or not in every version, or suddenly rejecting at runtime what javac accepted at compile time due to mismatching rules.
To some extent, erasure will be removed in the future with Project Valhalla, to enable specialized implementations for value types.
Or to put it more accurately, type erasure really means the absence of type specialization for generics, and valhalla will introduce specialization over primitives.
Specifically I'm asking if there are any technical reasons why type erasure couldn't be removed in the next version of the JVM
Performance. You don't have to generate specialized code for all combinations of generic types; instances and generated classes don't have to carry type tags; polymorphic inline caches and runtime type checks (compiler-generated instanceof checks) stay simple; and we still get most of the type safety through compile-time checks.
Of course there are also plenty of downsides, but the tradeoff has already been made, and the question is what would motivate the JVM devs to change that tradeoff.
And it might also be a compatibility thing: there could be code that performs unchecked casts to abuse generic collections by relying on type erasure, which would break if the type constraints were enforced.
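A sketch of the kind of erasure-dependent code meant here; it compiles with an unchecked warning and runs today precisely because nothing is verified at runtime:
List<String> strings = new ArrayList<>();
@SuppressWarnings("unchecked")
List<Integer> ints = (List<Integer>) (List<?>) strings; // unchecked cast, legal under erasure
ints.add(42);               // heap pollution: an Integer ends up in a List<String>
String s = strings.get(0);  // the ClassCastException surfaces here, not at add()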
Your understanding of backwards compatibility is wrong.
The desired goal is for new JVMs to be able to run old library code correctly and unchanged, even together with new code. This allows users to upgrade their Java versions reliably, even to much newer versions than the code was written for.
Let me preface this question by saying up front that I understand what Java can and can't do and am not asking about that. I'm wondering what the actual technical challenges are, from JVM and compiler standpoint, that require the compiler to behave the way it does.
Whenever I see discussions on the weaknesses or most hated aspects of Java, type erasure always seems to be somewhere near the top of the list for Java developers (it is for me!). If my history is correct, Java 1.0 never implemented any type checking beyond passing Objects and recasting them. When a better type system was required, Sun had to decide between full typing support, which would break backwards compatibility, or going with their chosen solution of generics, which didn't break old code.
Meanwhile C# ran into the same issue and went the opposite route, breaking backwards compatibility to implement a more complex typing system around the same time (I believe).
My main question is: why was this an either-or question for the two languages? What is it about the compiler process that means there is no way to support C#-style handling of types without breaking backwards compatibility in old code? I understand part of the problem is that the exact type is not always known at compile time, but at first (naive) glance it seems like sometimes it can be known at compile time, or that it can be left unknown at compile time and handled with a sort of reflection approach at runtime.
Is the problem that it's not feasible to implement, or that it was simply deemed too slow to implement a runtime sort of solution?
To go a step further, let's use a simple generic factory as an example of a place where type erasure feels rather cumbersome.
public class GenericFactory<FinalType, BuilderType<FinalType> extends GenericBuilder<FinalType>>{
private Class builderClass;
public GenericFactory(Class<BuilderType> builderClass){
this.builderClass=builderClass;
}
public FinalType create(){
GenericBuilder builder=builderClass.newInstance();
builder.setFoo(getSystemProperty("foo"));
builder.setBar(getSystemProperty("bar"));
builder.setBaz(getSystemProperty("baz"));
return builder.build();
}
}
This example, assuming I didn't screw up on syntax somewhere, shows two particular annoyances of type erasure that at first glance seem like they should be easier to handle.
First, and less relevant, I had to add a FinalType parameter before I could refer to BuilderType extends GenericBuilder, even though it seems like FinalType could be inferred from BuilderType. I say less relevant since this may be more about generics syntax/implementation than the compiler limits that forced type erasure.
The second issue is that I had to pass in my BuilderClass object to the constructor in order to use reflection to build the builder, despite it being defined by the generics already. It seems as if it would be relatively easy for the compiler to store the generic class used here (so long as it didn't use the ? syntax) to allow reflection to look up the generic and then construct it.
Since this isn't done I presume there is a very good reason it is not. I'm trying to understand what these reasons are, what forces the JVM to stick with type erasure to maintain backwards compatibility?
I'm not sure that what you're describing (the two "annoyances") is a result of type erasure.
I had to add a FinalType parameter before I could refer to BuilderType extends GenericBuilder, even though it seems like FinalType could be inferred from BuilderType
BuilderType<FinalType> would not be a valid generic type name, unless I missed some changes to that in Java 8. Thus it should be BuilderType extends GenericBuilder<FinalType>, which is fine. FinalType can't be inferred here; how should the compiler know which type to provide?
The second issue is that I had to pass in my BuilderClass object to the constructor in order to use reflection to build the builder, despite it being defined by the generics already.
That's not true. The generic parameters don't define what FinalType actually is. I could create a GenericFactory<String, StringBuilderType> (with StringBuilderType extends GenericBuilder<String>) as well as a GenericFactory<Integer, IntegerBuilderType> (with IntegerBuilderType extends GenericBuilder<Integer>).
Here, if you'd provide the type parameters to a variable definition or method call, type erasure would happen. As for the why, refer to Andy's comment.
However, if you'd have a field or subclass, e.g. private GenericFactory<String, StringBuilderType> stringFactory, there is no type erasure. The generic types can be extracted from the reflection data (unfortunately there's no easy built-in way, but have a look here: http://www.artima.com/weblogs/viewpost.jsp?thread=208860).
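A minimal sketch of what can be recovered from a field's signature with the standard reflection API (Holder is a made-up class):
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.util.List;

class Holder {
    List<String> names; // the generic type is recorded in the class file's signature
}

public class GenericFieldDemo {
    public static void main(String[] args) throws Exception {
        Field f = Holder.class.getDeclaredField("names");
        ParameterizedType t = (ParameterizedType) f.getGenericType();
        System.out.println(t.getActualTypeArguments()[0]); // class java.lang.String
    }
}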
Studying Java, I've come across generic methods.
public <T> void foo(T variable) { }
That is, a method which takes a parameter with an undecided type (à la PHP?). I'm however unable to see how this would be a good solution - especially since I've come to fall in love with strongly typed languages after coming from loose ones.
Is there any reason to use generic methods? If so, when?
Those coming from a pre-Java 5 background know how inconvenient it was to store an object in a Collection and then cast it back to the correct type before using it. Generics prevent that: they provide compile-time type safety, ensure that you only insert the correct type into a collection, and avoid ClassCastException at runtime.
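A before-and-after sketch of that inconvenience:
// Pre-Java 5: raw collection, cast on retrieval, mistakes surface only at runtime
List raw = new ArrayList();
raw.add("hello");
String s1 = (String) raw.get(0); // explicit cast required, may throw ClassCastException

// With generics: insertions are checked at compile time and no cast is needed
List<String> typed = new ArrayList<String>();
typed.add("hello");
String s2 = typed.get(0);
// typed.add(42);                // does not compile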
So it provides compile-time type safety and saves casting. When you want to write complex APIs with complex method signatures, it will save you a lot both when writing the API and when using it, prevents writing lots of casting code, and catches your errors at compile time. Just take a look at the java.util collections classes and see the source code.
As a developer, I always want the compiler to catch my errors at compile time and inform me; then I fix my errors, and at runtime there won't be many errors related to type safety.
For more info see:
http://javarevisited.blogspot.com/2011/09/generics-java-example-tutorial.html
http://javarevisited.blogspot.com/2012/06/10-interview-questions-on-java-generics.html
Generics, among other things, give you a way to provide a template -- i.e. you want to do the same thing, and the only difference is the type.
For example, look at the List API and you will see the method
add(E e)
For every list of the same type you declare, the only thing different about the add method is the type of the thing going into the list. This is a prime example of where generics are useful. (Before generics were introduced to Java, you would declare a list, and you could add anything to the list, but you would have to cast the object when you retrieved it)
More specifically, you might want 2 ArrayList instances, one that takes type1 and one that takes type2. The list code for add is going to do the same thing, execute the same code, for each list (since the two lists are both ArrayList instances), right? So the only thing different is what's in the lists.
(As #michael points out, add isn't a true example of a generic method, but there are true generic methods in the API linked, and the concept is the same)
There's nothing non-strongly typed about generic functions in general. The type is resolved and checked at compile time. It's not an undecided type, it's one of a range of possible types (these can be constrained, in your example they are not). At compile time it is known and decided.
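To make that concrete, here is a minimal sketch of a true generic method (firstOrDefault is a made-up name, not a library method); T is inferred and checked at each call site:
static <T> T firstOrDefault(java.util.List<T> list, T fallback) {
    return list.isEmpty() ? fallback : list.get(0);
}

String s = firstOrDefault(java.util.Arrays.asList("a", "b"), "none"); // T inferred as String
Integer n = firstOrDefault(java.util.Arrays.asList(1, 2, 3), 0);      // T inferred as Integer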
As hvgotcodes says, the Collections API contains a number of good examples of this in use.
The main objectives of generics are:
To provide type safety to collections so that they can hold only one particular type of object.
To resolve typecasting problems.
To hold only String objects, a generic version of ArrayList can be declared as follows:
ArrayList<String> l = new ArrayList<String>();
To know more: http://algovalley.com/java/generics.php
I read from an interview with Neal Gafter:
"For example, adding function types to the programming language is much more difficult with Erasure as part of Generics."
EDIT:
Another place where I've met a similar statement was in Brian Goetz's message on the lambda-dev mailing list, where he says that lambdas are easier to handle when they are just anonymous classes with syntactic sugar:
But my objection to function types was not that I don't like function types -- I love function types -- but that function types fought badly with an existing aspect of the Java type system, erasure. Erased function types are the worst of both worlds. So we removed this from the design.
Can anyone explain these statements? Why would I need runtime type information with lambdas?
The way I understand it is that they decided that, thanks to erasure, it would be messy to go the way of 'function types', e.g. delegates in C#, and they could only use lambda expressions, which are just a simplification of the single-abstract-method class syntax.
Delegates in C#:
public delegate void DoSomethingDelegate(Object param1, Object param2);
...
//now assign some method to the function type variable (delegate)
DoSomethingDelegate f = DoSomething;
f(new Object(), new Object());
(another sample here
http://geekswithblogs.net/joycsharp/archive/2008/02/15/simple-c-delegate-sample.aspx)
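For comparison, the role the delegate type plays in C# ended up being played in Java by a functional interface; a rough sketch (DoSomethingDelegate is just an illustrative name):
interface DoSomethingDelegate {
    void invoke(Object param1, Object param2);
}

// assign a lambda to the interface-typed variable, much like assigning a method to a delegate
DoSomethingDelegate f = (param1, param2) -> System.out.println(param1 + " " + param2);
f.invoke(new Object(), new Object());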
One argument they put forward in Project Lambda docs:
Generic types are erased, which would expose additional places where
developers are exposed to erasure. For example, it would not be
possible to overload methods m(T->U) and m(X->Y), which would be
confusing.
section 2 in:
http://cr.openjdk.java.net/~briangoetz/lambda/lambda-state-3.html
(The final lambda expressions syntax will be a bit different from the above document:
http://mail.openjdk.java.net/pipermail/lambda-dev/2011-September/003936.html)
(x, y) => { System.out.printf("%d + %d = %d%n", x, y, x+y); }
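For reference, the syntax that eventually shipped in Java 8 uses -> rather than =>, with the lambda targeting a functional interface:
java.util.function.BiConsumer<Integer, Integer> printSum =
        (x, y) -> System.out.printf("%d + %d = %d%n", x, y, x + y);
printSum.accept(1, 2); // prints: 1 + 2 = 3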
All in all, my best understanding is that only a part of the syntax that could have been used actually will be.
What Neal Gafter most likely meant was that not being able to use delegates will make standard APIs more difficult to adjust to a functional style, rather than that updating javac/the JVM would be more difficult.
If someone understands this better than me, I will be happy to read their account.
Goetz expands on the reasoning in State of the Lambda 4th ed.:
An alternative (or complementary) approach to function types,
suggested by some early proposals, would have been to introduce a new,
structural function type. A type like "function from a String and an
Object to an int" might be expressed as (String,Object)->int. This
idea was considered and rejected, at least for now, due to several
disadvantages:
It would add complexity to the type system and further mix structural and nominal types.
It would lead to a divergence of library styles—some libraries would continue to use callback interfaces, while others would use structural
function types.
The syntax could be unwieldy, especially when checked exceptions were included.
It is unlikely that there would be a runtime representation for each distinct function type, meaning developers would be further exposed to
and limited by erasure. For example, it would not be possible (perhaps
surprisingly) to overload methods m(T->U) and m(X->Y).
So, we have instead chosen to take the path of "use what you
know"—since existing libraries use functional interfaces extensively,
we codify and leverage this pattern.
To illustrate, here are some of the functional interfaces in Java SE 7
that are well-suited for being used with the new language features;
the examples that follow illustrate the use of a few of them.
java.lang.Runnable
java.util.concurrent.Callable
java.util.Comparator
java.beans.PropertyChangeListener
java.awt.event.ActionListener
javax.swing.event.ChangeListener
...
Note that erasure is just one of the considerations. In general, the Java lambda approach goes in a different direction from Scala, not just on the typed question. It's very Java-centric.
Maybe because what you'd really want would be a type Function<R, P...>, which is parameterised with a return type and some sequence of parameter types. But because of erasure, you can't have a construct like P..., because it could only turn into Object[], which is too loose to be much use at runtime.
This is pure speculation. I am not a type theorist; I haven't even played one on TV.
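For what it's worth, what Java 8 eventually shipped is a family of fixed-arity interfaces rather than a single variadic function type; a quick illustration:
java.util.function.Function<String, Integer> length = String::length;        // one parameter
java.util.function.BiFunction<Integer, Integer, Integer> sum = Integer::sum; // two parameters; there is no Function<R, P...>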
I think what he means in that statement is that at runtime Java cannot tell the difference between these two function definitions:
void doIt(List<String> strings) {...}
void doIt(List<Integer> ints) {...}
Because the information about what type of data the List contains is erased at compile time, the runtime environment wouldn't be able to determine which function you wanted to call.
Trying to compile both of these methods in the same class will produce the following compile-time error:
doIt(List<String>) clashes with doIt(List<Integer>); both methods have the same erasure
Some classes in the standard Java API are treated slightly different from other classes. I'm talking about those classes that couldn't be implemented without special support from the compiler and/or JVM.
The ones I come up with right away are:
Object (obviously) as it, among other things doesn't have a super class.
String as the language has special support for the + operator.
Thread since it has this magical start() method despite the fact that there is no bytecode instruction that "forks" the execution.
I suppose all classes like these are in one way or another mentioned in the JLS. Correct me if I'm wrong.
Anyway, what other such classes exist? Is there any complete list of "glorified classes" in the Java language?
There are a lot of different answers, so I thought it would be useful to collect them all (and add some):
Classes
AutoBoxing classes - the compiler only allows for specific classes
Class - has its own literals (int.class for instance). I would also add its generic typing without creating new instances.
String - with its overloaded + operator and the support of literals
Enum - the only class that can be used in a switch statement (soon a privilege to be given to String as well); see the sketch after this list. It does other things as well (automatic static method creation, serialization handling, etc.), but those could theoretically be accomplished with code - it is just a lot of boilerplate, and some of the constraints could not be enforced in subclasses (e.g. the special subclassing rules) - but what you could never accomplish without the privileged status of an enum is include it in a switch statement.
Object - the root of all objects (and I would add its clone and finalize methods are not something you could implement)
References: WeakReference, SoftReference, PhantomReference
Thread - the language doesn't give you a specific instruction to start a thread, rather it magically applies it to the start() method.
Throwable - the root of all classes that can work with throw, throws and catch, as well as the compiler understanding of Exception vs. RuntimeException and Error.
NullPointerException and other exceptions such as ArrayIndexOutOfBounds which can be thrown by other bytecode instructions than athrow.
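A tiny sketch of the switch privilege mentioned for Enum above (Color is a made-up enum):
enum Color { RED, GREEN, BLUE }

static String describe(Color c) {
    switch (c) {            // before Java 7 added String, Enum was the only class type allowed here
        case RED:   return "warm";
        case GREEN: return "natural";
        default:    return "cool";
    }
}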
Interfaces
Iterable - the only interface that can be used in an enhanced for loop
Honorable mentions go to:
java.lang.reflect.Array - creating a new array as defined by a Class object would not be possible.
Annotations - they are a special language feature that behaves like an interface at runtime. You certainly couldn't define another Annotation interface, just like you can't define a replacement for Object. However, you could implement all of their functionality and just have another way to retrieve them (and a whole bunch of boilerplate) rather than reflection. In fact, there were many XML-based and javadoc-tag-based implementations before annotations were introduced.
ClassLoader - it certainly has a privileged relationship with the JVM as there is no language way to load a class, although there is a bytecode way, so it is like Array in that way. It also has the special privilege of being called back by the JVM, although that is an implementation detail.
Serializable - you could implement the functionality via reflection, but it has its own privileged keyword and you would spend a lot of time getting intimate with the SecurityManager in some scenarios.
Note: I left out of the list things that provide JNI (such as IO) because you could always implement your own JNI call if you were so inclined. However, native calls that interact with the JVM in privileged ways are different.
Arrays are debatable - they inherit Object, have an understood hierarchy (Object[] is a supertype of String[]), but they are a language feature, not a defined class on its own.
Class, of course. It has its own literals (a distinction it shares with String, BTW) and is the starting point of all that reflection magic.
sun.misc.Unsafe is the mother of all dirty, spirit-of-the-language-breaking hacks.
Enum. You're not allowed to subclass it, but the compiler can.
Many things under java.util.concurrent can be implemented without JVM support, but they would be a lot less efficient.
All of the Number classes have a little bit of magic in the form of Autoboxing.
Since the important classes were mentioned, I'll mention some interfaces:
The Iterable interface (since 1.5) - it allows an object to participate in a foreach loop:
Iterable<Foo> iterable = ...;
for (Foo foo : iterable) {
}
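And the other side of that privilege, a minimal hand-rolled Iterable (Countdown is a made-up class):
import java.util.Iterator;

class Countdown implements Iterable<Integer> {
    private final int start;
    Countdown(int start) { this.start = start; }

    @Override
    public Iterator<Integer> iterator() {
        return new Iterator<Integer>() {
            private int current = start;
            @Override public boolean hasNext() { return current > 0; }
            @Override public Integer next() { return current--; }
        };
    }
}

// usage: prints 3, 2, 1
for (int i : new Countdown(3)) {
    System.out.println(i);
}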
The Serializable interface has a very special meaning, different from a standard interface. You can define methods that will be taken into account even though they are not defined in the interface (like readResolve()). The transient keyword is the language element that affects the behaviour of Serializable implementors.
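A short sketch of that special treatment, the classic serializable-singleton idiom:
class Singleton implements java.io.Serializable {
    static final Singleton INSTANCE = new Singleton();
    private Singleton() {}

    // not declared in Serializable, yet the serialization machinery calls it
    private Object readResolve() {
        return INSTANCE; // replace the freshly deserialized copy with the canonical instance
    }
}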
Throwable, RuntimeException, Error
AssertionError
References WeakReference, SoftReference, PhantomReference
Enum
Annotation
Java array as in int[].class
java.lang.ClassLoader, though the actual dirty work is done by some unmentioned subclass (see 12.2.1 The Loading Process).
Not sure about this. But I cannot think of a way to manually implement IO objects.
There is some magic in the System class.
System.arraycopy is a hook into native code
public static native void arraycopy(Object array1, int start1,
Object array2, int start2, int length);
but...
/**
* Private version of the arraycopy method used by the jit
* for reference arraycopies
*/
private static void arraycopy(Object[] A1, int offset1,
Object[] A2, int offset2, int length) {
...
}
Well, since the special handling of assert has been mentioned, here are some more exception types which have special treatment by the JVM:
NullPointerException
ArithmeticException
StackOverflowError
All kinds of OutOfMemoryErrors
...
These exceptions are not special in themselves, but the JVM uses them in special cases, so you can't implement them yourself without writing your own JVM. I'm sure that there are more special exceptions around.
Most of those classes aren't really implemented with 'special' help from the compiler or JVM. Object does register some natives which poke around the internal JVM structures, but you can do that for your own classes as well. (I admit this is subject to semantics; "calls a native defined in the JVM" can be considered special JVM support.)
What /is/ special is the behaviour of the 'new' and 'throw' instructions in how they initialise these internal structures.
Annotations and numbers are pretty much all-out freaky though.