Generic and the erasure process


I have a doubt about this passage in the Java tutorial:
In the introduction, we saw invocations of the generic type
declaration List, such as List<Integer>. In the invocation (usually
called a parameterized type), all occurrences of the formal type
parameter (E in this case) are replaced by the actual type argument
(in this case, Integer).
But if there are no bounds, isn't the formal type parameter replaced by Object?
Why is it said that E is replaced by Integer?
Also, the Java tutorial says here:
To reference the generic Box class from within your code, you must
perform a generic type invocation, which replaces T with some concrete
value, such as Integer:
But again, because of erasure, at compile time T in the Box class is replaced
by Object and not by Integer. The Integer type is only used for cast operations.
In fact, the same tutorial also says:
During the type erasure process, the Java compiler erases all type
parameters and replaces each with its first bound if the type
parameter is bounded, or Object if the type parameter is unbounded.
I'm really confused. Which one is true?
Is T replaced by Integer or by Object?

You are talking about two different things.
The quotations from the tutorial are about type instantiation. This has nothing to do with type erasure, which is, IMHO, a misnamed concept and simply means that the generic types are no longer available at runtime.
But at compile time they are, and instantiation happens at compile time.
To answer your question, "at compile time" is a broad thing. The following all happens at compile time:
read source files
lexical analysis
parsing
...
type checking
...
code generation
The list is, by no means, complete, mind you.
However, as you can see, during type checking the compiler knows your type instantiations and can check them.
Later, it emits byte code, and since byte code has no way of representing generics, the types are "erased", which means that a cast is inserted here and there.
So, your assumption that "compile time" is a single instant where everything happens at once is not correct.
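To make the two phases concrete, here is a minimal sketch (the class name ErasureDemo is just made up for illustration). The commented-out line is rejected during type checking, while the runtime class of both lists is the same because the type arguments are gone by then:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureDemo {
    // Two differently parameterized lists share a single runtime class.
    static boolean sameRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strings = new ArrayList<>();
        return ints.getClass() == strings.getClass();
    }

    public static void main(String[] args) {
        List<Integer> ints = new ArrayList<>();
        ints.add(42);
        // ints.add("text"); // rejected by the type checker: String is not an Integer
        System.out.println(sameRuntimeClass());          // prints true
        System.out.println(ints.getClass().getName());   // prints java.util.ArrayList
    }
}
```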
Further edit:
I think you take all this (i.e. the word "replace") too literally. For sure, the compiler has some data structures where the types and names and scopes of all items in the program are held.
Look, it's quite simple in principle, if we have:
static <X> List<X> meth(X[] arr) { .... }
And later, you do:
Integer[] arr = new Integer[100];
List<Integer> list = meth(arr);
Integer foo = list.get(1);
then you are instantiating the type of the meth method:
static List<Integer> meth(Integer[] arr) { .... }
The point of generics is to say that meth works for any type. This is exactly what the compiler checks. It knows that, for all X, if you pass an array of X, you get back a list of X; hence, since you passed Integer[], the result must be List<Integer> and the list assignment is correct. Furthermore, the compiler knows that, for all X, if you get an element from a List<X>, it will be an X.
Therefore, the compiler notes and checks that foo is an Integer. Later, during code generation, it inserts a cast to Integer there because, due to type erasure, the return value from List.get is Object.
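As a rough sketch of what that means, the class below puts the generic meth next to a hand-written approximation of its erased form. The names methErased, viaGenerics and viaErasedForm are invented for this illustration, and the real generated byte code will differ in detail:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class InstantiationDemo {
    // Generic version: works for every element type X.
    static <X> List<X> meth(X[] arr) {
        return new ArrayList<>(Arrays.asList(arr));
    }

    // Approximation of the erased form: a raw List and an Object[].
    @SuppressWarnings({"rawtypes", "unchecked"})
    static List methErased(Object[] arr) {
        return new ArrayList(Arrays.asList(arr));
    }

    static Integer viaGenerics() {
        Integer[] arr = {10, 20, 30};
        List<Integer> list = meth(arr);   // X is instantiated with Integer
        return list.get(1);               // the compiler inserts the cast
    }

    @SuppressWarnings("rawtypes")
    static Integer viaErasedForm() {
        Integer[] arr = {10, 20, 30};
        List list = methErased(arr);
        return (Integer) list.get(1);     // the same cast, written by hand
    }

    public static void main(String[] args) {
        System.out.println(viaGenerics());    // prints 20
        System.out.println(viaErasedForm());  // prints 20
    }
}
```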
Note also that "replace" does not mean that the compiler somehow alters your code. It just creates a (possibly temporary) non-generic type signature from the generic one (by substituting, if you like this better, all the type parameters with their actual types) and uses this to check the types.
It is just like in math, if I say: Please replace the a with 42 and check if the equation is true:
a + 1 = 43
then it makes no sense to ask "where exactly" this replacement takes place. Most probably in your brain.

the formal type parameter is not replaced by Object?
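At runtime, a generic type parameter is represented as Object. But you can still get some information about <YourType> with reflection, as long as it was written in a declaration (a field, a method signature or a superclass). Erasure exists for compatibility with old classes. It was a bad idea. Article about it.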
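A small sketch of the reflection part, assuming a made-up class ReflectionDemo with a static field named numbers: the field's declaration keeps its type argument, while the object it refers to does not.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

class ReflectionDemo {
    // The declaration of this field keeps its generic signature in the class file.
    static List<Integer> numbers = new ArrayList<>();

    // Reads the type argument back from the field declaration.
    static String declaredElementType() {
        try {
            Field field = ReflectionDemo.class.getDeclaredField("numbers");
            ParameterizedType type = (ParameterizedType) field.getGenericType();
            Type argument = type.getActualTypeArguments()[0];
            return argument.getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(declaredElementType());          // prints java.lang.Integer
        System.out.println(numbers.getClass().getName());   // prints java.util.ArrayList
    }
}
```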

Related

Why is Collection not simply treated as Collection<?>

Consider the following API method taken from Shiro's org.apache.shiro.subject.PrincipalCollection interface but probably present in other libraries as well:
Collection fromRealm(String realmName);
Yes, even nowadays there are still libraries that use raw types, probably to preserve pre-Java 1.5 compatibility?!
If I now want to use this method together with streams or optionals like this:
principals.fromRealm(realmName).stream().collect(Collectors.toSet());
I get a warning about unchecked conversion and using raw types and that I should prefer using parameterized types.
Eclipse:
Type safety: The method collect(Collector) belongs to the raw type Stream. References to generic type Stream<T> should be parameterized
javac:
Note: GenericsTest.java uses unchecked or unsafe operations.
As I can't change the API method's signature to get rid of this warning, I can either annotate with @SuppressWarnings("unchecked") or simply cast to Collection<?> like this:
((Collection<?>) principals.fromRealm(realmName)).stream().collect(Collectors.toSet());
As this cast of course always works I'm wondering why the compilers are not simply treating Collection as Collection<?> but warn about this situation. Adding the annotation or the cast doesn't improve the code a single bit, but decreases readability or might even shadow actual valid warnings about usage of unparameterized types.
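One way to keep the noise in a single place is a small helper, sketched below. Here fromRealm is only a hypothetical stand-in for the legacy method (it is not Shiro's implementation), and principals is an invented name. Assigning the raw result to a Collection<?> local performs the same conversion as the cast above:

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.stream.Collectors;

class RawCollectionDemo {
    // Hypothetical stand-in for a legacy API that returns a raw Collection.
    @SuppressWarnings("rawtypes")
    static Collection fromRealm(String realmName) {
        return Arrays.asList(realmName + "-user", realmName + "-admin");
    }

    // The raw type is converted once here, so callers work with typed values.
    static Set<Object> principals(String realmName) {
        Collection<?> typed = fromRealm(realmName);
        return typed.stream().collect(Collectors.toSet());
    }

    public static void main(String[] args) {
        System.out.println(principals("demo").size());   // prints 2
    }
}
```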

The reason is quite simple:
You may read Objects from a Collection<?> the same way as from Collection. But you can't add Objects to a Collection<?> (The compiler forbids this) whereas to a Collection you can.
If after the release of Java 5 the compiler had translated every Collection to Collection<?>, then previously written code would not compile anymore and thus would destroy the backward compatibility.

The major difference between a raw type and the unbounded wildcard <?> is that the latter is type safe, that is, it is checked at compile time. The compiler won't allow you to add a String and an Integer to a collection of wildcard type, but it will allow you to do this with a raw type:
List raw = new ArrayList();
raw.add("");
raw.add(1);
Actually, in case of unbounded wildcard collections (List<?> wildcard = new ArrayList<String>()), you can't add anything at all to the list but null (from Oracle docs):
Since we don't know what the element type of c stands for, we cannot add objects to it. The add() method takes arguments of type E, the element type of the collection. When the actual type parameter is ?, it stands for some unknown type. Any parameter we pass to add would have to be a subtype of this unknown type. Since we don't know what type that is, we cannot pass anything in. The sole exception is null, which is a member of every type.
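A minimal sketch of both behaviours side by side (WildcardAddDemo and its method names are invented for the example). The line that would not compile is left as a comment:

```java
import java.util.ArrayList;
import java.util.List;

class WildcardAddDemo {
    static int wildcardList() {
        List<?> wildcard = new ArrayList<String>();
        wildcard.add(null);               // allowed: null is a member of every reference type
        // wildcard.add("text");          // compile error: the element type is unknown
        Object first = wildcard.get(0);   // reading as Object is always allowed
        return wildcard.size();
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static int rawList() {
        List raw = new ArrayList();
        raw.add("");                      // compiles, with an unchecked warning
        raw.add(1);                       // so does this
        return raw.size();
    }

    public static void main(String[] args) {
        System.out.println(wildcardList());   // prints 1
        System.out.println(rawList());        // prints 2
    }
}
```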

A Collection<?> screams:
Please don't add anything to me. I have a strict content type, ... well uh, I just forgot what type it is.
While a Collection says:
It's all cool ! You can add whatever you like, I have no restrictions.
So, why shouldn't the compiler translate Collection to Collection<?> ?
Because it would put up a lot of restrictions.

One use case I can think of for why Collection is not treated as Collection<?>: let's say we have an instance of ArrayList.
If the instance is of type ArrayList<Integer>, ArrayList<Double> or ArrayList<String>, you can only add that type (type checking). ArrayList<?> is not equivalent to ArrayList<Object>.
But with a plain ArrayList, you can add objects of any type. This may be one of the reasons why the compiler does not treat ArrayList as ArrayList<?> (type checking).
One more reason could be backward compatibility with Java versions that didn't have generics.

Java Generics and Raw Types

I have the following code:
ArrayList value = new ArrayList<Integer>(); // 1
value.add("Test"); // 2
I'm trying to understand line 2. Although I can see that value.add("Test"); compiles without errors, I can't see why it doesn't throw a runtime exception. If value is referencing a generic ArrayList object, why does Java allow adding a String to it? Can anyone explain it to me?
The closest explanation I've found about this is described here, but I still don't understand the core reason:
Stack s = new Stack<Integer>()
This is a legal conversion from a parameterized type to a raw type. You will be able to push value of any type. However, any such operation will result in an "unchecked call" warning.

Generic types are erased during compilation. So at runtime, an ArrayList is a raw ArrayList, no matter if you defined it as generic or not.
In your case, the code compiles as your ArrayList declaration is not generic, and it runs fine because of type erasure.

ArrayList value is your declaration, and its declared type is not generic. That is why the compiler allows you to add any Object to the list.
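To see that nothing is checked at runtime either, here is a small sketch (RawAddDemo and its method names are made up). For contrast it also uses Collections.checkedList, a standard library wrapper that does verify element types on insertion:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class RawAddDemo {
    @SuppressWarnings({"rawtypes", "unchecked"})
    static Object addThroughRawReference() {
        ArrayList value = new ArrayList<Integer>();
        value.add("Test");   // no runtime check: the list just stores object references
        return value.get(0);
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean checkedListRejectsString() {
        // The wrapper remembers Integer.class and checks every added element.
        List checked = Collections.checkedList(new ArrayList<Integer>(), Integer.class);
        try {
            checked.add("Test");
            return false;
        } catch (ClassCastException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(addThroughRawReference());     // prints Test
        System.out.println(checkedListRejectsString());   // prints true
    }
}
```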

Java, Generics: What's the difference between Set<?> s = HashSet<String>() and Set s = HashSet<String>()? [duplicate]

I was reading about unknown types and raw types in generics, and this question came to mind. In other words, is...
Set<?> s = new HashSet<String>();
and
Set s = new HashSet<String>();
... one and the same?
I tried it out, and they both seem to accomplish the same thing, but I would like to know if they are any different to the compiler.

No, they are not the same. Here's the basic difference:
Set<?> s = new HashSet<String>();
s.add(2); // This is invalid
Set s = new HashSet<String>();
s.add(2); // This is valid.
The point is, the first one is an unbounded wildcard parameterized type, Set<?>. The compiler will perform the check there, and since you can't add anything but null to such types, the compiler will give you an error.
The second one is a raw type, so the compiler won't do any check when you add anything to it. Basically, you lose type safety here.
And you can see the result of losing type safety there. Adding 2 to the set will fail at compile time for Set<?>, but for a raw type Set it will be added successfully, and it might throw an exception at runtime when you get the element from the set and assign it to, say, a String.
Differences apart, you should avoid using raw types in newer code. You will rarely find places where you need them. The few places where you use a raw type are to access static fields of that type or to get the Class object for that type: you can do Set.class, but not Set<?>.class.
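For the class literal case, a tiny sketch (ClassLiteralDemo and isSet are invented names):

```java
import java.util.HashSet;
import java.util.Set;

class ClassLiteralDemo {
    static boolean isSet(Object candidate) {
        // Only the raw name is allowed in a class literal;
        // Set<?>.class or Set<String>.class would not compile.
        Class<?> setClass = Set.class;
        return setClass.isInstance(candidate);
    }

    public static void main(String[] args) {
        System.out.println(isSet(new HashSet<String>()));   // prints true
        System.out.println(isSet("not a set"));             // prints false
    }
}
```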

The first one declares a Set<?>, which means: "a generic Set of some unknown class". You won't be able to add anything to this set (except null) because the compiler doesn't know what its generic type is.
The second creates a raw, non-generic set, and you can add anything you want to it. It doesn't provide any type safety.
I don't see why you would use either of them. Set<String> should be the declared type.

The first one uses generics and the second one uses the raw form of Set.
The first one uses a wildcard as the generic type parameter. It means "a Set of some specific yet unknown type", so you won't be able to call methods such as add that take a generic parameter, because the compiler doesn't know which specific type it really is. It maintains type safety by disallowing such a call at compile time.
The raw form removes all generics and provides no strong typing. You can add anything to such a Set, even non-Strings, which makes the following code not type-safe:
Set<String> genericSet = new HashSet<String>();
Set rawSet = genericSet;
rawSet.add(1); // That's not a String!
// Runtime error here.
for (String s : genericSet)
{
// Do something here
}
This would result in a runtime ClassCastException when the Integer 1 is retrieved and a String is expected.
Maintaining as much generic type information as possible is the way to go.
Set<String> s = new HashSet<String>();

Set<?> tells the compiler that the set contains a specific type, but the type is unknown. The compiler uses this information to provide errors when you attempt to invoke a method with a generic parameter, like add(T).
Set tells the compiler that the set is a "raw" type, where no generic type parameter is given. The compiler will raise warnings, rather than errors, when the object's generic methods are invoked.
In order to add elements to the set without warnings, you need to specify the generic type information on the variable. The compiler can infer the type parameters for the constructor. Like this:
Set<String> s = new HashSet<>();
This information allows the compiler to verify that the Set is used in a type safe way. If your code compiles without type safety warnings, and you don't use any explicit casts, you can be assured that there will be no ClassCastException raised at runtime. If you use generics, but ignore type safety warnings, you might see a ClassCastException thrown at a point where you don't have a cast in your source code.

Using UNSAFE in collections

I notice that in Java 7, the collection classes (ConcurrentLinkedQueue in my case) use the UNSAFE class for swap and find operations.
The offset seems to be calculated from the compile-time declaration:
itemOffset = UNSAFE.objectFieldOffset(local.getDeclaredField("item"));
How would this work in a scenario where we do not have the exact parameterized type at compile time, e.g. when we try to insert an Apple through a method that has Collection<? super Apple> in its declaration?
Does it use 'Apple' as the declared class to calculate the offset?
I would appreciate any help in understanding how UNSAFE calculates offsets here.

Java doesn't allow us to use primitive types as type arguments of generics; only reference types are allowed. References always have the same size, so the internal representation of objects of a given generic class is always the same, no matter how they're parameterized.
Therefore the exact type of the collection's items doesn't matter, because item is a reference, which always has the same size.
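This can be checked without touching Unsafe at all. The sketch below uses plain reflection on a hypothetical Node<E> class (shaped like a queue node, but not the JDK's own code) to show that the item field has a single erased type, so there is only one field layout whatever the type argument is:

```java
import java.lang.reflect.Field;

class ErasedFieldDemo {
    // Hypothetical node class with an "item" field, similar in shape to a linked-queue node.
    static class Node<E> {
        volatile E item;

        Node(E item) {
            this.item = item;
        }
    }

    // The declared type of the field after erasure.
    static Class<?> erasedItemType() {
        try {
            Field item = Node.class.getDeclaredField("item");
            return item.getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Node<String> text = new Node<>("apple");
        Node<Integer> number = new Node<>(1);
        System.out.println(erasedItemType());                         // prints class java.lang.Object
        System.out.println(text.getClass() == number.getClass());     // prints true
    }
}
```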

How does new LinkedList<>() differ from new LinkedList()

I just stumbled upon the compiler treating these two expressions differently. When I type:
LinkedList<String> list = new LinkedList();
I get a compiler warning about a raw type. However:
LinkedList<String> list = new LinkedList<>();
removes the warning. It seems to me as though the two statements mean essentially the same thing (i.e. create a new LinkedList with no specified object type). Why then does the compiler allow the empty generics? What is the difference here?

The statements do not mean the same thing at all.
The first statement tries to fit an untyped (raw) LinkedList into a declared generic LinkedList<String> and appropriately triggers a warning.
The second statement, valid from Java 7 (1.7) onward, uses type inference to work out the type argument from the type argument of the declared type. In addition, sometimes this can be used in method calls. It doesn't always work, however.
See this page for more info.

It's the diamond operator in Java 7, that helps you save writing the type again. In Java 7 this is equivalent to the same generic type argument that is used on the left side of the declaration. So the initialization is type safe and no warning is issued.
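The difference becomes visible as soon as the constructor takes an argument. In this sketch (DiamondDemo and its method names are invented), the diamond version is checked against LinkedList<String>, while the raw version only produces a warning and quietly lets Integers into a list declared as List<String>:

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class DiamondDemo {
    static List<String> withDiamond() {
        // Inferred as LinkedList<String>; the argument is checked against that type.
        List<String> list = new LinkedList<>(Arrays.asList("a", "b"));
        // new LinkedList<>(Arrays.asList(1, 2)) would be a compile error here.
        return list;
    }

    @SuppressWarnings({"rawtypes", "unchecked"})
    static List<String> withRawType() {
        // Raw constructor: compiles with a warning and accepts any elements.
        List<String> list = new LinkedList(Arrays.asList(1, 2));
        return list;
    }

    // Reads through a wildcard view so that no cast to String is inserted.
    static Object firstOfRawList() {
        List<?> view = withRawType();
        return view.get(0);
    }

    public static void main(String[] args) {
        System.out.println(withDiamond());                        // prints [a, b]
        System.out.println(firstOfRawList() instanceof Integer);  // prints true
    }
}
```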

With LinkedList<>, you use the new diamond operator from Java 7.
The diamond operator uses the generic type set on the left side of the line.
In Java 6, this doesn't work!
The diamond operator, however, allows the right hand side of the
assignment to be defined as a true generic instance with the same type
parameters as the left side... without having to type those parameters
again. It allows you to keep the safety of generics with almost the
same effort as using the raw type.
I think the key thing to understand is that raw types (with no <>)
cannot be treated the same as generic types. When you declare a raw
type, you get none of the benefits and type checking of generics. You
also have to keep in mind that generics are a general purpose part of
the Java language... they don't just apply to the no-arg constructors
of Collections!
Extracted from: https://stackoverflow.com/a/10093701/1281306

Backward compatibility (interoperating with legacy code) is the reason why Java allows the above signature. Generics are a compile-time construct only. At runtime, all generic syntax is removed. You can see this if you decompile any class file. Read this documentation.
LinkedList list = new LinkedList();

