For various reasons I'm stuck with this bit of Java code that uses Scala types:
scala.Tuple2<scala.Enumeration.Value, Integer>[] tokens = new scala.Tuple2[] {
new scala.Tuple2(scala.math.BigDecimal.RoundingMode.UP(), 0)
};
IntelliJ throws this warning on line 1:
Unchecked assignment: 'scala.Tuple2[]' to 'scala.Tuple2<scala.Enumeration.Value,java.lang.Integer>[]'
And it throws two warnings on line 2:
Raw use of parameterized class 'scala.Tuple2'
Unchecked call to 'Tuple2(T1, T2)' as a member of raw type 'scala.Tuple2'
I can get rid of the warnings on line 2 by simply adding <> after new scala.Tuple2 and before (:
scala.Tuple2<scala.Enumeration.Value, Integer>[] tokens = new scala.Tuple2[] {
new scala.Tuple2<>(scala.math.BigDecimal.RoundingMode.UP(), 0)
};
But the warning on line 1 remains. Adding <> after new scala.Tuple2 and before [] doesn't help. I also tried this:
scala.Tuple2<scala.Enumeration.Value, Integer>[] tokens = new scala.Tuple2<scala.Enumeration.Value, Integer>[] {
new scala.Tuple2<>(scala.math.BigDecimal.RoundingMode.UP(), 0)
};
This causes an error: Generic array creation. I don't understand what this means or why it wouldn't work.
Generics are entirely a compile time thing. The stuff in the <> either doesn't end up in class files at all, or if it does, it is, as far as the JVM is concerned, a comment. It has no idea what any of it means. The only reason <> survives is purely for javac's needs: It needs to know that e.g. the signature of the List interface is boolean add(E), even though as far as the JVM is concerned, it's just boolean add(Object).
As a consequence, given an instance of some list, e.g.:
// snippet 1:
List<?> something = foo();
List<String> foo() {
return new ArrayList<String>();
}
// snippet 2:
List<?> something = foo();
List<Integer> foo() {
return new ArrayList<Integer>();
}
These are bytecode-wise identical, at least as far as the JVM is concerned. The only difference is in that 'comment' the JVM doesn't know about. The runtime structure of the object created here is identical, and hence it is simply not possible to call anything on the something variable to determine whether it is a list of strings or a list of integers.
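A quick way to see this for yourself (my own illustration, not from the original snippets): both lists report the exact same runtime class, so there is nothing you can ask the object that would reveal the element type.
List<String> a = new ArrayList<>();
List<Integer> b = new ArrayList<>();
System.out.println(a.getClass() == b.getClass()); // true - both are just java.util.ArrayList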
But, array types are a runtime thing. You can figure it out:
// snippet 1:
Object[] something = foo();
String[] foo() {
return new String[0];
}
// snippet 2:
Object[] something = foo();
Integer[] foo() {
return new Integer[0];
}
Here, you can tell the difference: something.getClass().getComponentType() will be String.class in snippet 1, and Integer.class in snippet 2.
Generics are 100% a compile time thing. If javac (or scalac, or whatever compiler you are using) doesn't stop you, then the runtime never will. You can trivially 'break' the heap if you insist on doing this:
List<String> strings = new ArrayList<String>();
List /* raw */ raw = strings; // warning, but, compiles
raw.add(Integer.valueOf(5));
String a = strings.get(0); // uhoh!
The above compiles fine. The only reason it crashes at runtime is because a ClassCastException occurs, but you can avoid that with more shenanigans if you must.
In contrast to arrays where all this is a runtime thing:
Object[] a = new String[10];
a[0] = Integer.valueOf(5);
The above compiles. At runtime you get an ArrayStoreException.
Thus, generics and arrays are like fire and water: mutually exclusive, at opposite ends of a spectrum. They do not play together at all.
Now we get to the construct new T[]. This doesn't even compile, because javac doesn't know what T is going to be, an array needs to know its component type when it is created, and T cannot be derived at runtime, so this creation simply isn't possible.
In other words, mixing arrays and generics is going to fail, in the sense that generics are entirely a compile time affair, and tossing arrays into the mix means the compiler can no longer do the job of ensuring you don't get 'heap corruption' (the notion that there's an integer in a list that a variable of type List<String> is pointing at).
You simply write this:
List<String>[] arr = new List[10];
And yes, the compiler will warn you that it has no way of ensuring that arr will in fact only ever contain List<String> instances; you get a 'this code uses unchecked/unsafe operations' warning. But, key word: warning. You can suppress it with @SuppressWarnings("unchecked").
There's no other way to get rid of it: mixing arrays and generics usually ends up there, in warnings that you have to suppress.
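For completeness, a minimal sketch of what that suppression looks like on the List<String>[] example above; the same annotation on the question's tokens declaration (keeping the raw new scala.Tuple2[] creation) is the usual way out:
@SuppressWarnings("unchecked") // the warning cannot be eliminated, only suppressed
List<String>[] arr = new List[10];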
Related
I ran the code below on my HotSpot JVM and got "a" as output:
ArrayList<Method> list = new ArrayList<>();
Method method = list.getClass().getDeclaredMethod("add", Object.class);
method.invoke(list, "a");
System.out.println(list.get(0));
But a ClassCastException occurred after running the code below:
ArrayList<String> list1 = new ArrayList<>();
Method method1 = list1.getClass().getDeclaredMethod("add", Object.class);
method1.invoke(list1, 1); // or replaced with method1.invoke(list1, new int[]{1});
System.out.println(list1.get(0));
What's wrong with the second snippet?
Is ArrayList<String> special in some way?
There's nothing special about ArrayList<String>. It's simply how generics and method invocation expressions interact in the language. What you are observing is a result of there being overloads of PrintStream.println for both Object and String parameters.
The TL;DR is that the former case invokes PrintStream.println(Object); the latter case invokes PrintStream.println(String), for which the compiler inserts a cast because the String is coming from a list expected to contain Strings only.
Generics are simply compiler-inserted casts. When the compiler sees that a method returns an E (e.g. ArrayList<E>::get(int)), it knows that the result of that method can be safely cast to E (because it is an E, a subclass of E, or null).
In order to use that result as an E, though, it has to cast the result to E, because the result of get is an Object, because of type erasure.
So, when you write things like:
List<String> list = ...
String s = list.get(0);
list.get(0).toString();
System.out.println("" + list.get(0));
the compiler will insert casts, and so the code which is actually executed looks like:
List<String> list = ...
String s = (String) list.get(0);
((String) list.get(0)).toString();
System.out.println("" + (String) list.get(0));
which is harder to read; but you don't need the explicit casts, because the compiler knows to insert them for you, based on the fact that list is a List<String>.
When you invoke a method, like System.out.println, the compiler goes through quite a complicated process to determine which method to invoke.
In this case, it looks at the PrintStream class, and finds all of the overloads of the method println; then it narrows these down to the ones which could be invoked for the given arguments; then it picks which of the potential matches is most specific.
Again, "most specific" is rather complicated, but it is summarised as a method is more specific if any valid parameters can also be passed to a less specific method, but not vice versa.
class Foo {
static void foo(String str) {}
static void foo(Object str) {}
}
So, foo(String) is more specific than foo(Object), because all Strings are Object, but not all Objects are Strings.
So, when you're invoking foo(something) and something is expected to be a String, foo(String) is invoked, even though foo(Object) could also be invoked; if it's any other kind of object, foo(Object) is invoked, because not-Strings can't be passed to foo(String).
Enough theory, let's look at this specific example:
ArrayList<Method> list = new ArrayList<>();
// ...
System.out.println(list.get(0));
The overload of PrintStream.println which is most specific for this invocation is PrintStream.println(Object). The erased list.get(0) call returns an Object, so no cast needs to be inserted by the compiler to make it compatible. Hence, there is no problem if list.get(0) returns something that isn't a Method.
ArrayList<String> list = new ArrayList<>();
// ...
System.out.println(list.get(0));
The overload of PrintStream.println which is most specific for this invocation is PrintStream.println(String). The erased list.get(0) call still returns an Object, so a cast to String needs to be inserted by the compiler to make it compatible.
In fact, the code executed is effectively:
System.out.println((String) list.get(0));
Hence, there is a problem if list.get(0) returns something that isn't a String: you will get a ClassCastException, as you found.
The important thing to point out here is that this happens because of what the compiler expects the types to be, because of the type information it has at its disposal. These are reasonable expectations that are safe if you haven't done type-unsafe things (like adding to the list reflectively); but the protections offered by the compiler are somewhat trivial to work around.
For this reason, you should pay careful attention to ensure that what you are doing really is type-safe still, even when you are working behind the safety guard of the compiler.
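One practical guard worth mentioning (my addition, not part of the original answer): java.util.Collections.checkedList wraps a list so that the element type is checked at insertion time, which moves the failure to the reflective add call instead of a much later get. A self-contained sketch:
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class CheckedListDemo {
    public static void main(String[] args) throws Exception {
        List<String> strings = Collections.checkedList(new ArrayList<>(), String.class);
        Method add = List.class.getMethod("add", Object.class);
        try {
            add.invoke(strings, 1); // rejected right here, not at get()
        } catch (InvocationTargetException e) {
            System.out.println(e.getCause()); // java.lang.ClassCastException
        }
    }
}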
Well, there have been many questions on this site regarding raw types and generics in Java, including questions about why the next line of code produces a warning:
List<String> list = new ArrayList();
And it has been answered many times: since new ArrayList() is of a raw type, the compiler raises a warning because list is no longer "type safe", and the option to write this line of code exists solely for backward compatibility.
What I don't understand, and didn't find questions about, is why. Since the compiler compiles the Java code by "looking" only at the static types, what difference does it make at compile time whether I write new ArrayList(); or new ArrayList<>();?
For example, writing this code:
List<String> list = new ArrayList(); // 1
list.add("A string"); // 2
list.add(new Object()); // 3
results in a compilation warning on line 1, no compilation problem on line 2, but a compilation error on line 3, about type safety.
Therefore, adding a type argument on the first line (new ArrayList<>();) only removes the compiler warning.
I understand it's a bad habit to use raw types, but my question is really: what is the difference (other than the compilation warning) in writing the right-hand side as a raw type?
Thanks!
The compiler does not care what mechanism created the object that your variable list refers to. In fact, list could also refer to null, or be initialized from a method call. Example:
void yourMethod() {
List<String> list = createStringList();
...
}
List<String> createStringList() {
return new ArrayList(); // raw type here
}
When you have a properly typed variable (one that was not declared with a raw type), all usages of that variable are checked against the generic type.
It is another matter if the variable itself is declared with a raw type. Example:
List list = new ArrayList();
list.add("A string");
list.add(new Object());
This compiles fine, but the warning should alert you because things may break later!
Suppose you have another class where the constructor parameters depend on the type parameter:
class Foo<T> {
Foo(T obj) { }
}
Then the compiler checks the constructor argument when you create an instance with an explicit type argument or the diamond operator:
Foo<String> bar = new Foo<>(42); // doesn't compile
But a raw type turns off generics checking:
Foo<String> bar = new Foo(42); // does compile but causes heap pollution
so a warning is necessary.
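To make the consequence concrete, here is a sketch that gives Foo a field and a getter (my extension of the class above, not part of the original answer):
class Foo<T> {
    private final T obj;
    Foo(T obj) { this.obj = obj; }
    T get() { return obj; }
}

Foo<String> bar = new Foo(42); // compiles with an unchecked warning - heap pollution
String s = bar.get();          // ClassCastException at runtime, far from the raw construction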
I was reading about varargs heap pollution and I don't really get how varargs or non-reifiable types would be responsible for problems that do not already exist without genericity. Indeed, I can very easily replace
public static void faultyMethod(List<String>... l) {
Object[] objectArray = l; // Valid
objectArray[0] = Arrays.asList(42);
String s = l[0].get(0); // ClassCastException thrown here
}
with
public static void faultyMethod(String... l) {
Object[] objectArray = l; // Valid
objectArray[0] = 42; // ArrayStoreException thrown here
String s = l[0];
}
The second one simply uses the covariance of arrays, which is really the problem here. (Even if List<String> were reifiable, I guess it would still be a subclass of Object and I would still be able to assign any object to the array.) Of course I can see there's a little difference between the two, but this code is faulty whether it uses generics or not.
What do they mean by heap pollution (it makes me think about memory usage, but the only problem they talk about is potential type unsafety), and how is it different from any type violation using arrays' covariance?
You're right that the common (and fundamental) problem is with the covariance of arrays. But of the two examples you gave, the first is more dangerous, because it can modify your data structures and put them into a state that will break much later on.
Consider if your first example hadn't triggered the ClassCastException:
public static void faultyMethod(List<String>... l) {
Object[] objectArray = l; // Valid
objectArray[0] = Arrays.asList(42); // Also valid
}
And here's how somebody uses it:
List<String> firstList = Arrays.asList("hello", "world");
List<String> secondList = Arrays.asList("hello", "dolly");
faultyMethod(firstList, secondList);
return secondList.isEmpty()
? firstList
: secondList;
So now we have a List<String> that actually contains an Integer, and it's floating around, safely. At some point later — possibly much later, and if it's serialized, possibly much later and in a different JVM — someone finally executes String s = theList.get(0). This failure is so far distant from what caused it that it could be very difficult to track down.
Note that the ClassCastException's stack trace doesn't tell us where the error really happened; it just tells us who triggered it. In other words, it doesn't give us much information about how to fix the bug; and that's what makes it a bigger deal than an ArrayStoreException.
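As a side note (my addition, not from the original answer): when a varargs method never stores anything into its parameter array, it can be marked @SafeVarargs, which documents that promise and silences the heap-pollution warning at call sites. A sketch, assuming the usual java.util imports:
// Safe because the method only reads from the compiler-created array.
@SafeVarargs
static <T> List<T> listOf(T... items) {
    return new ArrayList<>(Arrays.asList(items));
}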
The difference between an array and a List is that the array checks, at runtime, the type of every reference you store into it, e.g.
Object[] array = new String[1];
array[0] = new Integer(1); // fails at runtime.
however
List list = new ArrayList<String>();
list.add(new Integer(1)); // doesn't fail.
From the linked document, I believe what Oracle means by "heap pollution" is to have data values that are technically allowed by the JVM specification, but are disallowed by the rules for generics in the Java programming language.
To give you an example, let's say we define a simple List container like this:
class List<E> {
Object[] values;
int len = 0;
List() { values = new Object[10]; }
void add(E obj) { values[len++] = obj; }
E get(int i) { return (E)values[i]; }
}
This is an example of code that is generic and safe:
List<String> lst = new List<String>();
lst.add("abc");
This is an example of code that bypasses generics (by reaching into the backing array directly) but still respects type safety at a semantic level, because the value we added has a compatible type:
String x = (String)lst.values[0];
The twist: here is code that bypasses generics in the same way but does something bad, causing "heap pollution":
lst.values[lst.len++] = new Integer("3");
The code above works because the array is of type Object[], which can store an Integer. Now when we try to retrieve the value, it'll cause a ClassCastException - at retrieval time (which is way after the corruption occurred), instead of at add time:
String y = lst.get(1); // ClassCastException for Integer(3) -> String
Note that the ClassCastException happens in our current stack frame, not even in List.get(), because the cast in List.get() is a no-op at run time due to Java's type erasure system.
Basically, we inserted an Integer into a List<String> by bypassing generics. Then when we tried to get() an element, the list object failed to uphold its promise that it must return a String (or null).
Prior to generics, there was absolutely no possibility that an object's runtime type could be inconsistent with its static type. This is obviously a very desirable property.
We can cast an object to an incorrect runtime type, but the cast would fail immediately, at the exact site of casting; the error stops there.
Object obj = "string";
((Integer)obj).intValue();
// we are not gonna get an Integer object
With the introduction of generics, along with type erasure (the root of all evils), it is now possible that a method declared to return String actually returns an Integer at runtime. This is messed up, and we should do everything we can to stop it at the source. That is why the compiler is so vocal about every unchecked cast it sees.
The worst thing about heap pollution is that, once it has happened, the program's behavior is essentially unpredictable: different compilers/runtimes may execute the program in different ways.
They are different because ClassCastException and ArrayStoreException are different.
The compile-time type checking rules for generics should ensure that it is impossible to get a ClassCastException in a place where you didn't put an explicit cast, unless your code (or some code you called, or that called you) did something unsafe at compile time, in which case you should (or whatever code did the unsafe thing should) have received a compile-time warning about it.
ArrayStoreException, on the other hand, is a normal part of how arrays work in Java, and pre-dates Generics. It is not possible for compile-time type checking to prevent ArrayStoreException because of the way the type system for arrays is designed in Java.
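To see the two failure modes side by side, here is a small self-contained sketch (my addition, reusing the same ideas as the examples earlier on this page):
import java.util.ArrayList;
import java.util.List;

class TwoExceptions {
    public static void main(String[] args) {
        // ArrayStoreException: the array itself checks the store, at runtime.
        Object[] objects = new String[1];
        try {
            objects[0] = 42;
        } catch (ArrayStoreException e) {
            System.out.println("store rejected immediately: " + e);
        }

        // ClassCastException: generics are erased, so the bad element is only
        // caught when the compiler-inserted cast runs, at read time.
        List<String> strings = new ArrayList<>();
        List raw = strings; // unchecked warning
        raw.add(42);
        try {
            String s = strings.get(0);
        } catch (ClassCastException e) {
            System.out.println("cast failed at read time: " + e);
        }
    }
}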
Well, I have read a lot of answers to this question, but I have a more specific one. Take the following snippet of code as an example.
public class GenericArray<E>{
E[] s= new E[5];
}
After type erasure, it becomes
public class GenericArray{
Object[] s= new Object[5];
}
This snippet of code seems to work well. Why does it cause a compile-time error?
In addition, I have learned from other answers that the following code works for the same purpose.
public class GenericArray<E>{
E[] s= (E[])new Object[5];
}
I've read some comments saying that the piece of code above is unsafe, but why is it unsafe? Could anyone provide me with a specific example where the above piece of code causes an error?
In addition, the following code is wrong as well. But why? It seems to work well after erasure, too.
public class GenericArray<E>{
E s= new E();
}
Array creation requires a reifiable component type, and generic types are not reifiable.
From the documentation: the only component type you can use for an array is one that is reifiable, that is:
It refers to a non-generic class or interface type declaration.
It is a parameterized type in which all type arguments are unbounded wildcards (§4.5.1).
It is a raw type (§4.8).
It is a primitive type (§4.2).
It is an array type (§10.1) whose element type is reifiable.
It is a nested type where, for each type T separated by a ".", T itself is reifiable.
This means that the only legal declaration for a "generic" array would be something like List<?>[] elements = new ArrayList[10];. But that's definitely not a generic array, it's an array of List of unknown type.
The main reason Java complains about you performing the cast to E[] is that it is an unchecked cast: the runtime cannot verify that the Object[] you just created really is an E[], because E has been erased, so the compiler can only warn you. However, this is the only way to create an array that is "generic", and it is generally considered acceptable if you have to use arrays.
In general, the advice is to avoid the scenario altogether by using generic collections where and when you can.
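As a sketch of that advice, the GenericArray class from the question could be backed by a List<E> instead of an E[] (my adaptation, not from the original answer):
import java.util.ArrayList;
import java.util.List;

class GenericArray<E> {
    private final List<E> elements = new ArrayList<>(); // no unchecked cast needed

    void add(E value) { elements.add(value); }
    E get(int i) { return elements.get(i); }
}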
This snippet of code seems to work well. Why does it cause a compile-time error?
First, because it would violate type safety (i.e. it is unsafe - see below), and in general code that can be statically determined to do this is not allowed to compile.
Remember that, due to type erasure, the type E is not known at run-time. The expression new E[5] could at best create an array of the erased type, in this case Object, rendering your original statement:
E[] s= new E[5];
Equivalent to:
E[] s= new Object[5];
Which is certainly not legal. For instance:
String[] s = new Object[10];
... is not compilable, for basically the same reason.
You argued that after erasure, the statement would be legal, implying that you think this means that the original statement should also be considered legal. However this is not right, as can be shown with another simple example:
ArrayList<String> l = new ArrayList<Object>();
The erasure of the above would be ArrayList l = new ArrayList();, which is legal, while the original is clearly not.
Coming at it from a more philosophical angle, type erasure is not supposed to change the semantics of the code, but it would do so in this case - the array created would be an array of Object rather than an array of E (whatever E might be). Storing a non-E object reference in it would then be possible, whereas if the array were really an E[], it should instead generate an ArrayStoreException.
why is it unsafe?
(Bearing in mind we are now talking about the case where E[] s= new E[5]; has been replaced with E[] s = (E[]) new Object[5];)
It is unsafe (which in this instance is short for type unsafe) because it creates at run-time a situation in which a variable (s) holds a reference to an object instance which is not a sub-type of the variable's declared type (Object[] is not a subtype of E[], unless E==Object).
Could anyone provide me with a specific example where the above piece of code causes an error?
The essential problem is that it is possible to put non-E objects into an array that you create by performing a cast (as in (E[]) new Object[5]). For example, say there is a method foo which takes an Object[] parameter, defined as:
void foo(Object [] oa) {
oa[0] = new Object();
}
Then take the following code:
String [] sa = new String[5];
foo(sa);
String s = sa[0]; // If this line was reached, s would
// definitely refer to a String (though
// with the given definition of foo, this
// line won't be reached...)
The array definitely contains String objects even after the call to foo. On the other hand:
E[] ea = (E[]) new Object[5];
foo(ea);
E e = ea[0]; // e may now refer to a non-E object!
The foo method might have inserted a non-E object into the array. So even though the third line looks safe, the first (unsafe) line has violated the constraints that guarantee that safety.
A full example:
class Foo<E>
{
void foo(Object [] oa) {
oa[0] = new Object();
}
public E get() {
E[] ea = (E[]) new Object[5];
foo(ea);
return ea[0]; // returns the wrong type
}
}
class Other
{
public void callMe() {
Foo<String> f = new Foo<>();
String s = f.get(); // ClassCastException on *this* line
}
}
The code generates a ClassCastException when run, and it is not safe. Code without unsafe operations such as casts, on the other hand, cannot produce this type of error.
In addition, the following code is wrong as well. But why? It seems to work well after erasure, too.
The code in question:
public class GenericArray<E>{
E s= new E();
}
After erasure, this would be:
Object s = new Object();
While this line itself would be fine, to treat the lines as being the same would introduce the semantic change and safety issue that I have described above, which is why the compiler won't accept it. As an example of why it could cause a problem:
public <E> E getAnE() {
return new E();
}
... because after type erasure, 'new E()' would become 'new Object()' and returning a non-E object from the method clearly violates its type constraints (it is supposed to return an E) and is therefore unsafe. If the above method were to compile, and you called it with:
String s = this.<String>getAnE();
... then you would get a type error at runtime, since you would be attempting to assign an Object to a String variable.
Further notes / clarification:
Unsafe (which is short for "type unsafe") means that it could potentially cause a run-time type error in code that would otherwise be sound. (It actually means more than this, but this definition is enough for the purposes of this answer.)
It's possible to cause a ClassCastException or ArrayStoreException or other exceptions with "safe" code, but these exceptions only occur at well-defined points. That is, you can normally only get a ClassCastException when you perform a cast, an operation that inherently carries this risk. Similarly, you can only get an ArrayStoreException when you store a value into an array.
The compiler doesn't verify that such an error will actually occur before it complains that an operation is unsafe. It just knows that certain operations are potentially able to cause problems, and warns about those cases.
That you can't create a new instance of (or an array of) a type parameter is both a language feature designed to preserve safety and, probably, also a reflection of the implementation restrictions posed by the use of type erasure. That is, new E() might be expected to produce an instance of the actual type parameter, when in fact it could only produce an instance of the erased type. Allowing it to compile would be unsafe and potentially confusing. In general you can use E in place of an actual type with no ill effect, but that is not the case for instantiation.
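One further note of my own: if you genuinely need an array whose runtime component type is E, the usual workaround is to pass a Class<E> token and create the array reflectively. The cast is still unchecked, but it is honest, because the created array really does have component type E:
import java.lang.reflect.Array;

class TypedArrays {
    @SuppressWarnings("unchecked")
    static <E> E[] newArray(Class<E> componentType, int length) {
        // the returned array's getComponentType() really is componentType
        return (E[]) Array.newInstance(componentType, length);
    }
}
Usage would look like String[] strings = TypedArrays.newArray(String.class, 5);, and unlike the (E[]) new Object[...] trick, such an array can safely escape to code that expects a String[].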
A compiler can use a variable of type Object to do anything a variable of type Cat can do. The compiler may have to add a typecast, but such a typecast will either throw an exception or yield a reference to an instance of Cat. Because of this, the generated code for a SomeCollection<T> doesn't have to actually use any variables of type T; the compiler can replace T with Object and cast things like function return values to T where necessary.
A compiler cannot use an Object[], however, to do everything a Cat[] can do. If a SomeCollection<T> had a field of type T[], it would not be able to create an instance of that array type without knowing the type of T. It could create an instance of Object[] and store references to instances of T in it without knowing the type of T, but exposing such an array as a T[] to code that knows what T really is (e.g. code expecting a Cat[]) would be guaranteed to fail unless T happened to be Object.
Let's say generic arrays were allowed in Java. Because the type parameter is erased, a new E[2] could only really be created as an Object[2] at runtime, and an Object[] accepts anything:
Object[] myStrs = new Object[2];
myStrs[0] = 100; // fine for an Object[]
myStrs[1] = "hi"; // also fine - the array cannot enforce any particular element type
If users were allowed to create generic arrays, this is what they would actually get: an array that can no longer reject wrongly typed elements at runtime. That defeats the purpose of arrays (arrays are supposed to know and enforce their element type, remember?). You can always use an Object[] if you really want a heterogeneous array.
It is syntactically legal to do this:
String [] s = new String[1];
Object [] o = s;
o[0] = new Integer(42);
but of course it will crash at runtime.
My question is: what is the point of allowing this assignment in the first place?
The problem is the assignment Object [] o = s; - I assume that's what you mean by "this".
The technical term is array covariance, and without it, you could not have code that deals with arrays generically. For example, most of the non-primitive-array methods in java.util.Arrays would be useless as you could only use them with actual Object[] instances. Obviously, this was considered more important by the designers of Java than complete type safety.
There is an alternative solution, which you see when looking at Java's generics introduced in Java 5: explicit covariance via wildcards. However, that results in considerable added complexity (see the constant stream of questions about the ? wildcard), and Java's original designers wanted to avoid complexity.
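For example (my illustration): these java.util.Arrays calls are declared against Object[], and they accept a String[] only because of array covariance.
String[] names = { "b", "a" };
java.util.Arrays.sort(names);                          // resolves to Arrays.sort(Object[])
System.out.println(java.util.Arrays.toString(names)); // resolves to Arrays.toString(Object[]) and prints [a, b]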
The point of allowing that kind of assignment is that disallowing it would make things like the following impossible:
ArrayList[] lists = new ArrayList[10];
lists[0] = new ArrayList();
List[] genericLists = lists;
lists[0].add("someObject");
If the compiler forbade your String[] -> Object[] case, it would also have to forbid ArrayList[] -> List[] and any other assignment of an array of a subtype to an array of one of its supertypes. That would make a lot of the features of an object-oriented language such as Java useless. Of course, it's much more typical to do things like:
List[] lists = new List[10];
lists[0] = new ArrayList();
lists[0].add("someObject");
But regardless, the compiler can't filter out these cases without simultaneously disallowing many useful and legitimate use-cases, so it's up to the programmer to make sure that what they are doing is sane. If what you want is an Object[], then declare your variable as such. If you declare something as String[], cast it to an Object[], and then forget that what you really have is a String[], then that is simply programmer error.
It's not allowed: you get a java.lang.ArrayStoreException: java.lang.Integer when you run that code.
The compiler allows it because you're assigning a String[] to an Object[] variable, which is a valid widening reference conversion. This is similar to:
Integer i = new Integer(10);
Object o = i;
String s = (String) o;
The compiler doesn't complain, but you get a ClassCastException at runtime.
The compiler can't know (without static analysis) that o actually refers to a String[], so it allows the store even though it fails at runtime. There's nothing preventing an Integer[], for example, from being assigned to o; it just doesn't happen here.
The assignment must be allowed because the compiler cannot tell, in general, whether the array will hold only strings at runtime. An invalid store throws an ArrayStoreException, which is checked at runtime.
Consider this:
String [] s = new String[1];
Object [] o = s;
o = new Integer[1];
o[0] = new Integer(1);
This situation is valid and runs OK. To give a broader perspective, IMHO arrays are a low-level, leaky abstraction in Java.
Generally the compiler can't tell whether o has been assigned a String[]. Consider this:
String[] s = new String[1];
Object[] o;
if (complexFunction(System.currentTimeMillis())) {
o = s;
} else {
o = new Integer[1];
}
o[0] = 42;
The compiler won't know at compile time what type o is going to refer to, so it just allows the assignment.