Angelika Langer, in her Generics FAQ, writes the following about Java's decision to go with code reuse instead of code specialization for generic types:
Code specialization is particularly wasteful in cases where the
elements in a collection are references (or pointers), because all
references (or pointers) are of the same size and internally have the
same representation. There is no need for generation of mostly
identical code for a list of references to integers and a list of
references to strings. Both lists could internally be represented by
a list of references to any type of object. The compiler just has to
add a couple of casts whenever these references are passed in and out
of the generic type or method. Since in Java most types are reference
types, it seems natural that Java chooses code sharing as its
technique for translation of generic types and methods.
So, the first question: is it true that all references are of the same size and internally share the same representation?
If so, what properties do all references in Java share?
So, the first question: is it true that all references are of the same size and internally share the same representation?
Yes. (Why would you imagine that someone as knowledgeable as Angelika Langer would get that wrong???)
If so, what properties do all references in Java share?
They all have a type that has Object as its ultimate supertype. Hence they all provide all methods in the java.lang.Object API.
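To make the code-sharing point concrete, here is a minimal sketch. The class name ErasureDemo and its helper methods are made up for illustration. It shows that a List<Integer> and a List<String> are backed by the same runtime class, and that the compiler supplies the cast when an element comes back out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; the names are illustrative only.
class ErasureDemo {
    // Both lists are instances of the very same runtime class.
    static boolean sharesRuntimeClass() {
        List<Integer> ints = new ArrayList<>();
        List<String> strs = new ArrayList<>();
        return ints.getClass() == strs.getClass();
    }

    // The compiler inserts the cast where the element leaves the list.
    static String firstElement() {
        List<String> strs = new ArrayList<>();
        strs.add("hello");
        return strs.get(0); // compiled roughly as (String) strs.get(0)
    }

    public static void main(String[] args) {
        System.out.println(sharesRuntimeClass()); // true
        System.out.println(firstElement());       // hello
    }
}
```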
The Java language specification specifies that
In the Java programming language arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
So, considering that arrays are objects, why did the Java designers decide not to allow inheriting from them and overriding, for example, toString() or equals()?
The current syntax wouldn't allow creating anonymous classes with an array as the base class, but I don't think that was the reason for their decision.
Java was a compromise between non-object languages and very slow languages of that time where everything was an object (think about Smalltalk).
Even in more recent languages, having a fast structure at the language level for arrays (and usually maps) is considered a strategic goal. Most people wouldn't like the weight of an inheritable object for arrays, and certainly nobody wanted this before JVM advances like JIT.
That's why arrays, while being objects, weren't designed as class instances ("An object is a class instance or an array"). There would be little benefit in having the ability to override a method on an array, and certainly not a great enough one to counterbalance the need to check for the right method to apply (and, in my opinion, not a great enough one to counterbalance the increased difficulty in reading code, similar to what happens when you overload operators).
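For what it's worth, here is a small sketch of what the lack of overriding means in practice (the class and method names are made up for illustration). Arrays keep Object's identity-based equals(), and the element-wise versions live in java.util.Arrays instead.

```java
import java.util.Arrays;

// Hypothetical demo class; the names are illustrative only.
class ArrayObjectMethodsDemo {
    // Arrays inherit equals() from Object unchanged, so it compares identity.
    static boolean inheritedEquals() {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        return a.equals(b);
    }

    // java.util.Arrays supplies the element-wise comparison instead.
    static boolean elementWiseEquals() {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        return Arrays.equals(a, b);
    }

    // Likewise, Arrays.toString gives the readable form that toString() lacks.
    static String readable() {
        return Arrays.toString(new int[] {1, 2, 3});
    }

    public static void main(String[] args) {
        System.out.println(inheritedEquals());   // false
        System.out.println(elementWiseEquals()); // true
        System.out.println(readable());          // [1, 2, 3]
    }
}
```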
I came across UNDER THE HOOD - Objects and arrays, which explains almost everything you need to know about how the JVM handles arrays. In the JVM, arrays are handled with special bytecodes, unlike the other objects we are familiar with.
In the JVM instruction set, all objects are instantiated and accessed
with the same set of opcodes, except for arrays. In Java, arrays are
full-fledged objects, and, like any other object in a Java program,
are created dynamically. Array references can be used anywhere a
reference to type Object is called for, and any method of Object can
be invoked on an array. Yet, in the Java virtual machine, arrays are
handled with special bytecodes.
As with any other object, arrays cannot be declared as local
variables; only array references can. Array objects themselves always
contain either an array of primitive types or an array of object
references. If you declare an array of objects, you get an array of
object references. The objects themselves must be explicitly created
with new and assigned to the elements of the array.
Arrays are dynamically created objects that serve as containers holding a fixed number of elements of the same type. It looks like arrays are not like any other object, and that's why they are treated differently.
I'd like to point out this article. It seems that arrays and ordinary objects are handled by different opcodes. I honestly can't summarize it much further than that, but it seems arrays are simply not treated like the objects we're normally used to, so you can't subclass them or override the methods they inherit from Object.
Full credits to the author of that post as it's a very interesting read, both short & detailed.
Upon further digging into the topic via multiple sources I've decided to give a more elaborate version of my previous answer.
The first thing to note is that the instantiation of objects and of arrays is very different within the JVM; each follows its own bytecode.
Object:
Object instantiation uses a single opcode, new, which takes two operand bytes, indexbyte1 and indexbyte2; together they form an index into the constant pool entry that names the class. Once the object is allocated, the JVM pushes a reference to it onto the operand stack. This happens for every class instance, irrespective of its type.
Arrays:
Array opcodes (for instantiating an array), however, are divided into three different opcodes.
newarray - pops length, allocates new array of primitive types of type indicated by atype, pushes objectref of new array
The newarray opcode is used when creating arrays of primitive datatypes (byte, short, char, int, long, float, double, boolean) rather than object references.
anewarray - pops length, allocates a new array of objects of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
anewarray opcode is used when creating arrays of object references
multianewarray - pops dimensions number of array lengths, allocates a new multidimensional array of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
multianewarray instruction is used when allocating multi-dimensional arrays
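As a rough illustration of the three opcodes above, here is a sketch with made-up class and method names. The comments state which instruction javac typically emits for each expression; that is my expectation rather than something this code checks, and you can confirm it on your own build with javap -c.

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayCreationDemo {
    static int[] primitives() {
        return new int[3];    // typically compiled to: newarray int
    }

    static String[] references() {
        return new String[3]; // typically compiled to: anewarray java/lang/String
    }

    static int[][] grid() {
        return new int[2][3]; // typically compiled to: multianewarray [[I, 2 dimensions
    }

    static Object instance() {
        return new Object();  // new java/lang/Object, followed by invokespecial <init>
    }

    public static void main(String[] args) {
        System.out.println(primitives().getClass().getName()); // [I
        System.out.println(references().getClass().getName()); // [Ljava.lang.String;
        System.out.println(grid().getClass().getName());       // [[I
    }
}
```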
An object can be a class instance or an array.
Taken from the Oracle docs:
A class instance is explicitly created by a class instance creation expression
BUT
An array is explicitly created by an array creation expression
This goes hand in hand with the information about the opcodes. Arrays were simply not designed to be class instances; they are created by array creation expressions instead, so there is no class declaration through which you could extend an array type or override the methods it inherits from Object.
As we have seen, it has nothing to do with the fact that arrays may hold primitive datatypes. After giving it some thought, though, it isn't very common to come across situations where one might want to override toString() or equals() on an array; it was still a very interesting question to try to answer.
Resources:
Oracle-Docs chapter 4.3.1
Oracle-Docs chapter 15.10.1
Artima - UnderTheHood
There are many classes in the standard Java library that you cannot subclass; arrays aren't the only example. Consider String, or StringBuffer, or any of the "primitive wrappers", like Integer or Double.
There are optimizations that the JVM performs based on knowing the exact structure of these objects (like unboxing the primitives, or manipulating array memory at the byte level). If you could override anything, those optimizations would not be possible, and the performance of programs would suffer badly.
I know you can use public fields, or some other workarounds. Or maybe you don't need them at all. But just out of curiosity: why did Sun leave structures out of Java?
Here's a link that explains Sun's decision:
2.2.2 No More Structures or Unions
Java has no structures or unions as complex data types. You don't need structures and unions when you have classes; you can achieve the same effect simply by declaring a class with the appropriate instance variables.
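A minimal sketch of the workaround that quote describes, with made-up names: a class with nothing but public instance variables stands in for a struct. Note that it is still a reference type, so assigning it copies the reference, not the fields.

```java
// Hypothetical demo class; the names are illustrative only.
class StructDemo {
    // A "struct" in the sense of the quote: a class with just instance variables.
    static class PointStruct {
        int x;
        int y;
    }

    static int sumOfFields() {
        PointStruct p = new PointStruct();
        p.x = 3;
        p.y = 4;
        return p.x + p.y;
    }

    // Unlike a C struct, assignment copies the reference, not the fields.
    static boolean assignmentAliases() {
        PointStruct a = new PointStruct();
        PointStruct b = a;
        b.x = 10;
        return a.x == 10;
    }

    public static void main(String[] args) {
        System.out.println(sumOfFields());       // 7
        System.out.println(assignmentAliases()); // true
    }
}
```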
Although Java can support arbitrarily many kinds of classes, the Runtime only supports a few variable types: int, long, float, double, and reference; additionally, the Runtime only recognizes a few object types: byte[], char[], short[], int[], long[], float[], double[], reference[], and non-array object. The system will record a class type for each reference variable or array instance, and the Runtime will perform certain checks like ensuring that a reference stored into an array is compatible with the array type, but such behaviors merely regard the types of objects as "data".
I disagree with the claim that the existence of classes eliminates the need for structures, since structures have semantics which are fundamentally different from class objects. On the other hand, from a Runtime-system design perspective, adding structures greatly complicates the type system. In the absence of structures, the type system only needs eight array types. Adding structures to the type system would require the type system to recognize an arbitrary number of distinct variable types and array types. Such recognition is useful, but Sun felt that it wasn't worth the complexity.
Given the constraints under which Java's Runtime and type system operate, I personally think it should have included a limited form of aggregate type. Much of this would be handled by the language compiler, but it would need a couple of features in the Runtime to really work well. Given a declaration
aggregate TimedNamedPoint
{ int x,y; long startTime; String name; }
a field declaration like TimedNamedPoint tpt; would create four variables: tpt.x, tpt.y of type int, tpt.startTime of type long, and tpt.name of type String. Declaring a parameter of that type would behave similarly.
For such types to be useful, the Runtime would need a couple of slight additions: it would be necessary to allow functions to leave multiple values on the stack when they return, rather than simply having a single return value of one of the five main types. Additionally, it would be necessary to have a means of storing multiple kinds of things in an array. While that could be accomplished by having the creation of something declared as TimedNamedPoint[12] actually be an Object[4] which would be initialized to identify two instances of int[12], a long[12], and a String[12], it would be better to have a means by which code could construct a single array instance that could hold 24 values of type int, 12 of type long, and 12 of type String.
Personally, I think that for things like Point, the semantics of a simple aggregate would be much cleaner than for a class. Further, the lack of aggregates often makes it impractical to have a method that can return more than one kind of information simultaneously. There are many situations where it would be possible to have a method simultaneously compute and report the sine and cosine of a passed-in angle with much less work than would be required to compute both separately, but having to construct a SineAndCosineResult object instance would negate any speed advantage that could have been gained by doing so. The execution model wouldn't need to change much to allow a method to leave two floating-point values on the evaluation stack when it returns, but at present no such thing is supported.
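For illustration, this is roughly what the workaround looks like today. The result class is named after the example in the text and the enclosing class and method names are made up. The sketch just calls Math.sin and Math.cos separately rather than implementing a combined computation; the point is the holder object that has to be allocated to return two values.

```java
// Hypothetical demo class; the names are illustrative only.
class SineAndCosineDemo {
    // Holder object needed only because a method can return a single value.
    static final class SineAndCosineResult {
        final double sine;
        final double cosine;

        SineAndCosineResult(double sine, double cosine) {
            this.sine = sine;
            this.cosine = cosine;
        }
    }

    // Each call allocates a result object just to carry two doubles back.
    static SineAndCosineResult sineAndCosine(double angle) {
        return new SineAndCosineResult(Math.sin(angle), Math.cos(angle));
    }

    public static void main(String[] args) {
        SineAndCosineResult r = sineAndCosine(0.0);
        System.out.println(r.sine + " " + r.cosine); // 0.0 1.0
    }
}
```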
I have a map: Map abc = new HashMap(). Why is it mandatory for me to use only objects as keys and not primitives?
As for why: The implementations of a Map require Object keys (with an equals() function) to (efficiently) order/store your values for quick retrieval. Primitives do not have an equals() function and are therefore unsuited for the task. (This is basically what #MadProgrammer suggested, except that equals is used in the definition, and hashCode is just optional for possible implementations.)
There is no reason that it would not be possible to program this, however: in fact, you could argue that primitives have the easiest equality and hashCodes to compute! This is probably what is done in TIntArrayList as suggested by Narendra Pathai. And as Jens Schauder states: it would not be worth the hassle, also because autoboxing hides the problem from you most of the time.
In Java there is a big divide between primitives and objects/classes.
When you define a method that takes an Object as an argument, you might as well pass a String, or an AbstractSingletonFactoryFacade. But you can't pass a primitive. There is just no way to abstract over multiple primitives. This didn't change with generics.
What one could do is define separate interfaces accepting (and returning) the various primitives. While this would be feasible for things like List, which have only one type parameter, for Map with two type parameters you would end up with 81 interfaces ((8 primitive types + Object) squared), which just isn't worth the hassle.
Of course, most of the time this doesn't matter, since autoboxing makes the problem invisible.
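A quick sketch of autoboxing hiding the problem (the class and method names are made up for illustration): the code reads as if int keys were allowed, but every key stored in the map is an Integer object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical demo class; the names are illustrative only.
class AutoboxingMapDemo {
    static String lookup() {
        Map<Integer, String> names = new HashMap<>();
        int key = 42;
        names.put(key, "answer"); // the int is boxed to an Integer here
        return names.get(42);     // the literal is boxed as well
    }

    // The stored key really is an object, not a primitive.
    static boolean keyIsInteger() {
        Map<Integer, String> names = new HashMap<>();
        names.put(1, "one");
        Object storedKey = names.keySet().iterator().next();
        return storedKey instanceof Integer;
    }

    public static void main(String[] args) {
        System.out.println(lookup());       // answer
        System.out.println(keyIsInteger()); // true
    }
}
```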
In a Java context, what's the proper way to refer to a variable type that can contain multiple objects or primitives, and the proper way to refer to a type that can contain just one?
In general terms, I refer to lists, arrays, vectors, hashtables, trees, etc, as collections; and I refer to primitive types and one-dimensional objects as scalars.
In the wild, I've heard all sorts of combinations of phrases, including a few that are outright misleading:
"I'm storing my key/value pairs in a hashtable vector."
"Why would you need more than one hashtable?"
"What do you mean? I'm only using one hashtable."
Is there a widely-accepted way to refer to these two groupings of types, at a high level?
The Java Language Spec uses the terms "primitive" and "reference" to make that distinction for both variables and values. This might be confusing in the context of a different programming language where "reference" means something else.
However, I can't tell if that is exactly the distinction you're trying to make. If you want to lump strings and object wrappers like Integer in with the Java primitive types like int, you might be distinguishing scalar from non-scalar. Not all non-scalars are collections, of course.
I'm not sure that kind of terminology really applies to an OO language like Java. There's a distinction made between primitives, which can only contain a single value, and Objects, however an Object might contain any number of other objects.
Objects whose purpose is to contain zero-or-more instances of some other object are (in my experience) referred to as collections, maybe because of the Collections API or Arrays.
I guess languages where you're dealing explicitly with pointers and the like depend on the distinction more.
I think your question assumes that there are only 2 classes and one of them is what you call Collections. Let me say that I also call lists, maps, sets, etc. Collections because they're part of the Collections API. However, Collection is not on the same abstraction level as a primitive data type like integer. Really, you have primitives and references. References are pointers to objects which are instances of classes. Classes can be classified many ways. One of these classifications is "Collections".
Your friend who says "hashtable vector" is pretty much wrong if there's only one table. A hash table is a hash table, and a vector is a vector. A hashtable vector is a Vector<Hashtable> as far as I'm concerned.
Just use the most specific Java API class name (Collection, List, Object, etc.), without over-specifying, and 99% of all Java developers will understand what you mean.
Is there any reason why an array in Java is an object?
Because the Java Language Specification says so :)
In the Java programming language arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
So, unlike C++, Java provides true arrays as first-class objects:
There is a length member.
There is a clone() method which overrides the method of the same name in class Object.
Plus all the members of the class Object.
An exception is thrown if you attempt to access an array out of bounds.
Arrays are instantiated in dynamic memory.
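The points in that list can be checked with a short sketch (the class and method names are made up for illustration):

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayIsObjectDemo {
    // An array can be assigned to a variable of type Object.
    static boolean assignableToObject() {
        Object o = new int[] {1, 2, 3};
        return o instanceof int[];
    }

    // The length member.
    static int length() {
        return new int[5].length;
    }

    // clone() returns a new array; changing the copy leaves the original alone.
    static boolean cloneIsIndependent() {
        int[] original = {1, 2, 3};
        int[] copy = original.clone();
        copy[0] = 99;
        return copy != original && original[0] == 1;
    }

    // Out-of-bounds access throws instead of reading stray memory.
    static boolean outOfBoundsThrows() {
        int[] a = new int[2];
        try {
            int unused = a[2];
            return false;
        } catch (ArrayIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(assignableToObject()); // true
        System.out.println(length());             // 5
        System.out.println(cloneIsIndependent()); // true
        System.out.println(outOfBoundsThrows());  // true
    }
}
```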
Having arrays be objects means that you can do operations with them (e.g., someArray.count('foo')) instead of just doing it against them (e.g., count(someArray, 'foo')), which leads to more natural syntax.
Another point is that objects are mutable and are passed by reference (strictly speaking, Java passes the reference itself by value). Arrays have no fields or methods that you can use to change "properties" of the array, but you sure can mutate the element values. And the benefits of passing arrays by reference are pretty obvious (though functional programmers probably wish Java had immutable lists passed by value).
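A small sketch of what that means for arrays (the class and method names are made up for illustration): a method can change the caller's elements through the reference it receives, but reassigning its parameter has no effect on the caller.

```java
// Hypothetical demo class; the names are illustrative only.
class ArrayReferenceDemo {
    // Writes through the reference, so the caller's array changes.
    static void fill(int[] target, int value) {
        for (int i = 0; i < target.length; i++) {
            target[i] = value;
        }
    }

    // Only rebinds the local parameter; the caller's array is untouched.
    static void reassign(int[] target) {
        target = new int[] {1};
    }

    static int sumAfterFill() {
        int[] data = new int[3];
        fill(data, 7);
        return data[0] + data[1] + data[2];
    }

    static int firstAfterReassign() {
        int[] data = {5};
        reassign(data);
        return data[0];
    }

    public static void main(String[] args) {
        System.out.println(sumAfterFill());       // 21
        System.out.println(firstAfterReassign()); // 5
    }
}
```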
Edit: forgot to mention. In the period before autoboxing, it was helpful to be able to store arrays in collections, write them to ObjectStreams etc.
Probably because they wanted to get as close as possible to making everything an object. Native types are there for backward compatibility.
So that they get all the benefits thereof:
hashCode()
toString()
etc.
And arrays aren't 'primitive', so if they can't be primitive, they must be objects.
I'm not sure about the official reason.
However, it makes sense to me that they are objects because operations can be performed on them (such as taking the length) and it made more sense to support these operations as members rather than introduce new keywords. Other operations include clone(), the inherited operations of Object, etc. Arrays are also hashable and potentially comparable.
This is different from C (and native arrays in C++), where your arrays are essentially pointers to a memory offset.