Why doesn't Java support structures? (Just out of curiosity) - java

I know you can use public fields, or some other workarounds. Or maybe you don't need them at all. But just out of curiosity, why did Sun leave structures out?

Here's a link that explains Sun's decision:
2.2.2 No More Structures or Unions
Java has no structures or unions as complex data types. You don't need structures and unions when you have classes; you can achieve the same effect simply by declaring a class with the appropriate instance variables.

Although Java can support arbitrarily many kinds of classes, the Runtime only supports a few variable types: int, long, float, double, and reference; additionally, the Runtime only recognizes a few object types: byte[], char[], short[], int[], long[], float[], double[], reference[], and non-array object. The system will record a class type for each reference variable or array instance, and the Runtime will perform certain checks like ensuring that a reference stored into an array is compatible with the array type, but such behaviors merely regard the types of objects as "data".
I disagree with the claim that the existence of classes eliminates the need for structures, since structures have semantics which are fundamentally different from class objects. On the other hand, from a Runtime-system design perspective, adding structures greatly complicates the type system. In the absence of structures, the type system only needs eight array types. Adding structures to the type system would require the type system to recognize an arbitrary number of distinct variable types and array types. Such recognition is useful, but Sun felt that it wasn't worth the complexity.
Given the constraints under which Java's Runtime and type system operate, I personally think it should have included a limited form of aggregate type. Much of this would be handled by the language compiler, but it would need a couple of features in the Runtime to really work well. Given a declaration
aggregate TimedNamedPoint
{ int x,y; long startTime; String name; }
a field declaration like TimedNamedPoint tpt; would create four variables: tpt.x, tpt.y of type int, tpt.startTime of type long, and tpt.name of type String. Declaring a parameter of that type would behave similarly.
For such types to be useful, the Runtime would need a couple of slight additions: it would be necessary to allow functions to leave multiple values on the stack when they return, rather than simply having a single return value of one of the five main types. Additionally, it would be necessary to have a means of storing multiple kinds of things in an array. While that could be accomplished by having the creation of something declared as TimedNamedPoint[12] actually produce an Object[4] initialized to identify two instances of int[12], a long[12], and a String[12], it would be better to have a means by which code could construct a single array instance that could hold 24 values of type int, 12 of type long, and 12 of type String.
Personally, I think that for things like Point, the semantics of a simple aggregate would be much cleaner than those of a class. Further, the lack of aggregates often makes it impractical to have a method return more than one kind of information simultaneously. There are many situations where it would be possible to have a method simultaneously compute and report the sine and cosine of a passed-in angle with much less work than would be required to compute them separately, but having to construct a SineAndCosineResult object instance would negate any speed advantage that could have been gained by doing so. The execution model wouldn't need to change much to allow a method to leave two floating-point values on the evaluation stack when it returns, but at present no such thing is supported.
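A minimal sketch of the workaround described above, assuming a hypothetical SineAndCosineResult-style holder class (the name comes from the text, not from any real library):

```java
// Sketch of the workaround the text describes: because a Java method
// can return only one value, both results must be bundled into an
// object, paying an allocation per call. SineAndCosineResult is a
// hypothetical name taken from the text above.
public class SineAndCosine {
    static final class SineAndCosineResult {
        final double sine;
        final double cosine;
        SineAndCosineResult(double sine, double cosine) {
            this.sine = sine;
            this.cosine = cosine;
        }
    }

    static SineAndCosineResult sinCos(double angle) {
        // In principle both values could be computed together more
        // cheaply; here we simply call the library methods.
        return new SineAndCosineResult(Math.sin(angle), Math.cos(angle));
    }

    public static void main(String[] args) {
        SineAndCosineResult r = sinCos(0.0);
        System.out.println(r.sine + " " + r.cosine);
    }
}
```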

Related

In the absence of mutable types, is there a case for invariant type parameters?

Java Arrays are not fully type-safe because they are covariant: ArrayStoreException can occur on an aliased array. Java Collections, on the other hand, are invariant in their type parameter: e.g., List<Thread> is not a subtype of List<Runnable> (which may be somewhat counterintuitive).
The motivation seems to do with Lists and other collections being mutable, so to keep the type system sane, their type parameters necessarily have to be invariant.
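The contrast between covariant arrays and invariant generics can be seen directly; a sketch using only standard java.util types:

```java
import java.util.ArrayList;
import java.util.List;

public class CovariantArrays {
    public static void main(String[] args) {
        // Arrays are covariant, so this assignment compiles...
        Object[] objs = new String[1];
        try {
            // ...and the unsound store is only caught at run time.
            objs[0] = Integer.valueOf(42);
        } catch (ArrayStoreException e) {
            System.out.println("ArrayStoreException caught");
        }

        // Generics are invariant, so the equivalent assignment is
        // rejected at compile time (uncomment to see the error):
        // List<Object> list = new ArrayList<String>();
        List<String> ok = new ArrayList<String>();
        ok.add("fine");
    }
}
```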
If a programming language only supported immutable types, could a type system where type parameters were either covariant or contravariant (but never invariant) work? In other words, to use Scala's way of expressing variance, one would have List[+E], Function[-T, +R], Map[+K, +V], etc.
I know that there are some older languages (e.g., GNU Sather) that seem to get away with supporting just co-/contravariant parameter types.
My general question is: in a world of completely immutable data types, is there a case where one would specifically need an invariant parameter type (as opposed to either co- or contravariant)? Are there some examples for immutable data structures that would only be correct with an invariant type parameter?
So, every type system either allows some unsound programs or forbids some sound programs or both (this is a consequence of Rice's theorem), so a good working assumption is that yes, any stricture you come up with is bound to rule out some sound programs that would otherwise have been allowed. On the other hand, humans are infinitely clever, so in another sense the answer is no: if you add a stricture like you describe, that's OK, people will figure out a way around it when they need to. (Of course, sometimes the workaround they'll come up with will be one you don't like, such as abandoning your language.)
But I think what you're really asking for is a convincing case: a realistic example where, given the choice between supporting that example straightforwardly and sticking with your proposal to require all type parameters to be either covariant or contravariant, your gut will tell you to abandon the proposal so you can support that example straightforwardly.
Since you've already identified various cases where a type parameter can't be covariant and various cases where a type parameter can't be contravariant (for example, Function[-T, +R] is fine, but the reverse would be totally unsound), a good approach is to search for cases where the same type parameter is used twice, once in a way that can't be covariant and once in a way that can't be contravariant. A trivial example would be UnaryOperator[T] <: Function[T, T], analogous to Java's java.util.function.UnaryOperator<T>, whose 'apply' method returns the same type as it accepts. A UnaryOperator[String] can't be used as a UnaryOperator[Object] (because you can't pass it an arbitrary Object), but a UnaryOperator[Object] can't be used as a UnaryOperator[String], either (because even if you pass it a String, it might return some different Object).
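To see the failure concretely, we can force the unsound conversion with an unchecked cast; a sketch using java.util.function.UnaryOperator:

```java
import java.util.function.UnaryOperator;

public class InvariantDemo {
    public static void main(String[] args) {
        // An operator on Object that does not return Strings:
        UnaryOperator<Object> op = x -> new Object();

        // Pretend a UnaryOperator<Object> could be used as a
        // UnaryOperator<String> (unchecked cast to force it through):
        @SuppressWarnings("unchecked")
        UnaryOperator<String> bad =
                (UnaryOperator<String>) (UnaryOperator<?>) op;

        try {
            // The compiler inserts a checkcast to String at the call
            // site, so the lie is caught here at run time:
            String s = bad.apply("hello");
            System.out.println(s);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException caught");
        }
    }
}
```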
For a more fleshed-out realistic example . . . imagine a binary search tree TreeMap[K, +V] <: Map[K, V], analogous to Java's java.util.TreeMap<K,V>. Presumably we want to support methods such as 'firstKey' and 'floorEntry' and 'iterator' and so on (or at least, some of them), so we can't make K contravariant: a TreeMap[Object, Foo] can't be used as a TreeMap[String, Foo], because when we retrieve a key, the key might not be a String.
And since it's a binary search tree, it needs a Comparator[K] internally, which immediately makes it tricky for K to be covariant: if you use a TreeMap[String, Foo] as a TreeMap[Object, Foo], then you're implicitly using a Comparator[String] as a Comparator[Object], which doesn't work. Now, since the map certainly only contains String keys, perhaps the 'get' method can work around this by pre-checking the type of the key before using its Comparator[String]; but the 'floorEntry' and 'ceilingEntry' methods are still a problem: what entry comes "before" or "after" an arbitrary object that can't be compared to the keys in the map?
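A short sketch of exactly this failure using Java's real java.util.TreeMap, with String's natural ordering standing in for the Comparator[String]:

```java
import java.util.Map;
import java.util.TreeMap;

public class TreeMapKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new TreeMap<>();
        map.put("a", 1);
        try {
            // get() accepts any Object, but the tree's ordering only
            // knows how to compare Strings, so this fails at run time:
            map.get(new Object());
        } catch (ClassCastException e) {
            System.out.println("ClassCastException: key not comparable to String");
        }
    }
}
```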
And even though you've said that your map is immutable, you probably still want some sort of 'put' method, just, a purely functional one that returns a modified copy of the map. (Purely functional red black trees support the same invariants and worst-case asymptotic time complexities as mutable ones, so type system aside, this is certainly a reasonable thing to do.) But if a TreeMap[String, Foo] can be used as a TreeMap[Object, Foo], then its 'put' method needs to support returning a binary search tree that contains a non-String key — even though its Comparator[String] doesn't define an ordering for such keys.
(In a comment, you mention that Scala actually defines Map[K, +V] with an invariant key type. I've never used Scala, but I bet that this is exactly why.)

Why are arrays Objects, but cannot be used as a base class?

The Java language specification specifies that
In the Java programming language arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
So, considering that arrays are objects, why did the Java designers decide not to allow inheriting from them and overriding, for example, toString() or equals()?
The current syntax wouldn't allow creating anonymous classes with an array as the base class, but I don't think that was the reason for their decision.
Java was a compromise between non-object languages and very slow languages of that time where everything was an object (think about Smalltalk).
Even in more recent languages, having a fast structure at the language level for arrays (and usually maps) is considered a strategic goal. Most people wouldn't like the weight of an inheritable object for arrays, and certainly nobody wanted this before JVM advances like JIT.
That's why arrays, while being objects, weren't designed as class instances ("An object is a class instance or an array"). There would be little benefit in being able to override a method on an array, and certainly not a great enough one to counterbalance the need to check for the right method to apply (and in my opinion not a great enough one to counterbalance the increased difficulty in code reading, similar to what happens when you override operators).
I came across UNDER THE HOOD - Objects and arrays, which explains almost everything you need to know about how the JVM handles arrays. In the JVM, arrays are handled with special bytecodes, unlike the other objects we are familiar with.
In the JVM instruction set, all objects are instantiated and accessed with the same set of opcodes, except for arrays. In Java, arrays are full-fledged objects, and, like any other object in a Java program, are created dynamically. Array references can be used anywhere a reference to type Object is called for, and any method of Object can be invoked on an array. Yet, in the Java virtual machine, arrays are handled with special bytecodes.
As with any other object, arrays cannot be declared as local variables; only array references can. Array objects themselves always contain either an array of primitive types or an array of object references. If you declare an array of objects, you get an array of object references. The objects themselves must be explicitly created with new and assigned to the elements of the array.
Arrays are dynamically created objects, and they serve as containers that hold a fixed number of items of the same type. It looks like arrays are not like any other object, and that's why they are treated differently.
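The "array of object references" point above is easy to demonstrate; a small sketch:

```java
public class ArrayOfRefs {
    public static void main(String[] args) {
        // Creating the array creates three references, all null:
        StringBuilder[] builders = new StringBuilder[3];
        System.out.println(builders[0]); // prints "null"

        // The objects themselves must be created explicitly:
        builders[0] = new StringBuilder("hello");
        System.out.println(builders[0]); // prints "hello"
    }
}
```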
I'd like to point out this article. It seems as though arrays and objects follow different opcodes. I can't honestly summarize it more than that; however, it seems arrays are simply not treated like the Objects we're normally used to, so their Object methods can't be overridden.
Full credits to the author of that post as it's a very interesting read, both short & detailed.
Upon further digging into the topic via multiple sources I've decided to give a more elaborate version of my previous answer.
The first thing to note is that instantiation of objects and arrays is very different within the JVM; each follows its own bytecode.
Object:
Object instantiation uses a single opcode, new, which takes two operand bytes, indexbyte1 and indexbyte2. Once instantiated, the JVM pushes the reference to this object onto the stack. This occurs for all objects irrespective of their types.
Arrays:
The array opcodes (for instantiating an array), however, are divided into three different instructions.
newarray - pops length, allocates new array of primitive types of type indicated by atype, pushes objectref of new array
The newarray opcode is used when creating arrays of primitive datatypes (byte, short, char, int, long, float, double, boolean) rather than object references.
anewarray - pops length, allocates a new array of objects of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
The anewarray opcode is used when creating arrays of object references.
multianewarray - pops dimensions number of array lengths, allocates a new multidimensional array of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array
The multianewarray instruction is used when allocating multi-dimensional arrays.
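For reference, these source-level array creation expressions correspond to the three opcodes (easily checked by running javap -c on the output of a typical javac):

```java
public class ArrayCreation {
    public static void main(String[] args) {
        int[] ints = new int[4];       // compiles to newarray (primitive element type)
        String[] strs = new String[4]; // compiles to anewarray (reference element type)
        int[][] grid = new int[3][4];  // compiles to multianewarray (two dimensions)
        System.out.println(ints.length + " " + strs.length + " " + grid.length);
    }
}
```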
An object can be either a class instance or an array.
Taken from the Oracle docs:
A class instance is explicitly created by a class instance creation expression
BUT
An array is explicitly created by an array creation expression
This goes hand in hand with the information regarding the opcodes. Arrays are simply not designed to be class instances; they are instead created explicitly by array creation expressions, and thus naturally can't be subclassed or have their Object methods overridden.
As we have seen, it has nothing to do with the fact that arrays may hold primitive datatypes. After giving it some thought, though, it isn't very common to come across situations where one might want to override toString() or equals() on an array; still, it was a very interesting question to try and answer.
Resources:
Oracle-Docs chapter 4.3.1
Oracle-Docs chapter 15.10.1
Artima - UnderTheHood
There are many classes in the standard Java library that you cannot subclass; arrays aren't the only example. Consider String, or StringBuffer, or any of the primitive wrappers, like Integer or Double.
There are optimizations the JVM performs based on knowing the exact structure of these objects when it deals with them (like unboxing the primitives, or manipulating array memory at the byte level). If you could override anything, those optimizations would not be possible, and the performance of programs would suffer very badly.

Java: What's the proper term for multidimensional types, one-dimensional types?

In a Java context, what's the proper way to refer to a variable type that can contain multiple objects or primitives, and the proper way to refer to a type that can contain just one?
In general terms, I refer to lists, arrays, vectors, hashtables, trees, etc, as collections; and I refer to primitive types and one-dimensional objects as scalars.
In the wild, I've heard all sorts of combinations of phrases, including a few that are outright misleading:
"I'm storing my key/value pairs in a hashtable vector."
"Why would you need more than one hashtable?"
"What do you mean? I'm only using one hashtable."
Is there a widely-accepted way to refer to these two groupings of types, at a high level?
The Java Language Spec uses the terms "primitive" and "reference" to make that distinction for both variables and values. This might be confusing in the context of a different programming language where "reference" means something else.
However, I can't tell if that is exactly the distinction you're trying to make. If you want to lump strings and object wrappers like Integer in with the Java primitive types like int, you might be distinguishing scalar from non-scalar. Not all non-scalars are collections, of course.
I'm not sure that kind of terminology really applies to an OO language like Java. There's a distinction made between primitives, which can only contain a single value, and Objects, however an Object might contain any number of other objects.
Objects whose purpose is to contain zero-or-more instances of some other object are (in my experience) referred to as collections, maybe because of the Collections API or Arrays.
I guess languages where you're dealing explicitly with pointers and the like depend on the distinction more.
I think your question assumes that there are only 2 classes and one of them is what you call Collections. Let me say that I also call lists, maps, sets, etc. Collections because they're part of the Collections API. However, Collection is not on the same abstraction level as a primitive data type like integer. Really, you have primitives and references. References are pointers to objects which are instances of classes. Classes can be classified many ways. One of these classifications is "Collections".
Your friend who says "hashtable vector" is pretty much wrong if there's only one table. A hash table is a hash table, and a vector is a vector. A hashtable vector is a Vector<Hashtable> as far as I'm concerned.
Just use the most specific Java API class name (Collection, List, Object, etc.), without over-specifying, and 99% of all Java developers will understand what you mean.

Performance of the Number class

I'm just wondering about the performance of the Number class as opposed to, say, using generics or even a whole lot of functions to handle primitive types.
The primitive types would clearly be the fastest option I would assume, however if the performance hit is not too huge, it would likely be easier for the coder to just use the Number class or generics rather than making a function that accepts and returns long, double (etc).
I am about to do a performance benchmark of the 3 options mentioned. Is there anything I should be aware of/try out when doing this, or even better, has someone done this before that they can give me results to?
Typically you use the Number class as opposed to primitive types because you need to use these values in collections or other classes that are based on Objects. If you are not restricted by this requirement, then you should use primitives.
Yes, there is a performance hit associated with using the Number class, in comparison with primitive types like int, long, etc. Especially if you are creating a lot of new Numbers, you will want to worry about the performance when compared with creating primitive types. But this is not necessarily the case for passing Numbers to methods. Passing an instance of Number to a method is no slower than passing an int or a long, since the compiler can basically pass a "pointer" to a memory location. This is very general information because your question is very general.
One thing you should be aware of is that object allocation is likely to be the largest cost when you use Numbers. This affects your benchmarks, since certain operations which use auto-boxing can use cached values (which don't create objects) and can give you much better performance results. E.g., if you use Integers between -128 and 127, you will get much better results than with Doubles from -128 to 127, because the former uses cached values and the latter does not.
In short, if you are micro-benchmarking the use of Numbers, you need to ensure the range of values you use is realistic; not all values are equal in terms of performance (for primitives, of course, this doesn't matter so much).
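The cache effect is easy to observe. Only the Integer cache for -128 to 127 is guaranteed by the language spec (JLS §5.1.7); behaviour outside that range, and for Double, depends on the JVM:

```java
public class BoxingCache {
    public static void main(String[] args) {
        // Guaranteed by JLS 5.1.7: small int values box to cached objects.
        Integer a = 127, b = 127;
        System.out.println(a == b);   // true: same cached instance

        // Outside the cache, each boxing typically allocates:
        Integer c = 128, d = 128;
        System.out.println(c == d);   // usually false (JVM-dependent)

        // Double boxing is not cached on standard JVMs:
        Double x = 1.0, y = 1.0;
        System.out.println(x == y);   // usually false (JVM-dependent)
    }
}
```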

In Java, why are arrays objects? Are there any specific reasons?

Is there any reason why an array in Java is an object?
Because the Java Language Specification says so :)
In the Java programming language arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
So, unlike C++, Java provides true arrays as first-class objects:
There is a length member.
There is a clone() method which overrides the method of the same name in class Object.
Plus all the members of the class Object.
An exception is thrown if you attempt to access an array out of bounds.
Arrays are instantiated in dynamic memory.
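The list above can be verified in a few lines; a small sketch exercising the length member, clone(), the inherited Object methods, and bounds checking:

```java
import java.util.Arrays;

public class ArraysAreObjects {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        Object o = a;                   // assignable to a variable of type Object
        System.out.println(a.length);   // the length member
        int[] b = a.clone();            // clone() returns int[], no cast needed
        System.out.println(Arrays.equals(a, b));    // true
        System.out.println(a.getClass().getName()); // "[I" (getClass() inherited from Object)
        try {
            int x = a[3];               // out of bounds
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("index out of bounds caught");
        }
    }
}
```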
Having arrays be objects means that you can do operations with them (e.g., someArray.count('foo')) instead of just doing it against them (e.g., count(someArray, 'foo')), which leads to more natural syntax.
Another point is that objects are mutable and are passed by reference. In arrays there aren't any fields/methods that you can use to change "properties" of the array, but you sure can mutate the element values. And the benefits of passing arrays by reference are pretty obvious (though functional programmers probably wish Java had immutable lists passed by value).
Edit: forgot to mention. In the period before autoboxing, it was helpful to be able to store arrays in collections, write them to ObjectStreams etc.
Probably because they wanted to get as close as possible to making everything an object. Native types are there for backward compatibility.
So that they get all the benefits thereof:
hashCode()
toString()
etc.
And arrays aren't 'primitive', so if they can't be primitive, they must be objects.
I'm not sure about the official reason.
However, it makes sense to me that they are objects because operations can be performed on them (such as taking the length) and it made more sense to support these operations as member functions rather than introduce new keywords. Other operations include clone(), the inherited operations of object, etc. Arrays are also hashable and potentially comparable.
This is different from C (and native arrays in C++), where your arrays are essentially pointers to a memory offset.
