Reading through the Java SE API docs, I noticed that pretty much all of the methods in the collections framework use angle brackets. For example:
Collection<String> c = new HashSet<String>();
or
Map<String, Integer> m = new HashMap<String, Integer>();
To the eye they seem to serve the same function as a set of parentheses. I still don't know enough of the Java language to see an overarching pattern in where angle brackets are used and why that might be the case.
My question is specifically: Is there a significance to the way angle brackets are interpreted by the JVM as opposed to parens? Or is it just a common practice across multiple languages?
The angle brackets came with the introduction of generics in Java 1.5.
Since this is a later addition to an existing language, I guess the angle brackets were chosen to make a clear distinction from the existing parentheses (method and constructor calls), square brackets (array member access) and curly brackets (block delimiters). I'd say angle brackets are the logical choice here.
I guess they are used in Java because they are used in C++, just like everything from int to void.
Found some interesting references, though partial:
From C++ Templates: The Complete Guide by David Vandevoorde and Nicolai M. Josuttis, page 139:
Relatively early during the development of templates, Tom Pennello—a
widely recognized parsing expert working for Metaware—noted some of
the problems associated with angle brackets. Stroustrup also comments
on that topic in [DnE] and argues that humans prefer to read angle
brackets rather than parentheses. However, other possibilities exist,
and Pennello specifically proposed braces (for example, List{::X}) at
a C++ standards meeting in 1991 (held in Dallas). At that time the
extent of the problem was more limited because templates nested inside
other templates—so-called nested templates—were not valid and thus
the discussion of Section 9.3.3 on page 132 was largely irrelevant. As
a result, the committee declined the proposal to replace the angle
brackets.
So I may have been mistaken that the angle brackets were used to help the parser; perhaps they were chosen to help the programmer, because Bjarne Stroustrup thought they were better.
Parentheses are already reserved for method calls and expression grouping. Angle brackets are used for generic type parameters.
If parentheses were used for both, things could become ambiguous, if not for the compiler, then at least for the reader.
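As a small illustration (standard JDK classes only), both delimiters can even appear in a single expression, and the parser keeps them apart:

    import java.util.Collections;
    import java.util.List;

    public class Brackets {
        public static void main(String[] args) {
            // Angle brackets carry the type argument; parentheses carry the
            // (empty) value argument list. One call, two distinct delimiters.
            List<String> empty = Collections.<String>emptyList();
            System.out.println(empty.size()); // prints 0
        }
    }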
Is there a significance to the way angle brackets are interpreted by
the JVM as opposed to parens?
Neither of them is interpreted by the JVM: both parentheses and angle brackets are parsed at compile time, and the JVM never sees them, since the JVM operates at run time.
As side notes:
The <> are used for generics, and their usage is also common in other languages such as C++.
You are referring to new HashSet<String>(); as a method - it is not; it invokes a constructor, and a constructor is not a method.
In Java, angle brackets indicate the use of a generic type, which is a type that has different semantics depending on the type passed in.
The simplest use case for generics is in specialized collections, to indicate that they should hold only objects of a particular type.
In Java, generics do not actually add much at run time: due to type erasure, generic types are checked at compile time and then erased. The compiler verifies that only objects of the declared type are inserted into a collection, and it inserts the casts automatically when objects are retrieved (a cast that can still fail with a ClassCastException if raw types were used to sneak in a wrong value). There is no run-time enforcement of generic types.
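A minimal sketch of that compile-time checking (class and variable names are just for illustration):

    import java.util.ArrayList;
    import java.util.List;

    public class GenericsCheck {
        public static void main(String[] args) {
            List<String> names = new ArrayList<String>();
            names.add("Ada");
            // names.add(42);  // rejected at compile time: 42 is not a String

            // No explicit cast needed here; the compiler inserts it for us.
            String first = names.get(0);
            System.out.println(first);
        }
    }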
Angle brackets are used to denote type parameter lists for polymorphic ("generic") classes and methods. These are a very different beast from value parameter lists (the stuff in parentheses). If they were the same, then imagine you have the expression new Foo(bar)... How would the parser interpret this? Is bar the name of a type or the name of a variable?
Imagine that C++ used () instead of <> for templates. Now consider this line of code:
foo(bar)*bang;
Is that:
Declaring a local variable bang whose type is a pointer to the template type foo with type argument bar?
Calling the function foo, passing in bar, then multiplying the result by bang?
It's grammatically ambiguous. We could tweak the grammar such that it would always prefer one over the other, but that makes the (already painfully complex) grammar even hairier. Worse, whichever way you pick, users will probably guess wrong sometimes.
So, for C++, it makes sense to use a different grouping character for templates.
Java then just followed in C++'s footsteps.
Most of the trouble here stems from C's decision not to have explicit syntax for variable declaration, and instead to let a type annotation implicitly mean "make a new variable of that type". Languages like Scala, which have explicit keywords (var and val) for variable declarations, have more freedom with type declaration syntax, which is why they can use [] for generics.
Related
If Java allowed "instanceof" as a name for variables (and fields, type names, package names), it appears at first glance that the language would still remain unambiguous.
In most or all of the productions in Java where an Identifier can appear, there are contextual cues that would prevent confusion with a binary operator.
Regarding the basic production:
RelationalExpression:
...
RelationalExpression instanceof ReferenceType
There are no expressions of the form RelationalExpression Identifier ReferenceType, since appending a single Identifier to any Expression is never valid, and no ReferenceType can be extended by adding an Identifier on the front.
The only other reason I can think of why instanceof must be a keyword would be if there were some other production containing an Identifier which can be broken up into an instanceof expression. That is, there may be productions which are ambiguous if we allow instanceof as an Identifier. However, I can't seem to find any, since an Identifier is almost always separated from its surrounding tokens by a dot (or is identifiable as a MethodName by a following lparen).
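A small sketch illustrating the claim (the comments paraphrase the grammar argument above; this is an illustration, not a proof):

    public class InstanceofDemo {
        public static void main(String[] args) {
            Object x = "hello";
            // The relevant production: RelationalExpression instanceof ReferenceType.
            // Even if "instanceof" were a plain identifier, "x instanceof String"
            // could not parse as anything else: <expression> <identifier> <type>
            // matches no other rule in the grammar.
            if (x instanceof String) {
                System.out.println("x is a String");
            }
        }
    }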
Is instanceof a keyword simply out of tradition, rather than necessity? Could new relational operators be introduced in future Java versions, with tokens that collide with identifiers? (For example, could a hypothetical "relatedto" operator be introduced without making it a keyword, which would break existing code?)
That question is different: it's asking why "instanceof" isn't a method; I'm asking whether there are syntactic reasons why it must be a keyword.
You have a point in that it could have been a method on Object, or we could have
if (MyClass.class.isInstance(obj))
This is more cumbersome; however, I would say that chains of instanceof checks are not considered best practice, so making them a little harder to write might not have been a bad idea.
It is worth noting that earlier versions of Java didn't use intrinsics as much as they do now, and using a method would have been far less efficient than a native keyword, though I don't believe that would have to be true today.
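For comparison, a quick sketch of the two forms side by side (standard API only; the class name is made up):

    public class InstanceChecks {
        public static void main(String[] args) {
            Object obj = "hello";

            // Keyword form: the type is fixed at compile time.
            boolean byKeyword = obj instanceof String;

            // Method form: works with a Class<?> object chosen at run time.
            boolean byMethod = String.class.isInstance(obj);

            System.out.println(byKeyword + " " + byMethod); // true true
        }
    }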
Is instanceof a keyword simply out of tradition, rather than necessity?
IMHO, keywords were (and are) considered good practice because they make words with special meaning stand out as having one, and only one, special purpose.
Could new relational operators be introduced in future Java versions, with tokens that collide with identifiers?
Yes. One of the proposals for adding val and var is that they be special types rather than keywords, to avoid conflicting with code that already uses them as variable names.
Given a choice, a new language would make these keywords; it is only for backward compatibility that they might be otherwise. Alternatively, it has been considered to reuse final rather than val and transient rather than var.
Personally, I think they should add them the way other languages do, for consistency; otherwise you are going to have every new Java developer asking basic questions, just like How do I compare strings in Java? What they did made sense, but it confuses just about every new developer.
By comparison, they banned _ as a lambda parameter name to avoid confusion with other languages where it has a special meaning, and there is a warning about using _ as a variable name saying it might be disallowed in future versions.
In most programming languages that I know, you cannot declare a variable with a name that is also a keyword.
For example in Java:
public class SomeClass
{
    Class<?> clazz = Integer.class; // OK.
    Class<?> class = Integer.class; // Compilation error.
}
But it's very easy to figure out what is what. A human reading it will not confuse the variable name with a class declaration, and the compiler would most likely not confuse them either.
The same goes for variable names like 'for', 'extends', 'goto', or anything else from the list of Java keywords, if we are talking about the Java programming language.
What is the reason that we have this limitation?
What is the reason that we have this limitation?
There are two reasons in general:
As you identified in your Question: it would be extremely confusing for the human reader. And a programming language that is confusing by design is not going to get significant traction as a practical programming language.
If identifiers can be the same as keywords, it makes it much more difficult to write a formal grammar for the language. (Certainly, a grammar like that with the rules for disambiguation cannot be expressed in BNF / EBNF or similar.) That means that writing a parser for such a language would be a lot more complicated.
Anyhow, while neither of these reasons is a total "show stopper", they would be sufficient to cause most people attempting a new programming language design / implementation to reject the idea.
And that of course is the real reason that you (almost) never see languages where keywords can be used as identifiers. Programming language designers nearly always reject the idea ...
(In the case of Java, there was a conscious effort to make the syntax accessible to people used to the C language. C doesn't support this. That would have been a 3rd reason ... if they were looking for one.)
There is one interesting (semi-)counterexample in a mainstream programming language. In early versions of FORTRAN, spaces in identifiers were not significant. Thus
I J = 1
and
IJ = 1
meant the same thing. That is cool (depending on your "taste" ...). But compare these two:
DO 20 I = 10, 1, -2
versus
DO 20 I = 10
The first is a "DO loop" statement; the second is an assignment to a variable named DO20I, since spaces are not significant. As a reader, would you notice this?
It allows the lexer to classify symbols without having to disambiguate context - this in turn allows the language to be parsed according to grammar rules without needing knowledge about other ("higher") parts of the compilation process, including analysis of types.
As an example of the complications (and ambiguity) that removing such a distinction adds to parsing, consider the following. Under standard Java rules it declares and assigns a variable - there is no ambiguity about how it will be parsed.
final Foo x = 2; // roughly: <keyword> <identifier> <identifier> = <value>
Now, in a hypothetical language without a strict keyword distinction, imagine the following, where final may be a declared type; there are now two possible readings. The first is when final is not a type and the standard reading exists:
final Foo = 2; // roughly: <keyword> <identifier> ?error? = <value>
But if final were a "final type", then the reading may be:
final Foo = 2; // hypothetical: <identifier> <identifier> = <value>
Which interpretation of the source is correct?
Java makes this question even harder to answer due to separate compilation. Should adding a new "final type" in (or accidentally importing) a namespace now change how the code is parsed? Reporting an unresolved symbol is one thing - changing how the grammar is parsed based on such resolution is another.
These sort of issues are simply bypassed with the clear distinction of reserved words.
Arguably, there could be special productions to change the recognition of keywords dynamically (some languages allow controllable operator precedence), but this is not done in mainstream languages and is certainly not supported in Java. At the very least it would require additional syntax and add complexity to the system for too little benefit.
The most "clean" approach I've seen to such a problem is in C#, which allows one to prefix a reserved word with @ to remove its special meaning, such as class @class { float @int = 2; } - although such things should be done rarely, and ick!
Now, some words in Java that are reserved could be "reserved only in context", such as extends. This is seen in SQL all the time; there are reserved words (e.g. OVER) and then words that only have special meaning in a given statement construct (e.g. ROW_NUMBER). But it's easier to say reserved is reserved; go pick something else.
Except in very simple-to-parse languages like LISP dialects, which effectively treat every bareword as an identifier, the distinction between keywords and identifiers is very prevalent in language grammars.
You're not quite right there. A keyword is a word that has meaning in the syntax of the language, and a reserved word is one that you're not allowed to use as an identifier. In Java they are mostly the same, but 'true' and 'goto' are reserved words and not keywords ('true' is a literal and 'goto' is simply not used).
The main reason to make the keywords in a language reserved words is to simplify parsing and avoid ambiguities. For example, what does this mean if return could be a method?
return(1);
In my opinion, Java has taken this too far. There are keywords that are only meaningful in a particular context, in which there could be no ambiguity. Perhaps there is benefit in avoiding confusion on the part of the reader, but I put it down to the customary habits of compiler writers. There are other languages which have far fewer keywords and/or reserved words and work just fine.
I'm reviewing the API changes for Java 8 and I noticed that the new methods in java.util.Arrays are not overloaded for all primitives. The methods I noticed are:
parallelSetAll
parallelPrefix
spliterator
stream
Currently these new methods only handle int, long, and double primitives.
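For instance, the int overloads can be used like this (a small sketch against the documented Java 8 java.util.Arrays API):

    import java.util.Arrays;

    public class ArraysJava8Demo {
        public static void main(String[] args) {
            int[] data = new int[5];

            // parallelSetAll: compute each element from its index.
            Arrays.parallelSetAll(data, i -> i + 1);       // {1, 2, 3, 4, 5}

            // parallelPrefix: cumulative operation, here a running sum.
            Arrays.parallelPrefix(data, Integer::sum);     // {1, 3, 6, 10, 15}

            // stream: an IntStream over the array, with no boxing.
            System.out.println(Arrays.stream(data).sum()); // 35
        }
    }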
int, long, and double are probably the most widely used primitives, so it makes sense that if they had to limit the API they would choose those three, but why did they have to limit the API at all?
To address the questions as a whole, and not just this particular scenario, I think we all want to know...
Why There's Interface Pollution in Java 8
For instance, in a language like C#, there is a set of predefined function types accepting any number of arguments with an optional return type (Func and Action, each going up to 16 parameters of different types T1, T2, T3, ..., T16), but in the JDK 8 what we have is a set of different functional interfaces, with different names and different method names, whose abstract methods represent a subset of well-known function arities (i.e. nullary, unary, binary, ternary, etc.). And then we have an explosion of cases dealing with primitive types, and there are even other scenarios causing an explosion of yet more functional interfaces.
The Type Erasure Issue
So, in a way, both languages suffer from some form of interface pollution (or delegate pollution in C#). The only difference is that in C# they all have the same name. In Java, unfortunately, due to type erasure, there is no difference between Function<T1,T2> and Function<T1,T2,T3> or Function<T1,T2,T3,...Tn>, so evidently, we couldn't simply name them all the same way and we had to come up with creative names for all possible types of function combinations. For further reference on this, please refer to How we got the generics we have by Brian Goetz.
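You can see this in the JDK itself: the one-argument and two-argument cases are separate interfaces with distinct names. A small sketch using the real java.util.function types:

    import java.util.function.BiFunction;
    import java.util.function.Function;

    public class Arities {
        public static void main(String[] args) {
            // Arity 1: Function<T, R>
            Function<String, Integer> length = String::length;

            // Arity 2: a differently named interface, BiFunction<T, U, R>,
            // because two types named Function would erase to the same raw name.
            BiFunction<String, String, Boolean> startsWith = String::startsWith;

            System.out.println(length.apply("hello"));           // 5
            System.out.println(startsWith.apply("hello", "he")); // true
        }
    }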
Don't think the expert group did not struggle with this problem. In the words of Brian Goetz in the lambda mailing list:
[...] As a single example, let's take function types. The lambda
strawman offered at devoxx had function types. I insisted we remove
them, and this made me unpopular. But my objection to function types
was not that I don't like function types -- I love function types --
but that function types fought badly with an existing aspect of the
Java type system, erasure. Erased function types are the worst of
both worlds. So we removed this from the design.
But I am unwilling to say "Java never will have function types"
(though I recognize that Java may never have function types.) I
believe that in order to get to function types, we have to first deal
with erasure. That may, or may not be possible. But in a world of
reified structural types, function types start to make a lot more
sense [...]
An advantage of this approach is that we can define our own interface types with methods accepting as many arguments as we would like, and we could use them to create lambda expressions and method references as we see fit. In other words, we have the power to pollute the world with yet even more new functional interfaces. Also, we can create lambda expressions even for interfaces in earlier versions of the JDK or for earlier versions of our own APIs that defined SAM types like these. And so now we have the power to use Runnable and Callable as functional interfaces.
However, these interfaces become more difficult to memorize since they all have different names and methods.
Still, I am one of those wondering why they didn't solve the problem as in Scala, defining interfaces like Function0, Function1, Function2, ..., FunctionN. Perhaps, the only argument I can come up with against that is that they wanted to maximize the possibilities of defining lambda expressions for interfaces in earlier versions of the APIs as mentioned before.
Lack of Value Types Issue
So, evidently type erasure is one driving force here. But if you are one of those wondering why we also need all these additional functional interfaces with similar names and method signatures whose only difference is the use of a primitive type, then let me remind you that in Java we also lack value types like those in a language like C#. This means that the generic types used in our generic classes can only be reference types, not primitive types.
In other words, we can't do this:
List<int> numbers = asList(1,2,3,4,5);
But we can indeed do this:
List<Integer> numbers = asList(1,2,3,4,5);
The second example, though, incurs the cost of boxing and unboxing the wrapped objects to and from primitive types. This can become really expensive in operations dealing with collections of primitive values. So the expert group decided to create this explosion of interfaces to deal with the different scenarios. To make things "less worse" they decided to deal only with three basic types: int, long and double.
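A small sketch of the difference; both lines compute the same sum, but only the second stays in the primitive domain throughout:

    import java.util.stream.IntStream;
    import java.util.stream.Stream;

    public class BoxingCost {
        public static void main(String[] args) {
            // Boxed: each element is an Integer; the reduction unboxes and re-boxes.
            int boxedSum = Stream.of(1, 2, 3, 4, 5).reduce(0, Integer::sum);

            // Primitive specialization: no wrapper objects involved.
            int primitiveSum = IntStream.rangeClosed(1, 5).sum();

            System.out.println(boxedSum + " " + primitiveSum); // 15 15
        }
    }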
Quoting the words of Brian Goetz in the lambda mailing list:
[...] More generally: the philosophy behind having specialized
primitive streams (e.g., IntStream) is fraught with nasty tradeoffs.
On the one hand, it's lots of ugly code duplication, interface
pollution, etc. On the other hand, any kind of arithmetic on boxed ops
sucks, and having no story for reducing over ints would be terrible.
So we're in a tough corner, and we're trying to not make it worse.
Trick #1 for not making it worse is: we're not doing all eight
primitive types. We're doing int, long, and double; all the others
could be simulated by these. Arguably we could get rid of int too, but
we don't think most Java developers are ready for that. Yes, there
will be calls for Character, and the answer is "stick it in an int."
(Each specialization is projected to add ~100K to the JRE footprint.)
Trick #2 is: we're using primitive streams to expose things that are
best done in the primitive domain (sorting, reduction) but not trying
to duplicate everything you can do in the boxed domain. For example,
there's no IntStream.into(), as Aleksey points out. (If there were,
the next question(s) would be "Where is IntCollection? IntArrayList?
IntConcurrentSkipListMap?) The intention is many streams may start as
reference streams and end up as primitive streams, but not vice versa.
That's OK, and that reduces the number of conversions needed (e.g., no
overload of map for int -> T, no specialization of Function for int
-> T, etc.) [...]
We can see that this was a difficult decision for the expert group. I think few would agree that this is elegant, but most of us would most likely agree it was necessary.
For further reference on the subject you may want to read The State of Value Types by John Rose, Brian Goetz, and Guy Steele.
The Checked Exceptions Issue
There was a third driving force that could have made things even worse: the fact that Java supports two types of exceptions, checked and unchecked. The compiler requires that we handle or explicitly declare checked exceptions, but it requires nothing for unchecked ones. This creates an interesting problem, because the method signatures of most of the functional interfaces do not declare any exceptions. So, for instance, this is not possible:
Writer out = new StringWriter();
Consumer<String> printer = s -> out.write(s); //oops! compiler error
It cannot be done because the write operation throws a checked exception (i.e. IOException), but the signature of the Consumer method does not declare that it throws any exception at all. So, the only solution to this problem would have been to create even more interfaces, some declaring exceptions and some not (or to come up with yet another language-level mechanism for exception transparency). Again, to make things "less worse", the expert group decided to do nothing in this case.
In the words of Brian Goetz in the lambda mailing list:
[...] Yes, you'd have to provide your own exceptional SAMs. But then
lambda conversion would work fine with them.
The EG discussed additional language and library support for this
problem, and in the end felt that this was a bad cost/benefit
tradeoff.
Library-based solutions cause a 2x explosion in SAM types (exceptional
vs not), which interact badly with existing combinatorial explosions
for primitive specialization.
The available language-based solutions were losers from a
complexity/value tradeoff. Though there are some alternative
solutions we are going to continue to explore -- though clearly not
for 8 and probably not for 9 either.
In the meantime, you have the tools to do what you want. I get that
you prefer we provide that last mile for you (and, secondarily, your
request is really a thinly-veiled request for "why don't you just give
up on checked exceptions already"), but I think the current state lets
you get your job done. [...]
So, it's up to us, the developers, to craft yet more interface explosions to deal with these on a case-by-case basis:
interface IOConsumer<T> {
    void accept(T t) throws IOException;
}

static <T> Consumer<T> exceptionWrappingBlock(IOConsumer<T> b) {
    return e -> {
        try { b.accept(e); }
        catch (Exception ex) { throw new RuntimeException(ex); }
    };
}
In order to do:
Writer out = new StringWriter();
Consumer<String> printer = exceptionWrappingBlock(s -> out.write(s));
Probably, in the future when we get support for value types in Java and reification, we will be able to get rid of (or at least no longer need to use) some of these multiple interfaces.
In summary, we can see that the expert group struggled with several design issues. The need, requirement or constraint to keep backward compatibility made things difficult, and on top of that we have other important constraints like the lack of value types, type erasure and checked exceptions. If Java had the first and lacked the other two, the design of JDK 8 would probably have been different. So, we all must understand that these were difficult problems with lots of tradeoffs, and the EG had to draw a line somewhere and make decisions.
I watched the Oracle OTN Virtual Event: Java SE and JavaFX 2.0 (28 Feb 2012), and while talking about the new diamond operator (that Map<String, List<String>> myMap = new HashMap<>(); thing) the speaker mentioned that it was not as simple to implement as one might think, as it is not a simple token replacement.
My question is: why? Why can't this be implemented by simply taking the string from the variable's declaration and putting it into the diamond operator?
I didn't implement it either, so I can only guess.
But usually the reason these things are more complex than they seem is that first inspection only looks at the most common (or most publicized) use case. In this case it's the one you mentioned. In theory that should be easy to specify exactly and it should be rather easy to implement in a compiler.
However, the diamond operator (which is technically not an operator, by the way) can be used in other ways as well:
someMethodWithGenericArguments(new HashMap<>());
new SomeGenericClass(new HashMap<>());
T foo = new SomethingRelatedToT<>(); // where T is a generic type parameter
In those cases a simple token replacement obviously no longer works; you need actual type inference involving real type analysis (i.e. it operates on an entirely different abstraction level than a simple token replacement would).
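Here is a compilable sketch of the harder case (the method name takesMap is hypothetical, purely to show where the inference has to come from):

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class DiamondInference {
        // The diamond's type arguments below cannot be read off any declaration
        // on the same line; the compiler must infer them from this parameter type.
        static void takesMap(Map<String, List<String>> m) {
            System.out.println(m.size());
        }

        public static void main(String[] args) {
            takesMap(new HashMap<>()); // inferred: HashMap<String, List<String>>

            // The "easy" case from the question: inference from the variable type.
            Map<String, List<String>> myMap = new HashMap<>();
            takesMap(myMap);
        }
    }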
Something which Java doesn't do (and which many languages have) is inferring types based on usage, i.e. Java doesn't infer a required type based on how a value is used.
e.g.
Type a = b;
The type of a and the type of b are independent and no assumptions are made about b based on the type of a.
MethodHandles show signs of supporting this: the return type used can be based on context, but that is a runtime feature.
In conclusion, my assumption is: it was hard to implement in Java because the language didn't support anything like it. If the language used features like this all the time, the approach to take would be well understood (in terms of defining a spec of how it should work) and supported by the tools in the compiler.
I'm looking at some Java code that is maintained by other parts of the company, written incidentally by some former C and C++ devs. One thing that is ubiquitous is the use of static integer constants, such as
class Engine {
    private static int ENGINE_IDLE = 0;
    private static int ENGINE_COLLECTING = 1;
    ...
}
Besides the missing 'final' qualifier, I'm a bit bothered by this kind of code. What I would have liked to see, being trained primarily in Java at school, would be something more like
class Engine {
    private enum State { Idle, Collecting };
    ...
}
However, the arguments fail me. Why, if at all, is the latter better than the former?
Why, if at all, is the latter better
than the former?
It is much better because it gives you type safety and is self-documenting. With integer constants, you have to look at the API docs to find out what values are valid, and nothing prevents you from using invalid values (or, perhaps worse, integer constants that are completely unrelated). With enums, the method signature tells you directly what values are valid (IDE autocompletion will work), and it's impossible to use an invalid value.
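A minimal sketch (a hypothetical Engine class modeled on the question's example):

    public class Engine {
        public enum State { IDLE, COLLECTING }

        // The signature itself documents the valid values.
        public void updateState(State state) {
            System.out.println("state is now " + state);
        }

        public static void main(String[] args) {
            Engine engine = new Engine();
            engine.updateState(State.COLLECTING); // OK
            // engine.updateState(1);             // compile error: int is not a State
        }
    }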
The "integer constant enums" pattern is unfortunately very common, even in the Java Standard API (and widely copied from there) because Java did not have Enums prior to Java 5.
An excerpt from the official docs, http://java.sun.com/j2se/1.5.0/docs/guide/language/enums.html:
This pattern has many problems, such as:
Not typesafe - Since a season is just an int you can pass in any other int value where a season is required, or add two seasons together (which makes no sense).
No namespace - You must prefix constants of an int enum with a string (in this case SEASON_) to avoid collisions with other int enum types.
Brittleness - Because int enums are compile-time constants, they are compiled into clients that use them. If a new constant is added between two existing constants or the order is changed, clients must be recompiled. If they are not, they will still run, but their behavior will be undefined.
Printed values are uninformative - Because they are just ints, if you print one out all you get is a number, which tells you nothing about what it represents, or even what type it is.
And this just about covers it. A one word argument would be that enums are just more readable and informative.
One more thing is that enums, like classes, can have fields and methods. This gives you the option to attach some additional information about each state to the enum itself.
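For example, a hypothetical State enum could carry the legacy integer codes and a human-readable description alongside each constant:

    public enum State {
        IDLE(0, "engine is idle"),
        COLLECTING(1, "engine is collecting");

        private final int code;
        private final String description;

        State(int code, String description) {
            this.code = code;
            this.description = description;
        }

        public int code() { return code; }
        public String description() { return description; }
    }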
Because enums provide type safety. In the first case you can pass any integer; if you use an enum, you are restricted to Idle and Collecting.
FYI : http://www.javapractices.com/topic/TopicAction.do?Id=1.
By using an int to refer to a constant, you're not forcing anyone to actually use that constant. So, for example, you might have a method which takes an engine state, which someone might happily invoke with:
engine.updateState(1);
Using an enum forces the user to stick with the explanatory label, so it is more legible.
There is one situation in which static constants are preferable (other than when the code is legacy with a ton of dependencies), and that is when the set of values is not, or may later not be, finite.
Imagine that you may later want to add a new state, say Collected. The only way to do that with an enum is to edit the original code, which can be a problem if the modification is made when there is already a lot of code manipulating it. Other than this, I personally see no reason why an enum is not used.
Just my thought.
Readability - When you use enums and write State.Idle, the reader immediately knows that you are talking about an idle state. Compare this with 4 or 5.
Type Safety - When you use an enum, the user cannot pass a wrong value even by mistake, as the compiler will force them to use one of the pre-declared values in the enum. In the case of simple integers, they could even pass -3274.
Maintainability - If you wanted to add a new state Waiting, it would be very easy to add it as a new constant Waiting in your enum State without causing any confusion.
The reasons from the spec, which Lajcik quotes, are explained in more detail in Josh Bloch's Effective Java, Item 30. If you have access to that book, I'd recommend perusing it. Java Enums are full-fledged classes which is why you get compile-time type safety. You can also give them behavior, giving you better encapsulation.
The former is common in code that started pre-1.5. Actually, another common idiom was to define your constants in an interface, because interfaces couldn't contain any implementation code.
Enums also give you a great deal of flexibility. Since Enums are essentially classes, you can augment them with useful methods (such as providing an internationalized resource string corresponding to a certain value in the enumeration, converting back and forth between instances of the enum type and other representations that may be required, etc.)