I would like to understand the key difference between parametric polymorphism, such as the polymorphism of generic classes/functions in Java/Scala/C++, and "ad-hoc" polymorphism in the Haskell type system. I'm familiar with the first kind of language, but I have never worked with Haskell.
More precisely:
How is the type inference algorithm in, e.g., Java different from type inference in Haskell?
Please give me an example of a situation where something can be written in Java/Scala but cannot be written in Haskell (taking the modularity features of these platforms into account as well), and vice versa.
Thanks in advance.
As per the TAPL, §23.2:
Parametric polymorphism (...), allows a single piece of
code to be typed “generically,” using variables in place of actual types, and
then instantiated with particular types as needed. Parametric definitions
are uniform: all of their instances behave the same. (...)
Ad-hoc polymorphism, by contrast, allows a polymorphic value to exhibit
different behaviors when “viewed” at different types. The most common
example of ad-hoc polymorphism is overloading, which associates a single
function symbol with many implementations; the compiler (or the runtime system, depending on whether overloading resolution is static or dynamic) chooses an appropriate implementation for each application of the
function, based on the types of the arguments.
So, if you consider successive stages of history, non-generic official Java (i.e. pre-J2SE 5.0, before September 2004) had ad-hoc polymorphism - so you could overload a method - but not parametric polymorphism, so you couldn't write a generic method. Afterwards you could do both, of course.
By comparison, since its very beginning in 1990, Haskell was parametrically polymorphic, meaning you could write:
swap :: (a, b) -> (b, a)
swap (x, y) = (y, x)
where a and b are type variables that can be instantiated with any type, with no assumptions about them.
But there was no preexisting construct giving ad-hoc polymorphism, which is intended to let you write functions that apply to several, but not all, types. Type classes were implemented as a way of achieving this goal.
They let you describe a class (something akin to a Java interface), giving the type signatures of the functions you want implemented for your generic type. Then you can register some (and hopefully several) instances matching this class. Meanwhile, you can write a generic function such as:
between :: (Ord a) => a -> a -> a -> Bool
between x y z = x <= y && y <= z
where Ord is the class that defines the comparison function (<=). When used, (between "abc" "d" "ghi") is resolved statically to select the right instance for strings (rather than, e.g., integers) - at exactly the moment when Java's method overloading resolution would kick in.
You can do something similar in Java with bounded wildcards. But the key difference between Haskell and Java on that front is that only Haskell passes dictionaries automatically: in both languages, given instances of Ord A and Ord B, say ordA and ordB, you can build a function f that takes those as arguments and produces the instance for the pair type (A, B), using, say, the lexicographic order. Say now that you are given (("hello", 2), ((3, "hi"), 5)). In Java you have to remember the instances for string and int, and pass the correct instance (built from four applications of f!) in order to apply between to that object. Haskell applies compositionality and figures out how to build the correct instance given just the ground instances and the f constructor (and this extends to other type constructors, of course).
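To make the manual dictionary passing concrete, here is a hedged Java sketch (the names pairOrd and between are mine, Map.Entry stands in for a pair, a Comparator plays the role of an Ord dictionary, and the nesting is simplified to a single level):

import java.util.AbstractMap.SimpleEntry;
import java.util.Comparator;
import java.util.Map.Entry;

public class DictionaryPassing {
    // The f from the paragraph above: builds the ordering ("dictionary") for a
    // pair out of the orderings of its components, lexicographically.
    static <A, B> Comparator<Entry<A, B>> pairOrd(Comparator<A> ordA, Comparator<B> ordB) {
        return Comparator.<Entry<A, B>, A>comparing(Entry::getKey, ordA)
                         .thenComparing(Entry::getValue, ordB);
    }

    static <T> boolean between(Comparator<T> ord, T x, T y, T z) {
        return ord.compare(x, y) <= 0 && ord.compare(y, z) <= 0;
    }

    public static void main(String[] args) {
        Comparator<String> ordString = Comparator.naturalOrder();
        Comparator<Integer> ordInt = Comparator.naturalOrder();

        // The instance for the nested pair type has to be assembled and passed
        // by hand; Haskell's instance resolution does this composition for us.
        Comparator<Entry<Entry<String, Integer>, Integer>> ordNested =
                pairOrd(pairOrd(ordString, ordInt), ordInt);

        Entry<String, Integer> hello = new SimpleEntry<>("hello", 2);
        Entry<Entry<String, Integer>, Integer> v = new SimpleEntry<>(hello, 5);
        System.out.println(between(ordNested, v, v, v)); // true
    }
}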
Now, as far as type inference goes (and this should probably be a distinct question), for both languages it is incomplete, in the sense that you can always write an un-annotated program for which the compiler won't be able to determine the type.
For Haskell, this is because it has impredicative (a.k.a. first-class) polymorphism, for which type inference is undecidable. Note that on that point, Java is limited to first-order polymorphism (something on which Scala expands).
For Java, this is because it supports contravariant subtyping.
But those languages mainly differ in the range of program statements to which type inference applies in practice, and in the importance given to the correctness of the type inference results.
For Haskell, inference applies to all "non-highly polymorphic" terms, and makes a serious effort to return sound results based on published extensions of a well-known algorithm:
At its core, Haskell's inference is based on Hindley-Milner, which gives you complete results as soon as, when inferring the type of an application, type variables (e.g. the a and b in the example above) can only be instantiated with non-polymorphic types (I'm simplifying, but this is essentially the ML-style polymorphism you can find in, e.g., OCaml).
A recent GHC makes sure that a type annotation may be required only for a let-binding or λ-abstraction that has a non-Damas-Milner type.
Haskell has tried to stay relatively close to this inferrable core across even its most hairy extensions (e.g. GADTs). At any rate, proposed extensions nearly always come in a paper with a proof of the correctness of the extended type inference.
For Java, type inference applies in a much more limited fashion anyway:
Prior to the release of Java 5, there was no type inference in Java. According to the Java language culture, the type of every variable, method, and dynamically allocated object must be explicitly declared by the programmer. When generics (classes and methods parameterized by type) were introduced in Java 5, the language retained this requirement for variables, methods, and allocations. But the introduction of polymorphic methods (parameterized by type) dictated that either (i) the programmer provide the method type arguments at every polymorphic method call site or (ii) the language support the inference of method type arguments. To avoid creating an additional clerical burden for programmers, the designers of Java 5 elected to perform type inference to determine the type arguments for polymorphic method calls. (source, emphasis mine)
The inference algorithm is essentially that of GJ, but with a somewhat kludgy addition of wildcards as an afterthought (Note that I am not up to date on the possible corrections made in J2SE 6.0, though). The large conceptual difference in approach is that Java's inference is local, in the sense that the inferred type of an expression depends only on constraints generated from the type system and on the types of its sub-expressions, but not on the context.
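As a small illustration of that locality (a hedged sketch; the method name count is mine): before Java 8's target typing, a generic call in argument position was inferred without looking at the enclosing method's parameter type, so an explicit type witness was needed.

import java.util.Collections;
import java.util.List;

public class LocalInference {
    static int count(List<String> strings) {
        return strings.size();
    }

    public static void main(String[] args) {
        // Pre-Java 8, count(Collections.emptyList()) inferred T = Object and was
        // rejected, because inference ignored count's expected parameter type;
        // the explicit witness below compiles in every version.
        int n = count(Collections.<String>emptyList());
        System.out.println(n); // 0
    }
}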
Note that the party line regarding the incomplete & sometimes incorrect type inference is relatively laid back. As per the spec:
Note also that type inference does not affect soundness in any way. If the types inferred are nonsensical, the invocation will yield a type error. The type inference algorithm should be viewed as a heuristic, designed to perform well in practice. If it fails to infer the desired result, explicit type parameters may be used instead.
Parametric polymorphism means we don't care about the type; we'll implement the function the same way for any type. For example, in Haskell:
length :: [a] -> Int
length [] = 0
length (x:xs) = 1 + length xs
We don't care what the type of the elements of the list is; we just care how many there are.
Ad-hoc polymorphism (aka method overloading), however, means that we'll use a different implementation depending on the type of the parameter.
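In Java terms this is plain method overloading; a minimal sketch with hypothetical names (the Haskell example follows below):

public class Overloading {
    // Ad-hoc polymorphism: one name, a different implementation per type,
    // chosen statically from the argument types.
    static String describe(int n)    { return "an int: " + n; }
    static String describe(String s) { return "a string of length " + s.length(); }

    public static void main(String[] args) {
        System.out.println(describe(42));       // resolves to describe(int)
        System.out.println(describe("hello"));  // resolves to describe(String)
    }
}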
Here's an example in Haskell. Let's say we want to define a function called makeBreakfast.
If the input parameter is Eggs, I want makeBreakfast to return a message on how to make eggs.
If the input parameter is Toast, I want makeBreakfast to return a message on how to make toast.
We'll create a typeclass called BreakfastFood that declares the makeBreakfast function. The implementation of makeBreakfast will be different depending on the type of its input.
data Eggs = Eggs
data Toast = Toast

class BreakfastFood food where
  makeBreakfast :: food -> String

instance BreakfastFood Eggs where
  makeBreakfast _ = "First crack 'em, then fry 'em"
instance BreakfastFood Toast where
  makeBreakfast _ = "Put bread in the toaster until brown"
According to John Mitchell's Concepts in Programming Languages,
The key difference between parametric polymorphism and overloading (aka ad-hoc polymorphism) is that parametric polymorphic functions use one algorithm to operate on arguments of many different types, whereas overloaded functions may use a different algorithm for each type of argument.
A complete discussion of what parametric polymorphism and ad-hoc polymorphism mean and to what extent they're available in Haskell and in Java is longish; however, your concrete questions can be tackled much more simply:
How is the type inference algorithm in, e.g., Java different from type inference in Haskell?
As far as I know, Java does hardly any type inference (essentially only the type arguments of generic method calls are inferred). So the difference is that Haskell does it, and does it pervasively.
Please give me an example of a situation where something can be written in Java/Scala but cannot be written in Haskell (taking the modularity features of these platforms into account as well), and vice versa.
One very simple example of something Haskell can do that Java can't is to define maxBound :: Bounded a => a. I don't know enough Java to point out something it can do that Haskell can't.
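To see why maxBound has no direct Java counterpart, here is a rough sketch (all names hypothetical): Haskell picks the Bounded instance from the expected result type alone, whereas in Java the closest analogue needs the "instance" passed as an explicit argument.

interface Bounded<T> {
    T maxBound();
}

public class MaxBound {
    static final Bounded<Integer> INT_BOUNDED = () -> Integer.MAX_VALUE;
    static final Bounded<Character> CHAR_BOUNDED = () -> Character.MAX_VALUE;

    // The instance is an explicit parameter; nothing here is selected by the
    // caller's expected return type, as it is with maxBound :: Bounded a => a.
    static <T> T maxBound(Bounded<T> instance) {
        return instance.maxBound();
    }

    public static void main(String[] args) {
        Integer i = maxBound(INT_BOUNDED);    // the caller must name the instance
        Character c = maxBound(CHAR_BOUNDED);
        System.out.println(i);
        System.out.println(c);
    }
}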
Related
We can create a generic class in Java like this
public class MyClass<T> {
...
but, now that I'm translating a (very large) C++ codebase to Java, I need a class to be a different type from the others depending on its size, as in this C++ code:
template<size_t size> class MyClass {
...
so every instantiation is a different type, their static members are different, and members like "compare" can only be used with objects of the same size.
Is it possible to do this in Java? If not, how would you handle it?
Sure, but it sucks.
You can model "counting" with a chain of recursive types. Inc<Inc<Inc<Integer>>> could represent 3.
It is exceedingly awkward.
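A hedged sketch of that encoding (all names hypothetical): the type parameter S is a phantom, existing only so that different "sizes" become different types.

final class Zero {}
final class Inc<N> {}

final class Sized<S> {
    private final byte[] data;

    Sized(int size) { this.data = new byte[size]; }

    // Only compiles when both arguments carry the same phantom "size" type.
    int compare(Sized<S> other) {
        return Integer.compare(this.data.length, other.data.length);
    }
}

class Demo {
    public static void main(String[] args) {
        Sized<Inc<Inc<Zero>>> a = new Sized<>(2);
        Sized<Inc<Inc<Zero>>> b = new Sized<>(2);
        Sized<Inc<Zero>> c = new Sized<>(1);

        a.compare(b);    // fine: same "size" type
        // a.compare(c); // rejected at compile time: different types
        // Part of the awkwardness: nothing ties the runtime argument 2 to the
        // phantom type, and statics are shared across instantiations (erasure).
    }
}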
Java generics are not C++ templates. Java generics share one common compiled implementation and automatically generate thin wrapper code that casts parameterized arguments to and from a common base type.
C++ templates generate distinct types.
C++ templates were designed to replace code generation and/or hand-rolled, C-style low-level data structures. The goal was that a template class could match or even exceed hand-written C versions (exceed because you can invest more engineering effort into the single template and reuse it in hundreds of spots).
Templates like std::function more closely approach Java generics. While the implementation is dissimilar, std::function converts a myriad of types to one interface, hiding the casting from the end user. In C++ this technique is called type erasure: std::function "erases" all information about the stored callable except what it exposes. C++ type erasure does not require a common base class; Java's does.
But because Java generics support only one kind of type erasure, and C++ templates support not only more kinds of type erasure but also entirely different metaprogramming techniques that are alien to Java, replacing templates with Java generics is going to consistently run into problems. Only when the C++ use case happens to line up perfectly with the weaker Java generics does it work right.
(Note that, while weaker, Java generics make type erasure far easier, because they write a bunch of the casting code for you, and type check it. Weaker doesn't mean worse; it often means safer. But mechanically replacing a system with a weaker one often is doomed to failure.)
No, you can't use values as parameters instead of a generic type in Java. You should probably just take the size as a parameter in the constructor and implement safety checks taking the size into account.
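A minimal sketch of that runtime-check alternative (hypothetical names):

final class SizedBuffer {
    private final int size;
    private final byte[] data;

    SizedBuffer(int size) {
        this.size = size;
        this.data = new byte[size];
    }

    boolean sameContents(SizedBuffer other) {
        if (this.size != other.size) {
            // What the C++ template rejected at compile time is only caught at run time here.
            throw new IllegalArgumentException(
                    "cannot compare buffers of size " + this.size + " and " + other.size);
        }
        return java.util.Arrays.equals(this.data, other.data);
    }
}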
Is there a particular reason behind Java wrapper classes (java.lang.Integer, java.lang.Boolean, ...) not having a common supertype?
I'm asking because it would be quite handy to have, e.g., a WrapperType::getType function alongside the classic Object::getClass, which would return the class of the corresponding primitive type.
More specifically, the context is invoking constructors via reflection where you only have the Class<T> and the parameters Object[]
E.g.:
public static <T> T createInstance(Class<T> clz, Object... params) throws Exception
In order to get the constructor I can get the parameter types via:
Class<?>[] c = Arrays
        .stream(params)
        .map(Object::getClass)
        .toArray(Class<?>[]::new);
return clz.getConstructor(c).newInstance(params);
but this will of course fail for constructors that take primitives, like String(char[], int, int), because getClass() returns the wrapper classes rather than the primitive classes.
If that supertype existed I could do:
.map( o -> o instanceof WrapperType ? ((WrapperType) o).getType() : o.getClass() )
I guess there is a particular reason the Java designers did not implement it.
The Java designers probably didn't find enough in common between the numeric types (e.g. Integer) and the non-numeric ones (e.g. Boolean) to group them under a common superclass.
Still, all the numeric types extend Number, which truly represents "numeric values that are convertible to the primitive types byte, double, float, int, long, short".
Actually, all the wrapper classes do implement the Comparable interface (e.g. Comparable<Integer> for Integer), though each with its own type parameter.
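For the reflection case in the question, the usual workaround is to write the wrapper-to-primitive mapping out by hand (a hedged sketch; the class and method names are mine, and Map.of needs Java 9+):

import java.util.Map;

final class Primitives {
    static final Map<Class<?>, Class<?>> WRAPPER_TO_PRIMITIVE = Map.of(
            Boolean.class,   boolean.class,
            Byte.class,      byte.class,
            Character.class, char.class,
            Short.class,     short.class,
            Integer.class,   int.class,
            Long.class,      long.class,
            Float.class,     float.class,
            Double.class,    double.class);

    // Plays the role of the hypothetical WrapperType::getType from the question.
    static Class<?> unwrap(Class<?> c) {
        return WRAPPER_TO_PRIMITIVE.getOrDefault(c, c);
    }
}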
I can see how practical this would be, but in terms of abstraction it would be a big mess. Say you have the wrappers Integer and Double along with the others in that family; they do have java.lang.Number as their supertype. However, there is no semantic relationship between them and a Boolean (C-language influences aside).
But for argument's sake, let's consider that they are all in one big bucket and are Wrappers: are they still Numbers?
Considering the language's historic aversion to multiple inheritance, having Number would make much more sense if you had to pick one.
After the big change in language capabilities with SE 8, we can now get a form of multiple inheritance through interfaces with default implementations. And if one analyses further, one concludes that these cases call for contracts rather than inheritance; thus, even if it were done, this would be a job for interfaces.
Taking into account that a solid way to implement this is quite new, and that a Wrapper supertype would be an over-generalized way to do it, we can see that it is better not to have it. Although it is now possible, there are historical and semantic reasons why it was not present, at least as a superclass. Abstraction is key here. But nothing keeps you from inserting a factory into your pipeline, or implementing the mentioned interface.
Bottom line, the key factors are:
Abstraction: this is more a contract than an is-a relationship
History: generics, and multiple inheritance through interfaces
Imagine the size of the inheritance tree if we were to accommodate cases like: it is a Wrapper first, then a Number or a Boolean.
Are Java 8 closures really first-class values, or are they only syntactic sugar?
I would say that Java 8 closures ("Lambdas") are neither mere syntactic sugar nor are they first-class values.
I've addressed the issue of syntactic sugar in an answer to another StackExchange question.
As for whether lambdas are "first class" it really depends on your definition, but I'll make a case that lambdas aren't really first class.
In some sense a lambda wants to be a function, but Java 8 is not adding function types. Instead, a lambda expression is converted into an instance of a functional interface. This has allowed lambdas to be added to Java 8 with only minor changes to Java's type system. After conversion, the result is a reference just like that of any other reference type. In fact, using a Lambda -- for example, in a method that was passed a lambda expression as parameter -- is indistinguishable from calling a method through an interface. A method that receives a parameter of a functional interface type can't tell whether it was passed a lambda expression or an instance of some class that happens to implement that functional interface.
For more information about whether lambdas are objects, see the Lambda FAQ Answer to this question.
Given that lambdas are converted into objects, they inherit (literally) all the characteristics of objects. In particular, objects:
have various methods like equals, getClass, hashCode, notify, toString, and wait
have an identity hash code
can be locked by a synchronized block
can be compared using the == and != and instanceof operators
and so forth. In fact, all of these are irrelevant to the intended usage of lambdas. Their behavior is essentially undefined. You can write a program that uses any of these, and you will get some result, but the result may differ from release to release (or even run to run!).
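As a small illustration (a sketch, not a spec quote): whether two evaluations of the same capture-free lambda yield the same object is deliberately left open, so the output below may vary between implementations.

public class LambdaIdentity {
    static Runnable make() {
        return () -> {}; // a capture-free lambda
    }

    public static void main(String[] args) {
        Runnable a = make();
        Runnable b = make();

        // Each evaluation may produce a fresh object, or a cached one may be
        // reused; the results below are therefore not something to rely on.
        System.out.println(a == b);
        System.out.println(a.equals(b));
        System.out.println(a.hashCode() == b.hashCode());
    }
}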
Restating this more concisely: in Java, objects have identity, but values (particularly function values, if they were to exist) should not have any notion of identity. Java 8 does not have function types. Instead, lambda expressions are converted to objects, so they carry a lot of baggage that's irrelevant to functions, particularly identity. That doesn't seem like "first class" to me.
Update 2013-10-24
I've been thinking further on this topic since having posted my answer several months ago. From a technical standpoint everything I wrote above is correct. The conclusion is probably expressed more precisely as Java 8 lambdas not being pure (as opposed to first-class) values, because they carry a lot of object baggage along. However, just because they're impure doesn't mean they aren't first-class. Consider the Wikipedia definition of first-class function. Briefly, the criteria listed there for considering functions first-class are the abilities to:
pass functions as arguments to other functions
return functions from other functions
assign functions to variables
store functions in data structures
have functions be anonymous
Java 8 lambdas meet all of these criteria. So that does make them seem first-class.
The article also mentions function names not having special status, instead a function's name is simply a variable whose type is a function type. Java 8 lambdas do not meet this last criterion. Java 8 doesn't have function types; it has functional interfaces. These are used effectively like function types, but they aren't function types at all. If you have a reference whose type is a functional interface, you have no idea whether it's a lambda, an instance of an anonymous inner class, or an instance of a concrete class that happens to implement that interface.
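A small sketch of that last point using two standard interfaces: the same lambda body can be typed as unrelated functional interfaces, and the resulting references do not interconvert the way values of a single function type would.

import java.util.concurrent.Callable;
import java.util.function.Supplier;

public class NotFunctionTypes {
    public static void main(String[] args) throws Exception {
        // The same lambda text poured into two unrelated nominal types.
        Supplier<Integer> s = () -> 42;
        Callable<Integer> c = () -> 42;

        System.out.println(s.get() + c.call());

        // These are functional interfaces, not a shared structural type
        // "() -> Integer", so the next line would not compile:
        // Supplier<Integer> s2 = c;
    }
}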
In summary, Java 8 lambdas are more first-class functions than I had originally thought. They just aren't pure first-class functions.
Yes, they are first class values (or will be, once Java 8 is released...)
In the sense that you can pass them as arguments, compose them to make higher order functions, store them in data structures etc. You will be able to use them for a broad range of functional programming techniques.
See also for a bit more definition of what "first class" means in this context:
http://en.wikipedia.org/wiki/First-class_citizen
As I see it, it is syntactic sugar, but combined with type inference, the new java.util.function package, and the semantics of inner classes, it does appear as a first-class value.
A real closure with variable binding to the outside context has some overhead. I would consider the implementation of Java 8 optimal, sufficiently pure.
It is not merely syntactical sugar at least.
And I wouldn't know of any more optimal implementation.
For me, lambdas in Java 8 are just syntactic sugar because you cannot use them as first-class citizens (http://en.wikipedia.org/wiki/First-class_function); each function has to be wrapped into an object, which imposes many limitations compared to a language with pure first-class functions such as Scala. Java 8 closures can also only capture immutable ("effectively final") non-local variables.
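A minimal sketch of that capture restriction (hypothetical names):

import java.util.function.Supplier;

public class CaptureRules {
    public static void main(String[] args) {
        int counter = 0;

        // Uncommenting the next line makes `counter` no longer effectively
        // final, and the lambda below then fails to compile.
        // counter++;

        Supplier<Integer> read = () -> counter;
        System.out.println(read.get());
    }
}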
Here is a better explanation of why it is syntactic sugar: Java Lambdas and Closures
I read in an interview with Neal Gafter:
"For example, adding function types to the programming language is much more difficult with Erasure as part of Generics."
EDIT:
Another place where I've met similar statement was in Brian Goetz's message in Lambda Dev mailing list, where he says that lambdas are easier to handle when they are just anonymous classes with syntactic sugar:
But my objection to function types was not that I don't like function types -- I love function types -- but that function types fought badly with an existing aspect of the Java type system, erasure. Erased function types are the worst of both worlds. So we removed this from the design.
Can anyone explain these statements? Why would I need runtime type information with lambdas?
The way I understand it is that they decided that, thanks to erasure, it would be messy to go the way of 'function types' (e.g. delegates in C#), so they could only use lambda expressions, which are just a simplification of the single-abstract-method class syntax.
Delegates in C#:
public delegate void DoSomethingDelegate(Object param1, Object param2);
...
//now assign some method to the function type variable (delegate)
DoSomethingDelegate f = DoSomething;
f(new Object(), new Object());
(another sample here
http://geekswithblogs.net/joycsharp/archive/2008/02/15/simple-c-delegate-sample.aspx)
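For comparison, a rough Java counterpart of the delegate sample (names are my own): instead of a declared function type, Java uses a nominal functional interface.

@FunctionalInterface
interface DoSomethingFunction {
    void invoke(Object param1, Object param2);
}

class DelegateStyle {
    static void doSomething(Object a, Object b) {
        System.out.println(a + " / " + b);
    }

    public static void main(String[] args) {
        // "Assign the method to a variable" via a method reference.
        DoSomethingFunction f = DelegateStyle::doSomething;
        f.invoke(new Object(), new Object());
    }
}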
One argument they put forward in Project Lambda docs:
Generic types are erased, which would expose additional places where
developers are exposed to erasure. For example, it would not be
possible to overload methods m(T->U) and m(X->Y), which would be
confusing.
section 2 in:
http://cr.openjdk.java.net/~briangoetz/lambda/lambda-state-3.html
(The final lambda expression syntax will be a bit different from the one in the above document:
http://mail.openjdk.java.net/pipermail/lambda-dev/2011-September/003936.html)
(x, y) => { System.out.printf("%d + %d = %d%n", x, y, x+y); }
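For the record, the syntax Java 8 eventually shipped uses -> rather than =>; a minimal runnable sketch:

import java.util.function.BiConsumer;

public class ShippedSyntax {
    public static void main(String[] args) {
        BiConsumer<Integer, Integer> add =
                (x, y) -> System.out.printf("%d + %d = %d%n", x, y, x + y);
        add.accept(1, 2); // prints "1 + 2 = 3"
    }
}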
All in all, my best understanding is that only part of the syntax machinery that could have been adopted actually will be used.
What Neal Gafter most likely meant was that not being able to use delegates will make it harder to adapt the standard APIs to a functional style, rather than that updating javac/the JVM would be more difficult.
If someone understands this better than me, I will be happy to read his account.
Goetz expands on the reasoning in State of the Lambda 4th ed.:
An alternative (or complementary) approach to function types,
suggested by some early proposals, would have been to introduce a new,
structural function type. A type like "function from a String and an
Object to an int" might be expressed as (String,Object)->int. This
idea was considered and rejected, at least for now, due to several
disadvantages:
It would add complexity to the type system and further mix structural and nominal types.
It would lead to a divergence of library styles—some libraries would continue to use callback interfaces, while others would use structural
function types.
The syntax could be unwieldy, especially when checked exceptions were included.
It is unlikely that there would be a runtime representation for each distinct function type, meaning developers would be further exposed to
and limited by erasure. For example, it would not be possible (perhaps
surprisingly) to overload methods m(T->U) and m(X->Y).
So, we have instead chosen to take the path of "use what you
know"—since existing libraries use functional interfaces extensively,
we codify and leverage this pattern.
To illustrate, here are some of the functional interfaces in Java SE 7
that are well-suited for being used with the new language features;
the examples that follow illustrate the use of a few of them.
java.lang.Runnable
java.util.concurrent.Callable
java.util.Comparator
java.beans.PropertyChangeListener
java.awt.event.ActionListener
javax.swing.event.ChangeListener
...
Note that erasure is just one of the considerations. In general, the Java lambda approach goes in a different direction from Scala's, not just on the question of types. It's very Java-centric.
Maybe because what you'd really want would be a type Function<R, P...>, which is parameterised with a return type and some sequence of parameter types. But because of erasure, you can't have a construct like P..., because it could only turn into Object[], which is too loose to be much use at runtime.
This is pure speculation. I am not a type theorist; I haven't even played one on TV.
I think what he means in that statement is that at runtime Java cannot tell the difference between these two function definitions:
void doIt(List<String> strings) {...}
void doIt(List<Integer> ints) {...}
Because the information about what type of data the List contains is erased at compile time, the runtime environment wouldn't be able to determine which function you wanted to call.
Trying to compile both of these methods in the same class will produce the following compile error:
doIt(List<String>) clashes with doIt(List<Integer>); both methods have the same erasure
This question is inspired by Joel's "Making Wrong Code Look Wrong"
http://www.joelonsoftware.com/articles/Wrong.html
Sometimes you can use types to enforce semantics on objects beyond their interfaces. For example, the Java interface Serializable does not actually define methods, but the fact that an object implements Serializable says something about how it should be used.
Can we have UnsafeString and SafeString interfaces/subclasses in, say, Java, that are used in much the same way as Joel's Hungarian notation and Java's Serializable, so that misuse doesn't just look bad - it doesn't compile?
Is this feasible in Java/C/C++ or are the type systems too weak or too dynamic?
Also, beyond input sanitization, what other security functions can be implemented in this manner?
The type system already enforces a huge number of such safety features. That is essentially what it's for.
For a very simple example, it prevents you from treating a float as an int. That's one aspect of safety -- it guarantees that the types you're working with are going to behave as expected. It guarantees that only string methods are called on a string. Assembly doesn't have that safeguard, for example.
It's also the job of the type system to ensure that you don't call private functions on a class. That's another safety feature.
Java's type system is too anemic to enforce a lot of interesting constraints effectively, but in many other languages (including C++), the type system can be used to enforce far more wide-ranging rules.
In C++, template metaprogramming gives you a lot of tools for prohibiting "bad" code. For example:
class myclass : boost::noncopyable {
...
};
enforces at compile-time that the class can not be copied. The following will produce compile errors:
myclass m;
myclass m2(m); // copy construction isn't allowed
myclass m3;
m3 = m; // assignment also not allowed
Likewise, we can ensure at compile time that a template function only gets called on types which fulfill certain criteria (say, they must be random-access iterators, while bidirectional ones aren't allowed; or they must be POD types; or they must not be any kind of integer type (char, short, int, long), while all other types are legal).
A textbook example of template metaprogramming in C++ implements a library for computing physical units. It allows you to multiply a value of type "meter" with another value of the same type, and automatically determines that the result must be of type "square meter". Or divide a value of type "mile" with a value of type "hour" and get a unit of type "miles per hour".
Again, a safety feature that prevents you from getting your types, and therefore your units, mixed up. You'll get a compile error if you compute a value and try to assign it to the wrong type: dividing, say, liters by meters^2 and assigning the result to a variable of type kilograms will result in a compile error.
Most of this requires some manual work to set up, certainly, but the language gives you the tools you need to basically build the type-checks you want. Some of this could be better supported directly in the language, but the more creative checks would have to be implemented manually in any case.
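To bring this back to the question's Java framing, here is a minimal hedged sketch of the SafeString idea as a wrapper type (all names hypothetical; it wraps rather than subclasses String, since String is final):

final class SafeString {
    private final String value;

    private SafeString(String value) { this.value = value; }

    // The only way to obtain a SafeString is to go through the sanitizer.
    static SafeString sanitize(String raw) {
        return new SafeString(raw.replace("<", "&lt;").replace(">", "&gt;"));
    }

    @Override
    public String toString() { return value; }
}

class Page {
    // A plain String is not accepted here, so unsanitized input doesn't just
    // look wrong - it doesn't compile.
    static void write(SafeString s) {
        System.out.println(s);
    }

    public static void main(String[] args) {
        write(SafeString.sanitize("<script>alert(1)</script>"));
        // write("<script>alert(1)</script>");  // compile error
    }
}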
Yes, you can do such a thing. I don't know about Java, but in C++ it isn't customary and there is no direct support for it, so you have to do some manual work. It is customary in some other languages - Ada, for example - which have the equivalent of a typedef that introduces a new type which can't be converted implicitly into the original one (the new type "inherits" some basic operations from the one it is created from, so it stays useful).
BTW, in general inheritance isn't a good way to introduce such new types, since even if there is no implicit conversion in one direction, there is one in the other.
You can do a certain amount of this out of the box in Ada. For example, you can make integer types that cannot implicitly interoperate with each other, and Ada enumerations are not compatible with any integer type. You can still convert between them, but you have to do it explicitly, which calls attention to what you are doing.
You could do the same with present-day C++, but you'd have to wrap all your integers and enums in classes, which is just way too much work for something that should be simple (or better yet, the default way of doing things).
I understand the next version of C++ is going to fix at least the enumeration issue.
In C++, I suppose you could use typedef to create a synonym for a primitive type. Your synonym could imply something about the content of that variable, replacing the function of Apps Hungarian notation.
IntelliSense will report the synonym you used during declaration, so if you don't like using actual Hungarian notation, it does save you from scrolling about (or using Go To Definition).
I guess you are thinking of something along the lines of Perl's "tainting" analysis.
In Java, it should be possible to use custom annotations and an annotation processor to implement this. Not necessarily easy though.
You can't have a UnsafeString subclass of String in Java, since java.lang.String is final.
In general, you cannot provide any kind of security on the source level - if you want to protect against evil code, you must do that on the binary level (e.g. Java bytecode). That's why private/protected can't be used as a security mechanism in C++: it is possible to bypass that with pointer manipulations.