rJava generics type - java

I have been playing with the rJava package, but since it seems that rJava is not aware of Java generic types, I have difficulties creating a Java object with generic type parameters. If I have a Java class like:
public class A<T> {
    private B<T> b;

    public A(B<T> b) {
        this.b = b;
    }
}
I would like to create an A object from an R session using .jnew() by passing a B object already created (with an instantiated type parameter), but rJava always gives the error:
java.lang.NoSuchMethodError: <init>
Is there any workaround for this?

There are a lot of moving parts in this question. Digging through the documentation for the various parts, I think that you need to do this on the line that broke:
gesinstance = .jnew("edu/cmu/tetrad/search/Ges", .jcast(dataset, "edu/cmu/tetrad/data/DataSet"))
The key difference being the call to .jcast on the second argument. (I don't have R installed, so I could not test this - If it doesn't work, I will update my answer based on any feedback you can provide on new error messages.)
So then the question is "why that?" The answer seems to be:
1. On the Java side, DataReader.parseTabularData returns an object with type DataSet as you noted, but DataSet is an interface, not a class. That necessarily means that the actual object returned is of some class that implements the DataSet interface.
2. For reasons that aren't immediately clear to me, the rJava package does not really handle polymorphism well. It requires that you call methods with an "exact" signature match to the objects that you are passing. In this case, you will need to "up-cast" from whatever specific class you got to the interface DataSet. See the documentation for .jnew (https://www.rforge.net/doc/packages/rJava/html/jnew.html), especially for the arguments that they denote by "...". This refers you to the corresponding part of the documentation for .jcall (https://www.rforge.net/doc/packages/rJava/html/jcall.html), which then explains the requirement to call .jcast (https://www.rforge.net/doc/packages/rJava/html/jcast.html) with some examples.
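To see why an exact-match lookup fails, here is a minimal, self-contained Java sketch (hypothetical types, not Tetrad's actual API) showing that the runtime class of a returned object is the implementing class, not the declared interface:

interface DataSet { }

class TabularDataSet implements DataSet { }

class DataReader {
    // Declared return type is the interface...
    DataSet parseTabularData() {
        // ...but the object handed back has a concrete runtime class.
        return new TabularDataSet();
    }
}

public class InterfaceVsClassDemo {
    public static void main(String[] args) {
        DataSet ds = new DataReader().parseTabularData();
        // Prints "TabularDataSet", not "DataSet": an "exact" signature
        // match against the runtime class will never find Ges(DataSet).
        System.out.println(ds.getClass().getSimpleName());
    }
}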
The error that you got, java.lang.NoSuchMethodError: <init>, was telling you that the JVM could not find the constructor that you called. This was mysterious looking in the example that you posted in the comments. (It might be good to edit your question, by the way, and include that information up there for posterity.) The code certainly looks right, and, knowing Java, I intuitively expected the interface to respect Java's polymorphism. Given that (for whatever reason) the interface to R does "exact" type matching without considering inheritance, it's clear that it will not find a constructor, due to reason #1 above.
Finally, I didn't actually encounter any Java classes using generics in my limited exploration of Tetrad. As it turns out, that was a complete red herring. Should it be an issue in the future, you'll probably want to check out "Type Erasure" (https://docs.oracle.com/javase/tutorial/java/generics/erasure.html). If you were interfacing between Java and C, C++, Fortran, or any other language that Java considers "native," then you'd deal with the generics in the native code by dealing in the type-erased forms. The rJava interface may be different though, since this seems to fall into the same general type of structure that tripped you up on your current problem. (Maybe worthy of its own bounty later!)
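For what it's worth, here is a minimal sketch of erasure in action; two lists with different type parameters share one runtime class:

import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    public static void main(String[] args) {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        // The type parameters are erased at compile time, so the JVM
        // sees the same raw class for both.
        System.out.println(strings.getClass() == ints.getClass()); // true
    }
}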

Related

Why does Java's Dynamic Proxies need reflection?

Java's Dynamic Proxy docs describe these classes as follows:
A dynamic proxy class is a class that implements a list of interfaces specified at runtime such that a method invocation through one of the interfaces on an instance of the class will be encoded and dispatched to another object through a uniform interface. Thus, a dynamic proxy class can be used to create a type-safe proxy object for a list of interfaces without requiring pre-generation of the proxy class.
Now, while everything in this sentence is accurate, all of this information is in fact present at compile time. In Java, when you are creating a proxy, you specify the exact interface you want to proxy in your code.
Now, the first thing that really confuses me is why the bytecode here needs to be generated at runtime. All of this information is present at compile time... (and you don't even have to deal with type erasure)
P.S.: I am not sure if this is still the case; I am basing this on a quite dated accepted answer here: How does Java's Dynamic Proxy actually work?
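For reference, here is what the runtime generation looks like from the caller's side; a minimal sketch using java.lang.reflect.Proxy with a hypothetical Greeter interface:

import java.lang.reflect.Proxy;

public class ProxyDemo {
    interface Greeter {
        String greet(String name);
    }

    public static void main(String[] args) {
        // The interface is known at compile time, yet the proxy class
        // itself is generated by the JVM at runtime.
        Greeter g = (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                (proxy, method, methodArgs) -> "Hello, " + methodArgs[0]);
        System.out.println(g.greet("world"));       // Hello, world
        System.out.println(g.getClass().getName()); // e.g. com.sun.proxy.$Proxy0
    }
}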
The next step in getting this to work is the type checking/type inference. I am not sure how Java actually handles this, but you need to be able to use Proxy<A> interchangeably with A. In order to pull this off you need the following:
∀ method m ∈ A, m ∈ Proxy<A>
Which means that:
- you need your proxy to have the same structure, and
- you need some kind of delegation (i.e. dynamic dispatch); a hand-written sketch follows right after this list.
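Concretely, here is a minimal hand-written sketch (with a hypothetical Service interface) of the structure-plus-delegation pairing that a generated proxy provides automatically:

interface Service {
    String fetch(String key);
}

// Same structure as the interface; every call is delegated to the
// wrapped target, so the proxy can be used wherever a Service can.
class LoggingService implements Service {
    private final Service target;

    LoggingService(Service target) {
        this.target = target;
    }

    @Override
    public String fetch(String key) {
        System.out.println("fetch(" + key + ")"); // cross-cutting concern
        return target.fetch(key);                 // delegation
    }
}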
Once you start writing out the inference rules, this gives us something very familiar: structural typing.
Now Java doesn't have structural typing, but one could easily add the few inference rules (e.g. TypeScript), especially given that Java has boxed primitives (Scala, for example, doesn't, which makes it very difficult to introduce structural inference).
The Actual Question
Reflection is hard, not safe, and not super performant. My question is: why and how do Java's proxies use reflection? It seems like most of this functionality could be implemented using other features already present in the language.

Why did Java have to adopt type erasure to maintain backwards compatibility?

Let me preface this question by saying up front that I understand what Java can and can't do and am not asking about that. I'm wondering what the actual technical challenges are, from JVM and compiler standpoint, that require the compiler to behave the way it does.
Whenever I see discussions of the weaknesses or most hated aspects of Java, type erasure always seems to be somewhere near the top of the list for Java developers (it is for me!). If my history is correct, Java 1.0 never implemented any type checking beyond passing Objects and recasting them. When a better type system was required, Sun had to decide between full typing support, which would break backwards compatibility, and their chosen solution of generics, which didn't break old code.
Meanwhile C# ran into the same issue and went the opposite route of breaking backwards compatibility to implement a more complex typing system around the same time (I believe).
My main question is: why was this an either-or question for the two languages? What is it about the compiler process that means there is no way to support C#-style handling of types without breaking backwards compatibility in old code? I understand part of the problem is that the exact type is not always known at compile time, but at first (naive) glance it seems like sometimes it can be known at compile time, or that it can be left unknown at compile time and handled with a sort of reflection approach at runtime.
Is the problem that it's not feasible to implement, or that it was simply deemed too slow to implement a runtime sort of solution?
To go a step further, let's use a simple generic factory as an example of a place where type erasure feels rather cumbersome.
public class GenericFactory<FinalType, BuilderType<FinalType> extends GenericBuilder<FinalType>> {
    private Class builderClass;

    public GenericFactory(Class<BuilderType> builderClass) {
        this.builderClass = builderClass;
    }

    public FinalType create() {
        GenericBuilder builder = builderClass.newInstance();
        builder.setFoo(getSystemProperty("foo"));
        builder.setBar(getSystemProperty("bar"));
        builder.setBaz(getSystemProperty("baz"));
        return builder.build();
    }
}
This example, assuming I didn't screw up on syntax somewhere, shows two particular annoyances of type erasure that at first glance seem like they should be easier to handle.
First, and less relevant, I had to add a FinalType parameter before I could refer to BuilderType extends GenericBuilder, even though it seems like FinalType could be inferred from BuilderType. I say less relevant since this may be more about generics syntax/implementation than the compiler limits that forced type erasure.
The second issue is that I had to pass in my BuilderClass object to the constructor in order to use reflection to build the builder, despite it being defined by the generics already. It seems as if it would be relatively easy for the compiler to store the generic class used here (so long as it didn't use the ? syntax) to allow reflection to look up the generic and then construct it.
Since this isn't done I presume there is a very good reason it is not. I'm trying to understand what these reasons are, what forces the JVM to stick with type erasure to maintain backwards compatibility?
I'm not sure that what you're describing (the two "annoyances") is a result of type erasure.
I had to add a FinalType parameter before I could refer to BuilderType extends GenericBuilder, even though it seems like FinalType could be inferred from BuilderType
BuilderType<FinalType> would not be a valid generic type name unless I missed some changes to that in Java 8. Thus it should be BuilderType extends GenericBuilder<FinalType>, which is fine. FinalType can't be inferred here; how should the compiler know which type to provide?
The second issue is that I had to pass in my BuilderClass object to the constructor in order to use reflection to build the builder, despite it being defined by the generics already.
That's not true. The generic parameters don't define what FinalType actually is. I could create a GenericFactory<String, StringBuilderType> (with StringBuilderType extends GenericBuilder<String>) as well as a GenericFactory<Integer, IntegerBuilderType> (with IntegerBuilderType extends GenericBuilder<Integer>).
Here, if you'd provide the type parameters in a variable definition or method call, type erasure would happen. As for the why, refer to Andy's comment.
However, if you have a field or subclass, e.g. private GenericFactory<String, StringBuilderType> stringFactory, the type arguments are preserved: the generic types can be extracted from the reflection data (unfortunately there's no easy built-in way, but have a look here: http://www.artima.com/weblogs/viewpost.jsp?thread=208860).
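A minimal sketch of the technique the linked article describes (the "super type token" pattern); the names here are illustrative, not a library API:

import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

abstract class TypeToken<T> {
    Type captured() {
        // The subclass's class file records the superclass's type argument.
        ParameterizedType superType =
                (ParameterizedType) getClass().getGenericSuperclass();
        return superType.getActualTypeArguments()[0];
    }
}

public class TypeTokenDemo {
    public static void main(String[] args) {
        // The anonymous subclass fixes T = List<String>, so the parameter
        // is recoverable through reflection despite erasure.
        Type t = new TypeToken<List<String>>() {}.captured();
        System.out.println(t); // java.util.List<java.lang.String>
    }
}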

Why does erasure complicate implementing function types?

I read in an interview with Neal Gafter:
"For example, adding function types to the programming language is much more difficult with Erasure as part of Generics."
EDIT:
Another place where I've met a similar statement was in Brian Goetz's message on the Lambda Dev mailing list, where he says that lambdas are easier to handle when they are just anonymous classes with syntactic sugar:
But my objection to function types was not that I don't like function types -- I love function types -- but that function types fought badly with an existing aspect of the Java type system, erasure. Erased function types are the worst of both worlds. So we removed this from the design.
Can anyone explain these statements? Why would I need runtime type information with lambdas?
The way I understand it is that, because of erasure, they decided it would be messy to go the way of 'function types' (e.g. delegates in C#), and could only use lambda expressions, which are just a simplification of the single-abstract-method class syntax.
Delegates in C#:
public delegate void DoSomethingDelegate(Object param1, Object param2);
...
//now assign some method to the function type variable (delegate)
DoSomethingDelegate f = DoSomething;
f(new Object(), new Object());
(another sample here
http://geekswithblogs.net/joycsharp/archive/2008/02/15/simple-c-delegate-sample.aspx)
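For comparison, a sketch of the closest shape Java eventually shipped: a nominal functional interface plus a method reference, rather than a structural delegate type (names here are illustrative):

import java.util.function.BiConsumer;

public class DelegateAnalog {
    static void doSomething(Object a, Object b) {
        System.out.println(a + " / " + b);
    }

    public static void main(String[] args) {
        // The "delegate variable" is typed by a nominal interface
        // (BiConsumer), not by a structural function type.
        BiConsumer<Object, Object> f = DelegateAnalog::doSomething;
        f.accept(new Object(), new Object());
    }
}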
One argument they put forward in Project Lambda docs:
Generic types are erased, which would expose additional places where developers are exposed to erasure. For example, it would not be possible to overload methods m(T->U) and m(X->Y), which would be confusing.
section 2 in:
http://cr.openjdk.java.net/~briangoetz/lambda/lambda-state-3.html
(The final lambda expression syntax will be a bit different from the above document:
http://mail.openjdk.java.net/pipermail/lambda-dev/2011-September/003936.html)
(x, y) => { System.out.printf("%d + %d = %d%n", x, y, x+y); }
All in all, my best understanding is that only part of the syntax that could have been used actually will be.
What Neal Gafter most likely meant was that not being able to use delegates makes standard APIs more difficult to adjust to a functional style, rather than that updating javac/the JVM would have been more difficult.
If someone understands this better than me, I will be happy to read his account.
Goetz expands on the reasoning in State of the Lambda 4th ed.:
An alternative (or complementary) approach to function types, suggested by some early proposals, would have been to introduce a new, structural function type. A type like "function from a String and an Object to an int" might be expressed as (String,Object)->int. This idea was considered and rejected, at least for now, due to several disadvantages:
- It would add complexity to the type system and further mix structural and nominal types.
- It would lead to a divergence of library styles; some libraries would continue to use callback interfaces, while others would use structural function types.
- The syntax could be unwieldy, especially when checked exceptions were included.
- It is unlikely that there would be a runtime representation for each distinct function type, meaning developers would be further exposed to and limited by erasure. For example, it would not be possible (perhaps surprisingly) to overload methods m(T->U) and m(X->Y).
So, we have instead chosen to take the path of "use what you know": since existing libraries use functional interfaces extensively, we codify and leverage this pattern.
To illustrate, here are some of the functional interfaces in Java SE 7 that are well-suited for being used with the new language features; the examples that follow illustrate the use of a few of them.
java.lang.Runnable
java.util.concurrent.Callable
java.util.Comparator
java.beans.PropertyChangeListener
java.awt.event.ActionListener
javax.swing.event.ChangeListener
...
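The structural T->U types never shipped, but the clash the quote describes is easy to reproduce today with java.util.function.Function, whose type arguments likewise erase:

import java.util.function.Function;

public class OverloadClash {
    void m(Function<String, Integer> f) { }

    // Uncommenting this second overload fails to compile: both methods
    // erase to m(Function), so they have the same signature.
    // void m(Function<Integer, String> f) { }
}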
Note that erasure is just one of the considerations. In general, the Java lambda approach goes in a different direction from Scala, not just on the question of types. It's very Java-centric.
Maybe because what you'd really want would be a type Function<R, P...>, which is parameterised with a return type and some sequence of parameter types. But because of erasure, you can't have a construct like P..., because it could only turn into Object[], which is too loose to be much use at runtime.
This is pure speculation. I am not a type theorist; I haven't even played one on TV.
I think what he means in that statement is that at runtime Java cannot tell the difference between these two function definitions:
void doIt(List<String> strings) {...}
void doIt(List<Integer> ints) {...}
Because at compile time, the information about what type of data the List contains is erased, so the runtime environment wouldn't be able to determine which function you wanted to call.
Trying to compile both of these methods in the same class will produce the following compile-time error:
doIt(List<String>) clashes with doIt(List<Integer>); both methods have the same erasure

Why does MyClass.class exist in Java when MyField.field doesn't?

Let's say I have:
class A {
    Integer b;

    void c() {}
}
Why does Java have this syntax: A.class, and doesn't have a syntax like this: b.field, c.method?
Is there any use that is so common for class literals?
The A.class syntax looks like a field access, but in fact it is a result of a special syntax rule in a context where normal field access is simply not allowed; i.e. where A is a class name.
Here is what the grammar in the JLS says:
Primary:
    ParExpression
    NonWildcardTypeArguments (ExplicitGenericInvocationSuffix | this Arguments)
    this [Arguments]
    super SuperSuffix
    Literal
    new Creator
    Identifier { . Identifier } [IdentifierSuffix]
    BasicType {[]} .class
    void.class
Note that there is no equivalent syntax for field or method.
(Aside: The grammar allows b.field, but the JLS states that b.field means the contents of a field named "field" ... and it is a compilation error if no such field exists. Ditto for c.method, with the addition that a field c must exist. So neither of these constructs means what you want it to mean ...)
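To make the aside concrete, a small sketch with hypothetical names: b.field compiles only if the class of b literally declares a member named field.

class Holder {
    int field; // a member that happens to be named "field"
}

class AsideDemo {
    Holder b = new Holder();

    void demo() {
        int x = b.field; // legal: plain field access, not a "field literal"
    }
}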
Why does this limitation exist? Well, I guess because the Java language designers did not see the need to clutter up the language syntax / semantics to support convenient access to the Field and Method objects. (See * below for some of the problems of changing Java to allow what you want.)
Java reflection is not designed to be easy to use. In Java, it is best practice to use static typing where possible. It is more efficient and less fragile. Limit your use of reflection to the few cases where static typing simply won't work.
This may irk you if you are used to programming to a language where everything is dynamic. But you are better off not fighting it.
Is there any use that is so common for class literals?
I guess the main reason they supported this for classes is that it avoids programs calling Class.forName("some horrible string") each time you need to do something reflectively. You could call it a compromise / small concession to usability for reflection.
I guess the other reason is that the <type>.class syntax didn't break anything, because class was already a keyword. (IIRC, the syntax was added in Java 1.1.)
* If the language designers tried to retrofit support for this kind of thing there would be all sorts of problems:
The changes would introduce ambiguities into the language, making compilation and other parser-dependent tasks harder.
The changes would undoubtedly break existing code, whether or not method and field were turned into keywords.
You cannot treat b.field as an implicit object attribute, because it doesn't apply to objects. Rather b.field would need to apply to field / attribute identifiers. But unless we make field a reserved word, we have the anomalous situation that you can create a field called field but you cannot refer to it in Java sourcecode.
For c.method, there is the problem that there can be multiple visible methods called c. A second issue is that if there is a field called c and a method called c, then c.method could be a reference to a field called method on the object referred to by the c field.
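For completeness, the closest Java offers to the field and method "literals" the question asks about is a reflective lookup by name, checked at runtime rather than at compile time:

import java.lang.reflect.Field;
import java.lang.reflect.Method;

class A {
    Integer b;
    void c() {}
}

public class MemberLookup {
    public static void main(String[] args) throws Exception {
        // Reflective lookups by name: the runtime analog of A.class for members.
        Field f = A.class.getDeclaredField("b");
        Method m = A.class.getDeclaredMethod("c");
        System.out.println(f); // java.lang.Integer A.b
        System.out.println(m); // void A.c()
    }
}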
I take it you want this info for logging and such. It is most unfortunate that such information is not exposed, although the compiler has full access to it.
With a little creativity you can get the information using reflection. I can't provide any examples, as there are so few requirements to follow and I'm not in the mood to completely waste my time :)
I'm not sure if I fully understand your question; it's unclear what you mean by the A.class syntax. You can use the reflection API to get the class from a given object:
A a = new A();
Class c = a.getClass();
or
Class c = A.class;
Then do some things using c.
The reflection API is mostly used for debugging tools: since Java has support for polymorphism, the actual Class of an object is only known at runtime, so the reflection API was developed to help debug problems (a sub-class given when super-class behavior is expected, etc.).
The reason there is no b.field or c.method is that they would have no meaning and no functional purpose in Java. You cannot create a reference to a method, and a field cannot change its type at runtime; these things are set at compile time. Java is a very rigid language, without much in the way of runtime flexibility (unless you use dynamic class loading, but even then you need some information on the loaded objects). If you have come from a flexible language like Ruby or JavaScript, then you might find Java a little controlling for your tastes.
However, having the compiler help you figure out potential problems in your code is very helpful.
In Java, not everything is an object.
You can have
A a = new A();
Class cls = a.getClass();
or directly from the class
A.class
With this you get the Class object for the class.
With reflection you can get methods and fields, but this gets complicated, since not everything is an object. This is not a language like Scala or Ruby where everything is an object.
Reflection tutorial : http://download.oracle.com/javase/tutorial/reflect/index.html
BTW: You did not specify public/private/protected, so by default your members are declared package-private. This is package-level access: http://download.oracle.com/javase/tutorial/java/javaOO/accesscontrol.html

How do I argue against Duck-typing in a strongly typed language like Java?

I work on a team of Java programmers. One of my co-workers suggests from time-to-time that I do something like "just add a type field" (usu. "String type"). Or code will be committed laden with "if (foo instanceof Foo){...} else if( foo instanceof Bar){...}".
Josh Bloch's admonition that "tagged classes are a wan imitation of a proper class hierarchy" notwithstanding, what is my one-line response to this sort of thing? And then how do I elaborate the concept more seriously?
It's clear to me that, the context being Java, the type of object under consideration is right in front of our collective faces; IOW, the word right after "class", "enum", "interface", etc.
But aside from the difficult-to-demonstrate or quantify (on the spot) "it makes your code more complicated", how do I say that "duck typing in a (more or less) strongly-typed language is a stupid idea that suggests a much deeper design pathology"?
Actually, you said it reasonably well right there.
The truth is that the instanceof comb is almost always a bad idea (the exception happening, for example, when you're marshaling or serializing, when for a short interval you may not have all the type information at hand). As Josh says, that's a sign of a bad class hierarchy otherwise.
The way that you know it's a bad idea is that it makes the code brittle: if you use that, and the type hierarchy changes, then it probably breaks that instance-of comb everywhere it occurs. What's more, you then lose the benefit of strong typing; the compiler can't help you by catching errors ahead of time. (This is somewhat analogous to the problems caused by typecasts in C.)
Update
Let me extend this a bit, since from a comment it appears I wasn't quite clear. The reason you use a typecast in C, or instanceof, is that you want to say "as if": use this foo as if it were a bar. Now, in C, there is no runtime type information around at all, so you're just working without a net: if you typecast something, the generated code is going to treat that address as if it contained a particular type no matter what, and you should only hope that it will cause a run-time error instead of silently corrupting something.
Duck typing just raises that to a norm; in a dynamically typed language like Ruby or Python or Smalltalk, everything is an untyped reference; you shoot messages at it at runtime and see what happens. If it understands a particular message, it "walks like a duck" -- it handles it.
This can be very handy and useful, because it allows marvelous hacks like assigning a generator expression to a variable in Python, or a block to a variable in Smalltalk. But it does mean you're vulnerable to errors at runtime that a strongly typed language can catch at compile time.
In a strongly-typed language like Java, you can't really, strictly, have duck typing at all: you must tell the compiler what type you're going to treat something as. You can get something like duck typing by using type casts, so that you can do something like
Object x; // A reference to an Object, analogous to a void * in C
// Some code that assigns something to x
((FoodDispenser)x).dropPellet(); // [1]
// Some more code
((MissleController)x).launchAt("Moon"); // [2]
Now at run time, you're fine as long as x is a kind of FoodDispenser at [1] or MissleController at [2]; otherwise boom. Or unexpectedly, no boom.
In your description, you protect yourself by using a comb of else if and instanceof
Object x;
// code code code
if (x instanceof FoodDispenser)
    ((FoodDispenser) x).dropPellet();
else if (x instanceof MissleController)
    ((MissleController) x).launchAt("Moon");
else if ( /* something else... */ )
    // ...
else
    // error
Now, you're protected against the run-time error, but you've got the responsibility of doing something sensible later, at the else.
But now imagine you make a change to the code, so that 'x' can take the types 'FloorWax' and 'DessertTopping'. You now must go through all the code and find all the instances of that comb and modify them. Now the code is "brittle" -- changes in the requirements mean lots of code changes. In OO, you're striving to make the code less brittle.
The OO solution is to use polymorphism instead, which you can think of as a kind of limited duck typing: you're defining all the operations that something can be trusted to perform. You do this by defining a superior class, probably abstract, that has all the methods of the inferior classes. In Java, a class like that is best expressed as an "interface", but it has all the type properties of a class. In fact, you can see an interface as a promise that a particular class can be trusted to act "as if" it were another class.
public interface VeebleFeetzer { /* ... */ };
public class FoodDispenser implements VeebleFeetzer { /* ... */ }
public class MissleController implements VeebleFeetzer { /* ... */ }
public class FloorWax implements VeebleFeetzer { /* ... */ }
public class DessertTopping implements VeebleFeetzer { /* ... */ }
All you have to do now is use a reference to a VeebleFeetzer, and the compiler figures it out for you. If you happen to add another class that's a subtype of VeebleFeetzer, the compiler will select the method and check the arguments into the bargain.
VeebleFeetzer x; // A reference to anything
// that implements VeebleFeetzer
// Some code that assigns something to x
x.dropPellet();
// Some more code
x.launchAt("Moon");
This isn't so much duck typing as it is just proper object-oriented style; indeed, being able to subclass class A and call the same method on class B and have it do something else is the entire point of inheritance in object-oriented languages.
If you're constantly checking the type of an object, then you're either being too clever (though I suppose it's this cleverness that duck typing aficionados enjoy, except in a less brittle form) or you're not embracing the basics of object-oriented programming.
hmmm...
correct me if I am wrong, but tagged classes and duck typing are two different concepts, though not necessarily mutually exclusive.
When one has the urge to use tags in a class to define the type, then one should, IMHO, revise their class hierarchy, as it is a clear sign of conceptual bleed where an abstract class needs to know the implementation details that the class parenthood tries to hide. Are you using the correct pattern? In other words, are you trying to coerce behaviour into a pattern that does not naturally support it?
Whereas duck typing is the ability to loosely define a type, where a method can accept any type just so long as the necessary methods in the parameter instance are defined. The method will then use the parameter and call the necessary methods without too much bother about the parenthood of the instance.
So here... the smelly hint is, as Charlie pointed out, the use of instanceof. Much like static or other smelly keywords, whenever they appear one must ask "Am I doing the right thing here?"; not that they are inherently wrong, but they are often used to hack through a bad or ill-fitted OO design.
My one line response would be that you lose one of the main benefits of OOP: polymorphism. This reduces the time to develop new code (developers love to develop new code, so that should help your argument :-)
If, when adding a new type to an existing system, you have to add logic aside from figuring out which instance to construct, then, in Java, you are doing something wrong (assuming that the new class should simply be a drop-in replacement for another).
Generally, the appropriate way to handle this in Java is to keep the code polymorphic and make use of interfaces. So anytime they find themselves wanting to add another variable or do an instanceof they should probably be implementing an interface instead.
If you can convince them to change the code it is pretty easy to retrofit interfaces into the existing code base. For that matter, I'd take the time to take a piece of code with instanceof and refactor it to be polymorphic. It is much easier for people to see the point if they can see the before and after versions and compare them.
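To make the before/after comparison concrete, here is a small sketch (hypothetical Shape classes, not from the code base under discussion):

public class RefactorDemo {
    interface Shape { double area(); }

    static class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        public double area() { return Math.PI * radius * radius; }
    }

    static class Rect implements Shape {
        final double w, h;
        Rect(double w, double h) { this.w = w; this.h = h; }
        public double area() { return w * h; }
    }

    // Before: an instanceof comb that must be edited for every new shape.
    static double areaByInstanceof(Object shape) {
        if (shape instanceof Circle) {
            Circle c = (Circle) shape;
            return Math.PI * c.radius * c.radius;
        } else if (shape instanceof Rect) {
            Rect r = (Rect) shape;
            return r.w * r.h;
        }
        throw new IllegalArgumentException("unknown shape");
    }

    public static void main(String[] args) {
        Shape s = new Circle(2.0);
        System.out.println(areaByInstanceof(s)); // brittle "before" style
        System.out.println(s.area());            // polymorphic "after" style
    }
}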
You might want to point your co-worker to the Liskov substitution principle, one of the five pillars in SOLID.
Links:
Wikipedia entry
Article written by Uncle Bob
When you say "duck typing in strongly-typed languages" you actually mean "imitating (subtype) polymorphism in statically-typed languages".
It's not that bad when you have data objects (DTOs) that don't contain any behaviour. When you do have a full-blown OO model (ask yourself if this is really the case) then you should use the polymorphism offered by the language where appropriate.
Although I'm generally a fan of duck-typed languages like Python, I can see your problem with it in Java.
If you are writing all the classes that will ever be used with this code, then you don't need to duck-type, because you don't need to allow for cases where code can't directly inherit from (or implement) an interface or other unifying abstraction.
A downside of duck-typing is that you have an extra class of unit tests to run on your code: a new class could return a different type than expected, and subsequently cause the rest of the code to fail. So although duck-typing allows backward-flexibility, it requires a lot of forward thinking for tests.
In short you have a catch-all (hard) instead of a catch-few (easy). I think that's the pathology.
Why "imitate a class hierarchy" instead of designing and using it? One of the refactoring methods is replacing "switch"es (chained ifs are almost the same) with polymorphism. Why use switches where polymorphism would lead to cleaner code?
This isn't duck typing, it is just a bad way to simulate polymorphism in a language that has (more or less) real polymorphism.
Two arguments to answer the titled question:
1) Java is supposed to be "write once, run anywhere," so code that was written for one hierarchy shouldn't throw RuntimeExceptions when we change the environment somewhere. (Of course, there are exceptions (pun intended) to this rule.)
2) The Java JIT performs very aggressive optimizations that rely on knowing that a given symbol must be of one type and one type only. The only way to work around this is to cast.
As others have mentioned, your instanceof example doesn't really match the question I've answered here. Anything with any type system, duck or static, can have the issue you described. There are better OOP ways to deal with it.
Instead of instanceof you can use the Method and Strategy patterns; mixed together, the code looks much better than before...
