What exactly is the point of the codebase in Java RMI?


I'm currently learning about RMI.
I don't really understand the concept of the codebase. Every paper I read suggests that the client, which calls the remote object, can load the method definitions from the codebase.
The problem is: don't I need the descriptions/interfaces in my classpath anyway? How can I call methods on the remote object if I only know them at runtime? This wouldn't even compile.
Am I completely missing the point here? What exactly is the point of the codebase then? It seems like a lot of extra work and requirements to provide a codebase.
Thanks

Well, let's say you provide your client with only the interfaces, and the implementations are located in a given code base. The client then asks the server to send a given object. The client expects to receive an object that implements a given interface, but the actual implementation is unknown to the client. When it deserializes the sent object, it has to go to the code base and download the implementing class of the actual object being passed.
This makes the client very thin, and you can easily update your classes in the code base without having to update every single client.
EDIT
Let's say you have a RMI server with the following interface
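import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;
public interface MiddleEarth extends Remote {
    List<Creature> getAllCreatures() throws RemoteException;
}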
The client will only have the interfaces for MiddleEarth and Creature, but none of the implementations in the class path.
The implementations of Creature are serializable objects of type Elf, Man, Dwarf and Hobbit. These implementations are located in your code base, but not in your client's class path.
When you ask your RMI server to send you the list of all creatures in Middle Earth, it will send objects that implement Creature, that is, any of the classes listed above.
When the client receives the serialized objects, it has to look for the class files in order to deserialize them, but these are not located in the local class path. Every object in this stream comes tagged with the code base that can be used to look for missing classes. Therefore, the client resorts to the code base to look for these classes. There it will find the actual creature classes being used.
The code base works in both directions, so if you send your server a Creature (e.g. an Ent), the server will look for it in the code base as well.
This means that when both client and server need to publish new types of creatures, all they have to do is update the creaturesImpl.jar in the code base, and nothing in the server or client applications themselves.
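To make the "client only knows the interface" point concrete, here is a minimal in-process sketch. It is not RMI itself: plain Java serialization stands in for the wire, and the nested Elf and Hobbit classes stand in for what would live in creaturesImpl.jar. The class name ThinClientSketch and the describe() method are my own assumptions. The point to notice is that receive() never names a concrete class; the stream does. In real RMI, a class missing locally would then be fetched from the code base.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ThinClientSketch {
    // The only type the "client" code below is written against.
    public interface Creature extends Serializable {
        String describe();
    }

    // Stand-ins for implementation classes that would live in the code base.
    static class Elf implements Creature {
        private static final long serialVersionUID = 1L;
        public String describe() { return "Elf"; }
    }

    static class Hobbit implements Creature {
        private static final long serialVersionUID = 1L;
        public String describe() { return "Hobbit"; }
    }

    // "Server" side: serializes concrete implementations into bytes.
    public static byte[] send() {
        List<Creature> creatures = new ArrayList<>();
        creatures.add(new Elf());
        creatures.add(new Hobbit());
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(creatures);
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // "Client" side: mentions only Creature; the stream carries the class names.
    @SuppressWarnings("unchecked")
    public static String receive(byte[] wire) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(wire))) {
            List<Creature> creatures = (List<Creature>) in.readObject();
            StringBuilder sb = new StringBuilder();
            for (Creature c : creatures) {
                if (sb.length() > 0) sb.append(",");
                sb.append(c.describe());
            }
            return sb.toString();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(receive(send()));
    }
}
```

This compiles because the client code is typed against Creature only, which answers the "this wouldn't even compile" worry: the interface is needed at compile time, the implementation only at runtime.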

Related

How does Apache Spark send functions to other machines under the hood

I started playing with Pyspark to do some data processing. It was interesting to me that I could do something like
rdd.map(lambda x : (x['somekey'], 1)).reduceByKey(lambda x,y: x+y).count()
And it would send the logic in these functions over potentially numerous machines to execute in parallel.
Now, coming from a Java background, if I wanted to send an object containing some methods to another machine, that machine would need to know the class definition of the object I'm streaming over the network. Recently Java introduced functional interfaces, which would create an implementation of that interface for me at compile time (i.e. MyInterface impl = ()->System.out.println("Stuff");)
Where MyInterface would just have one method, 'doStuff()'
However, if I wanted to send such a function over the wire, the destination machine would need to know the implementation (impl itself) in order to call its 'doStuff()' method.
My question boils down to... How does Spark, written in Scala, actually send functionality to other machines? I have a couple hunches:
The driver streams class definitions to other machines, and those machines dynamically load them with a class loader. Then the driver streams the objects and the machines know what they are, and can execute on them.
Spark has a set of methods defined on all machines (core libraries) which are all that are needed for anything I could pass it. That is, my passed function is converted into one or more function calls on the core library. (Seems unlikely since the lambda can be just about anything, including instantiating other objects inside)
Thanks!
Edit: Spark is written in Scala, but I was interested in hearing how this might be approached in Java (where a function cannot exist unless it's in a class, so the class definition changes and needs to be updated on the worker nodes).
Edit 2:
This is the problem in java in case of confusion:
import java.io.ObjectOutputStream;
import java.net.Socket;

public class Playground
{
    private static interface DoesThings
    {
        public void doThing();
    }

    public void func() throws Exception {
        Socket s = new Socket("addr", 1234);
        ObjectOutputStream oos = new ObjectOutputStream(s.getOutputStream());
        oos.writeObject("Hello!"); // Works just fine, you're just sending a string
        oos.writeObject((DoesThings)()->System.out.println("Hey, im doing a thing!!")); // Sends the object, but error on other machine
        DoesThings dt = (DoesThings)()->System.out.println("Hey, im doing a thing!!");
        System.out.println(dt.getClass());
    }
}
The System.out.println(dt.getClass()) call prints:
"class JohnLibs.Playground$$Lambda$1/23237446"
Now, assume that the Interface definition wasn't in the same file, it was in a shared file both machines had. But this driver program, func(), essentially creates a new type of class which implements DoesThings.
As you can see, the destination machine is not going to know what JohnLibs.Playground$$Lambda$1/23237446 is, even though it knows what DoesThings is. It all comes down to this: you can't pass a function without it being bound to a class. In Python you could just send a String with the definition and then execute that string (since it's interpreted). Perhaps that's what Spark does, since it uses Scala instead of Java (if Scala can have functions outside of classes).

Java bytecode, which is, of course, what both Java and Scala are compiled to, was created specifically to be platform independent. So, if you have a classfile you can move it to any other machine, regardless of "silicon" architecture, and provided it has a JVM of at least that version, it will run. James Gosling and his team did this deliberately to allow code to move between machines right from the very start, and it was easy to demonstrate in Java 0.98 (the first version I played with).
When the JVM tries to load a class, it uses an instance of a ClassLoader. Classloaders encompass two things: the ability to fetch the binary of a bytecode file, and the ability to load the code (verify its integrity, convert it into an in-memory instance of java.lang.Class, and make it available to other code in the system). At Java 1, you mostly had to write your own classloader if you wanted to take control of how the bytes were loaded, although there was a Sun-specific AppletClassLoader, which was written to load classfiles from http, rather than from the file system.
A little later, at Java 1.2, the "how to fetch the bytes of the classfile" part was separated out in the URLClassLoader. That could use any supported protocol to load classes. Indeed, the protocol support mechanism was and is extensible via pluggable protocol handlers. So, now you can load classes from anywhere without the risk of making mistakes in the harder part, which is how you verify and install the class.
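As a rough illustration of the URLClassLoader part, here is a sketch that loads a class from a location that is not on the class path. Everything specific in it is my own assumption for demonstration purposes: the class name CodebaseElf, the temporary directory standing in for a remote code base, and the use of the JDK's built-in compiler to produce the classfile on the fly (so it needs a JDK, not just a JRE). A real setup would point the loader at an http: URL or a jar instead.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class UrlLoadingSketch {
    @SuppressWarnings("unchecked")
    public static String loadAndRun() {
        try {
            // Build a tiny "code base": a temp directory holding one compiled class.
            Path codebase = Files.createTempDirectory("codebase");
            Path source = codebase.resolve("CodebaseElf.java");
            String code = "public class CodebaseElf implements java.util.function.Supplier<String> {"
                    + " public String get() { return \"Elf from the code base\"; } }";
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));

            JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
            if (javac == null) {
                throw new IllegalStateException("No compiler available; run on a JDK");
            }
            if (javac.run(null, null, null, "-d", codebase.toString(), source.toString()) != 0) {
                throw new IllegalStateException("Compilation failed");
            }

            // The class is not on the class path; it is reachable only through this URL.
            URL[] urls = { codebase.toUri().toURL() };
            try (URLClassLoader loader = new URLClassLoader(urls)) {
                Class<?> cls = loader.loadClass("CodebaseElf");
                // The caller is written against a well-known interface (Supplier),
                // not against the concrete class it just downloaded.
                Supplier<String> creature =
                        (Supplier<String>) cls.getDeclaredConstructor().newInstance();
                return creature.get();
            }
        } catch (IOException | ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(loadAndRun());
    }
}
```

The calling code never refers to CodebaseElf by type, only by name at runtime, which is the same division of labour RMI relies on.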
Along with that, Java's RMI mechanism allows a serialized object (the class name, along with the "state" part of an object) to be wrapped in a MarshalledObject. This adds "where this class may be loaded from", represented as a URL. RMI automates the conversion of real objects in memory to MarshalledObjects and also shipping them around on the network. If a JVM receives a marshalled object for which it already has the class definition, it always uses that class definition (for security). If not, however, then provided a bunch of criteria are met (security, and just plain working correctly, criteria) then the classfile may be loaded from that remote server, allowing a JVM to load classes for which it has never seen the definitions. (Obviously, the code for such systems must typically be written against ubiquitous interfaces; if not, there's going to be a lot of reflection going on!)
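The "use the local class if you have it, otherwise look elsewhere" rule can be sketched with plain serialization. This is not RMI's actual implementation, just an illustration of the hook involved: ObjectInputStream.resolveClass is the point where a stream decides which Class to use for a name it read off the wire. The names FallbackResolveSketch and FallbackObjectInputStream are made up, and the fallback loader is whatever ClassLoader you hand in (for example a URLClassLoader pointing at a code base).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class FallbackResolveSketch {
    // Resolves classes locally first and only then asks the fallback loader.
    static class FallbackObjectInputStream extends ObjectInputStream {
        private final ClassLoader fallback;

        FallbackObjectInputStream(InputStream in, ClassLoader fallback) throws IOException {
            super(in);
            this.fallback = fallback;
        }

        @Override
        protected Class<?> resolveClass(ObjectStreamClass desc)
                throws IOException, ClassNotFoundException {
            try {
                return super.resolveClass(desc); // a locally known class always wins
            } catch (ClassNotFoundException notLocal) {
                return Class.forName(desc.getName(), false, fallback);
            }
        }
    }

    // Serializes a value and reads it back through the fallback-aware stream.
    public static Object roundTrip(Serializable value, ClassLoader fallback) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(value);
            }
            try (ObjectInputStream in = new FallbackObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()), fallback)) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this round trip every class is known locally, so the fallback is never consulted; it only comes into play when the sender ships an object whose class the receiver has never seen.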
Now, I don't know whether Spark uses the RMI infrastructure (indeed, I found your question while trying to determine the same thing). I do know that Hadoop does not, seemingly because the authors wanted to create their own system (which is fun and educational, of course) rather than use a flexible, configurable, extensively tested (including security tested!) system.
However, all that has to happen to make this work in general are the steps that I outlined for RMI. Those requirements are essentially:
1) Objects can be serialized into some byte sequence format understood by all participants
2) When objects are sent across the wire the receiving end must have some way to obtain the classfile that defines them. This can be a) pre-installation, b) RMI's approach of "here's where to find this" or c) the sending system sends the jar. Any of these can work
3) Security should probably be maintained. In RMI, this requirement was rather "in your face", but I don't see it in Spark, so they either hid the configuration, or perhaps just fixed what it can do.
Anyway, that's not really an answer, since I described principles, with a specific example, but not the actual specific answer to your question. I'd still like to find that!
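On the lambda from the question specifically: a Java lambda can only be serialized when its target type is Serializable. Below is a minimal sketch under two assumptions of mine: DoesThings is changed to extend Serializable, and doThing() returns a String so the result can be checked. It round-trips the lambda within one JVM, which satisfies requirement 1. Requirement 2 still applies: the receiving side needs the class that defined the lambda (here LambdaWireSketch), because deserialization goes back through that class to recreate it.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class LambdaWireSketch {
    // Assumption: unlike the question's DoesThings, this one is Serializable
    // and returns a value.
    public interface DoesThings extends Serializable {
        String doThing();
    }

    public static String roundTrip() {
        DoesThings original = () -> "Hey, I'm doing a thing!";
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                // The lambda is written in a serialized form that names its defining class.
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                // Works here because LambdaWireSketch is available to this JVM.
                DoesThings copy = (DoesThings) in.readObject();
                return copy.doThing();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip());
    }
}
```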

When you submit a Spark application to the cluster, your code is deployed to all worker nodes, so your class and function definitions exist on all nodes.

How do I treat objects when having both RMI and Socket connections?

I am in the process of creating a game using Java. It is requested of me that the player of the game can choose to connect either through an RMI connection or a Socket one. Until now I have created all the necessary components for the game, but when it comes to creating the RMI connection, I'm having a bit of a problem. From what I have read in regards of RMI all the objects used to create the connection need to be declared Remote (for example implement the Serializable interface). Seeing that I have to create both types of connections, I don't see it reasonable to serialize all the objects created so far. At this point I can think of two possible solutions:
Create a remote version of the necessary objects for the connection(for example by creating a class that extends said object and implements Serializable interface to make the object remote). After doing that, I can define the methods applicable to the remote objects that can be invoked by the clients.
Create this new type of remote objects that are just messages that take the requests from the client and "translate" them to the non remote objects and then proceed to do what was requested.
I am new to Java and I would appreciate your time and patience on this question.

From what I have read in regards of RMI all the objects used to create the connection need to be declared Remote (for example implement the Serializable interface).
You didn't read that anywhere. It doesn't even make sense. Implementing Remote doesn't make an object Serializable. You have to:
1) Design a remote interface that extends Remote.
2) Ensure that every object that will be passed or returned via this interface implements Serializable or, in rare cases, a remote interface.
3) Write an implementation of the interface, which typically extends UnicastRemoteObject.
4) If you have any remote objects at (2), repeat.
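A minimal sketch of steps 1 to 3, using made-up game types (Game, Move and GameImpl are hypothetical names, not from the question):

```java
import java.io.Serializable;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

public class RmiStepsSketch {
    // Step 1: the remote interface extends Remote, and every method
    // declares RemoteException.
    public interface Game extends Remote {
        Move lastMove() throws RemoteException;
    }

    // Step 2: anything passed or returned through the interface is Serializable.
    // It is a plain value object and is copied to the caller, not made remote.
    public static class Move implements Serializable {
        private static final long serialVersionUID = 1L;
        public final String player;
        public final int x;
        public final int y;

        public Move(String player, int x, int y) {
            this.player = player;
            this.x = x;
            this.y = y;
        }
    }

    // Step 3: the implementation extends UnicastRemoteObject, so constructing
    // it exports the object for remote calls on an anonymous port.
    public static class GameImpl extends UnicastRemoteObject implements Game {
        private static final long serialVersionUID = 1L;

        public GameImpl() throws RemoteException {
            super();
        }

        @Override
        public Move lastMove() {
            return new Move("alice", 2, 3);
        }
    }
}
```

Note that the same Serializable Move could equally be written to an ObjectOutputStream on a plain socket, which is why step 2 costs nothing extra when you support both connection types.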
Seeing that I have to create both types of connections, I don't see it reasonable to serialize all the objects created so far.
You don't have any choice about (2), although that is unlikely to include all the objects created so far. In any case you would already have had to do it for objects you were planning to send over a socket.
At this point I can think of two possible solutions:
Create a remote version of the necessary objects for the connection(for example by creating a class that extends said object and implements Serializable interface to make the object remote).
Again this is just nonsense.
After doing that, I can define the methods applicable to the remote objects that can be invoked by the clients.
That corresponds to my step 1.
Create this new type of remote objects that are just messages that take the requests from the client and "translate" them to the non remote objects and then proceed to do what was requested.
This also is nonsense.

Spring REST representation class

I'm reading the two introductory articles about building and consuming Spring REST web services.
What's weird: they're creating a Greeting representation class in the client app (second link ref) for storing the GET response (the greeting method on the server side returns a Greeting object). But the Greeting classes on the server and client side are different classes. Well, they are two distinct classes with identical names, identical field names and types (the client's doesn't have a constructor).
Does it mean I have to similarly rewrite the class from scratch when building the client app? In order to do that, I'd need specs on the field types of the JSON-packed objects passed by the server's app. A server serializes an object of class ABCClass to JSON and sends it to the client. Even if some field called 'abc' has value 10, that doesn't make it an integer. Next time it might contain a string.
My question is - how much information from server app's devs do I need in order to create a client application? How is it usually done?

It all depends on your deserializer and on your needs. With Jackson for example you might use mixins (wiki ref) and custom deserializers (wiki ref) that build your object with your required field names and your structure.
Having the same field names and structure is just the simplest way, not the only one.
Of course, you still need to know the structure of the server's reply in order to deserialize it.

How to prevent client from seeing internal private classes in Android library ?

I have a library with several packages-
lets say
package a;
package b;
inside package a I have public a_class
inside package b I have public b_class
a_class uses b_class.
I need to generate a library from this , but I do not want the Client to see b_class.
The only solution I know of is to flatten my beautifully understandable packages to single package and to use default package access for b_class.
Is there another way to do so ? maybe using interfaces or some form of design pattern ??

If you decline to move the code to a separate, controlled server, all you can do is hinder the client programmer who tries to use your APIs. Let's begin by applying good practices to your design:
Leave your packages organized as they are now.
For every class you want to "hide":
Make it non-public.
Extract its public API to a new, public interface:
public interface MyInterface {...}
Create a public factory class to get an object of that interface type.
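public class MyFactory
{
    // MyInterfaceImpl stands for your hidden, non-public implementation class
    public static MyInterface createObject() { return new MyInterfaceImpl(); }
}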
So far, your packages are loosely coupled, and the implementation classes are hidden (as good practice recommends, and as you already said). Still, they remain reachable through the interfaces and factories.
So, how can you keep outside clients from executing your private APIs? What comes next is a creative, slightly complicated, yet valid solution, based on hindering the client programmers:
Modify your factory classes: Add to every factory method a new parameter:
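public class MyFactory
{
    // MyInterfaceImpl stands for your hidden, non-public implementation class
    public static MyInterface createObject(Macguffin parameter) { return new MyInterfaceImpl(); }
}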
So, what is Macguffin? It is a new interface you must define in your application, with at least one method:
public interface Macguffin
{
public String dummyMethod();
}
But do not provide any usable implementation of this interface. In every place of your code you need to provide a Macguffin object, create it through an anonymous class:
MyFactory.createObject(new Macguffin(){
    public String dummyMethod(){
        return "x";
    }
});
Or, even more advanced, through a dynamic proxy object, so that no ".class" file of this implementation can be found even if the client programmer dares to decompile the code.
What do you get from this? Basically, it dissuades the programmer from using a factory that requires an unknown, undocumented, incomprehensible object. The factory classes just need to make sure they do not receive a null object, invoke the dummy method, and check that its return value is not null either (or, if you want a higher security level, add an undocumented secret-key rule).
So this solution relies upon a subtle obfuscation of your API, to discourage the client programmer from using it directly. The more obscure the names of the Macguffin interface and its methods, the better.
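Putting the pieces together, here is a single-file sketch of the factory check and the dynamic-proxy variant. To keep it in one file the types are nested in a wrapper class (MacguffinSketch, my own name); in a real library they would be separate files in your packages. HiddenImpl, work() and newToken() are likewise hypothetical.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

public class MacguffinSketch {
    public interface MyInterface {
        String work();
    }

    public interface Macguffin {
        String dummyMethod();
    }

    // The implementation you want to keep away from clients.
    private static final class HiddenImpl implements MyInterface {
        public String work() { return "done"; }
    }

    public static final class MyFactory {
        private MyFactory() {}

        // Rejects callers that cannot supply a working Macguffin.
        public static MyInterface createObject(Macguffin token) {
            if (token == null || token.dummyMethod() == null) {
                throw new IllegalArgumentException("invalid token");
            }
            return new HiddenImpl();
        }
    }

    // Internal helper: builds the Macguffin as a dynamic proxy, so no named
    // implementation class of it exists in the library's class files.
    static Macguffin newToken() {
        InvocationHandler handler = (proxy, method, args) -> {
            if (method.getName().equals("dummyMethod")) {
                return "x";
            }
            throw new UnsupportedOperationException(method.getName());
        };
        return (Macguffin) Proxy.newProxyInstance(
                Macguffin.class.getClassLoader(),
                new Class<?>[] { Macguffin.class },
                handler);
    }

    // How library-internal code would obtain the hidden object.
    public static String demo() {
        return MyFactory.createObject(newToken()).work();
    }
}
```

As the answer says, this only discourages direct use; a client who reads the bytecode can still construct a Macguffin of their own.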

I need to generate a library from this , but I do not want the Client to see b_class. The only solution I know of is to flatten my beautifully understandable packages to single package and to use default package access for b_class. Is there another way to do so ?
Yes, make b_class package-private (default access) and instantiate it via reflection for use in a_class.
Since you know the full class name, reflectively load the class:
Class<?> clz = Class.forName("b.b_class");
Find the constructor you want to invoke:
Constructor<?> con = clz.getDeclaredConstructor();
Allow yourself to invoke the constructor by making it accessible:
con.setAccessible(true);
Invoke the constructor to obtain your b_class instance:
Object o = con.newInstance();
Hurrah, now you have an instance of b_class. However, you can't call b_class's methods on an instance of Object, so you have two options:
Use reflection to invoke b_class's methods (not much fun, but easy enough and may be ok if you only have a few methods with few parameters).
Have b_class implement an interface that you don't mind the client seeing and cast your instance of b_class to that interface (reading between the lines I suspect you may already have such an interface?).
You'll definitely want to go with option 2 to minimise your pain unless it gets you back to square one again (polluting the namespace with types you don't want to expose the client to).
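Here is option 2 as a single-file sketch. The nested private BClass stands in for b.b_class and Service for the interface you are willing to expose; both names, and the wrapper class ReflectiveInstantiationSketch, are assumptions made for the example.

```java
import java.lang.reflect.Constructor;

public class ReflectiveInstantiationSketch {
    // The interface the client is allowed to see.
    public interface Service {
        String run();
    }

    // Stand-in for the hidden b.b_class.
    private static final class BClass implements Service {
        private BClass() {}

        public String run() { return "b_class at work"; }
    }

    public static Service create() {
        try {
            // Look the class up by name instead of referring to it directly.
            String name = ReflectiveInstantiationSketch.class.getName() + "$BClass";
            Class<?> clz = Class.forName(name);
            Constructor<?> con = clz.getDeclaredConstructor();
            con.setAccessible(true); // the constructor is not public
            // Reflection is paid once here; later calls go through the interface.
            return (Service) con.newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```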
For full disclosure, two notes:
1) There is a (small) overhead to using reflection vs direct instantiation and invocation. If you cast to an interface you'll only pay the cost of reflection on the instantiation. In any case it likely isn't a problem unless you make hundreds of thousands of invocations in a tight loop.
2) There is nothing to stop a determined client from finding out the class name and doing the same thing, but if I understand your motivation correctly you just want to expose a clean API, so this isn't really a worry.

When using Kotlin, you can use the internal modifier for your library classes.

If I understand correctly, you are asking about publishing your library for third-party use without disclosing part of your source? If that's the case, you can use ProGuard, which can obfuscate your library. By default everything is obfuscated (and unused code is stripped), unless you specify keep rules for the things you want left untouched.

If you want to distribute [part of] your code without the client being able to access it at all, that means that the client won't be able to execute it either. :-O
Thus, you just have one option: put the sensitive part of your code on a server and distribute a proxy to access it, so that your code is kept and executed on your server and the client can still execute it through the proxy but without accessing it directly.
You might use a servlet, a web service, an RMI object, or a simple TCP server, depending on the complexity level of your code.
This is the safest approach I can think of, but it comes at a price: in addition to making your system more complex, it introduces a network delay for each remote operation, which might be a big deal depending on the performance requirements. You also have to secure the server itself against intrusions. This could be a good solution if you already have a server that you could take advantage of.

RMI: pass non-remote object classes to a server

Suppose I have a remote class that has a method with a POJO parameter:
interface MyRemote extends Remote {
    void service(Param param) throws RemoteException;
}
The client retrieves a stub and does:
// DerivedParam is defined by the client
// and is derived from Param
DerivedParam dparam = getDerivedParam();
myService.service(dparam);
It fails, because the server has no clue about the DerivedParam class (and interfaces that it possibly implements).
The question: is it somehow possible to pass those classes from client to server to make such an invocation possible?

I am not an expert on the subject, but I did pull off this trick some time ago. The magic is in the use of code mobility by setting the java.rmi.server.codebase property.
You make this property point to a URL or space-separated list of URLs where your shared classes reside. This could be, for instance, an FTP server or an HTTP server that hosts a jar file with the common classes.
Once set up, the codebase annotation is included in all objects marshalled by server and client, and when either party cannot find a class, it looks it up in the URLs provided in the code base and loads it dynamically.
Please read Dynamic Code Downloading with Java RMI.
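A small sketch of setting the property programmatically. The URLs are placeholders, and CodebaseConfigSketch is a made-up name. Two caveats worth knowing: the property is normally passed on the command line with -D so that it is in place before any RMI class is initialized, and since JDK 7u21 java.rmi.server.useCodebaseOnly defaults to true, which makes a JVM ignore the codebase sent by the other side unless you switch it back to false. Dynamic downloading has also traditionally required a security manager with a suitable policy, so check what your JDK version still supports.

```java
public class CodebaseConfigSketch {
    // Must run before any RMI classes are used, or be replaced by
    // -Djava.rmi.server.codebase=... on the command line.
    public static void configure(String spaceSeparatedUrls) {
        // Where this JVM tells its peers to look for classes it sends them.
        System.setProperty("java.rmi.server.codebase", spaceSeparatedUrls);
        // Allow this JVM to honour the codebase annotation sent by its peers.
        System.setProperty("java.rmi.server.useCodebaseOnly", "false");
    }

    public static void main(String[] args) {
        // Placeholder URLs; a trailing slash would denote a directory of classes.
        configure("http://example.com/codebase/creaturesImpl.jar");
        System.out.println(System.getProperty("java.rmi.server.codebase"));
    }
}
```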
Let's say you provide your client with only the interfaces, and the implementations are located in a given code base. The client then asks the server to send a given object. The client expects to receive an object that implements a given interface, but the actual implementation is unknown to the client. When it deserializes the sent object, it has to go to the code base and download the implementing class of the actual object being passed.
This makes the client very thin, and you can easily update your classes in the code base without having to update every single client.
Let's say you have a RMI server with the following interface
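import java.rmi.Remote;
import java.rmi.RemoteException;
import java.util.List;
public interface MiddleEarth extends Remote {
    List<Creature> getAllCreatures() throws RemoteException;
}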
The client will only have the interfaces for MiddleEarth and Creature, but none of the implementations in the class path.
The implementations of Creature are serializable objects of type Elf, Man, Dwarf and Hobbit. These implementations are located in your code base, but not in your client's class path.
When you ask your RMI server to send you the list of all creatures in Middle Earth, it will send objects that implement Creature, that is, any of the classes listed above.
When the client receives the serialized objects, it has to look for the class files in order to deserialize them, but these are not located in the local class path. Every object in this stream comes tagged with the code base that can be used to look for missing classes. Therefore, the client resorts to the code base to look for these classes. There it will find the actual creature classes being used.
The code base works in both directions, so if you send your server a Creature (e.g. an Ent), the server will look for it in the code base as well.
This means that when both client and server need to publish new types of creatures, all they have to do is update the creaturesImpl.jar in the code base, and nothing in the server or client applications themselves.
