How does Kotlin generate bindings for the JVM automatically?

I'm interested in the subject of language interop.
For the Kotlin/JVM target we're able to call Java code directly, without having to define any binding interfaces or use tools such as SWIG, JNI and others. How was that achieved?

Short answer: the Java runtime handles it all; no bindings are needed.
All classes and interfaces that the JVM runs (except for native calls using JNI or similar) are provided to it as Java bytecode (typically in the form of .class files); it's the same whether that bytecode was compiled from Java, or Kotlin, or Groovy, or Scala, or any other JVM language.
The bytecode contains details of all the constructors, fields, and methods of a class or interface.  When that refers to other classes, it does so via their fully-qualified names (e.g. java.lang.String) — and the JVM knows how to find (e.g. by searching the classpath) and load any class given its fully-qualified name.  (Specifically, it uses a ClassLoader — usually the system one, though custom ones can be used where appropriate.)  Having loaded the class, instances can be constructed and their methods called directly.
So the JVM doesn't need any secondary means of identifying or accessing classes/interfaces or their methods; it's all specified in the bytecode and accessible directly.
(If you want more details, the Java Virtual Machine Specification is probably the ultimate reference.)
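As a rough illustration of the lookup described above, the following sketch asks a ClassLoader for a class by its fully-qualified name, constructs an instance, and calls a method on it. The class name LoadByName and the helper appendVia are made up for this example; the same steps would apply to a class compiled from Kotlin, since the JVM only ever sees bytecode.

```java
import java.lang.reflect.Method;

// Hypothetical demo class; the names here are not from the original answer.
class LoadByName {

    // Loads a class by fully-qualified name, creates an instance with its
    // no-argument constructor, and calls append(String) on it reflectively.
    static String appendVia(String className, String text) {
        try {
            ClassLoader loader = ClassLoader.getSystemClassLoader();
            Class<?> cls = loader.loadClass(className);
            Object instance = cls.getConstructor().newInstance();
            Method append = cls.getMethod("append", String.class);
            append.invoke(instance, text);
            return instance.toString();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The JVM locates the class from nothing more than its name.
        System.out.println(appendVia("java.lang.StringBuilder", "hello from bytecode"));
    }
}
```

Ordinary compiled code does not need reflection for this; the compiler simply emits the fully-qualified name into the bytecode and the JVM resolves it in much the same way.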

Related

How can you tell if a JDK class is implemented natively instead of in java code?

For example, when exploring source classes in IntelliJ, you may find a class like Constructor in java.lang.reflect where the parameters to all the methods are named var0, var1, var2, ... varN. But other than that, how would one know it is implemented natively?
EDIT: The correct question really is how to tell whether a method is native in Java, because, as pointed out by responders, classes themselves can't be "native."

In general, classes themselves are never native. Only methods can have native implementation.
But that doesn't mean the JVM can't treat some classes in a special way, although you shouldn't be able to tell the difference. The JVM treats them differently as an optimization (for example, the String class) or because doing otherwise would be very inconvenient (for example, NoClassDefFoundError is treated specially and can be thrown and printed even if the class doesn't exist on your classpath; you can try it by moving or renaming rt.jar and running java).
As for the constructor arguments being named varN, it's likely because you didn't actually install the JDK source, so your IDE shows you decompiled Java code, which doesn't preserve variable names. In the case of the Constructor class, the constructor is likely implemented in Java, but called from native code, which also returns the object.
A method is native if it has the native modifier. But there are also methods that the JVM treats as intrinsics. The only way to know those for sure, together with the special classes, is to look at the JVM source. There is also a very thin line between what can be considered a JVM optimization and what can be considered an intrinsic, and because you don't normally modify JVM code, it doesn't really matter unless you are a JDK developer. This is also an implementation detail that can change between JVMs or JVM versions. In fact, there are attempts at writing JVMs completely in Java, without any real native code aside from a tiny startup wrapper. In those cases, everything is in Java and there is no native code.
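The native modifier can also be checked programmatically through reflection. The sketch below is only an illustration: the class NativeCheck and its helper are invented names, and which JDK methods carry the modifier is an implementation detail (System.currentTimeMillis is declared native in OpenJDK, but another JVM could differ). Intrinsics are not visible this way.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical helper; not part of any JDK API.
class NativeCheck {

    // Returns true if the named method is declared with the native modifier.
    static boolean isNative(Class<?> owner, String methodName, Class<?>... paramTypes) {
        try {
            Method m = owner.getDeclaredMethod(methodName, paramTypes);
            return Modifier.isNative(m.getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Declared native in OpenJDK (assumption about the JDK in use).
        System.out.println("System.currentTimeMillis native? "
                + isNative(System.class, "currentTimeMillis"));
        // Implemented in plain Java.
        System.out.println("String.length native? "
                + isNative(String.class, "length"));
    }
}
```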

Migrating C++ project to Java, protecting implementation details

I have a quite complex project to migrate from C++ (Linux) to Java.
Currently, the C++ version is distributed as a shared library (.so) accompanied by a header file that declares the top-level interface class. The implementation details are fully hidden from the final user.
This question is not about porting the C++ code to Java, but rather about creating similar distribution package.
Let's assume I have a very simple 'public' class in C++, topapi.h:
class TopApi
{
public:
    void do( const string& v );
};
The actual implementation is hidden from the API user. The actual project may contain another 100 files/classes do() will call.
The distribution will contain 2 files: topapi.so and topapi.h
Users will #include "topapi.h" in their code, and link their applications with topapi.so.
The questions are:
1. How can I achieve a similar effect in Java (hide the IP-related code)?
2. How do I show public methods to the user (not related to code protection, just a Java version of the header file above)?

Check out ProGuard. It will at least obfuscate the jar file, which otherwise is basically human readable. It's not absolutely safe from reverse engineering, but I guess neither is a .so file.
I'm not an expert with Java, but this is what we have done to protect implementations in the past.
I don't know exactly what the motivations are for a Java port, but if it is just to support a Java end user, you could consider a JNI wrapper. I guess this probably isn't the case, but I thought I would mention it.
As far as exposing interface code to the user, you could write a Java interface (like a pure virtual abstract C++ class) and simply exclude that interface from ProGuard obfuscation.

To answer the question of how to show public methods to the user: this is usually done through a combination of declaring the internal classes without an access modifier, which makes them accessible only from within the same package, and not documenting them. Don't depend on the former though; it's easy to circumvent, but it sends the message to the user that those classes are internal.
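A minimal sketch of that layout follows, combining a published interface (the Java counterpart of the header file) with a package-private implementation. All names (TopApi, TopApiImpl, TopApiFactory, process) are hypothetical, and everything is placed in one file only so the sketch is self-contained; in a real project each type would live in its own file in the same package, with TopApi and TopApiFactory declared public.

```java
// Entry point for the demo only.
class TopApiDemo {
    public static void main(String[] args) {
        TopApi api = TopApiFactory.create();
        System.out.println(api.process("payload"));
    }
}

// The published contract: this is what users program against and what gets documented.
// (Would be declared public in its own file.)
interface TopApi {
    String process(String v);
}

// No access modifier: visible only inside the package, left undocumented.
final class TopApiImpl implements TopApi {
    @Override
    public String process(String v) {
        return "processed:" + v;
    }
}

// The only way for users to obtain an instance; they never name TopApiImpl.
final class TopApiFactory {
    private TopApiFactory() {}

    static TopApi create() {
        return new TopApiImpl();
    }
}
```

Because callers only ever see the TopApi type, the implementation class can be renamed or obfuscated without affecting them.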
Java 9 adds modules which allow you to encapsulate entire packages, but it's not here yet, and you would still be able to circumvent the encapsulation.
One side effect of ahead of time compilation (usually the case with C++) is that the distributed code is already optimized, and contains no metadata, so it's harder to reverse engineer. Java is distributed in an intermediate language, but the actual machine code is generated at runtime (JIT compilation). The intermediate language is practically un-optimized, so it's easier to reverse engineer. Java also merges the idea of header files and source files where a .class file will contain all the metadata you need to use it.

Loading multiple versions of Java classes that use native code

If you want to load multiple versions of a class, you can do so if they implement a shared interface and are in separate JARs, using a separate class loader for each version.
If you have a JAR that calls native code, you can store the shared library (DLL) for the native code in its JAR by extracting the shared library to a temporary file and then using System.load to load the library from the temporary file.
But if you do both, will it work? What happens if both versions of the JAR call native code, and both contain a different version of the shared library?
Let us assume that both JARs use a different temporary file to store the copy of the shared library. But the two versions of the shared library have native code that calls native (C) functions that have identical declarations (but the implementations of those functions are different). Will the JVM/class loader/System.load delegate from the Java code to the correct native code? Or will the JVM complain about name conflicts?
If that scheme does fail, how do I use multiple versions of a class that uses native code?

Examining the OpenJDK 7 implementation, it seems that, yes, loading multiple versions of Java classes that use native code will work:
Library Loading
The crucial question is how System.load behaves. The implementation of that method will be system dependent, but the semantics of the various implementations should be the same.
System.load delegates to the package-private method Runtime.load0.
Runtime.load0 delegates to the package-private static method ClassLoader.loadLibrary.
ClassLoader.loadLibrary delegates to the private static method ClassLoader.loadLibrary0.
ClassLoader.loadLibrary0 creates an object of the package-private inner class ClassLoader.NativeLibrary and delegates to its load method.
ClassLoader.NativeLibrary.load is a native method, which delegates to the function JVM_LoadLibrary.
JVM_LoadLibrary delegates to os::dll_load.
os::dll_load is system dependent.
The Linux variant of os::dll_load delegates to the dlopen library function, passing the RTLD_LAZY option.
On Linux, dlopen defaults to RTLD_LOCAL when neither RTLD_GLOBAL nor RTLD_LOCAL is given, so the shared library is loaded with RTLD_LOCAL semantics.
RTLD_LOCAL semantics are that the symbols in the loaded library are not made available for (automatic) symbol resolution of subsequently loaded libraries. That is, the symbols do not enter the global namespace, and different libraries may define the same symbols without generating conflicts. The shared libraries could even have identical content without problems.
Hence it does not matter if different shared libraries, loaded by different class loaders, define the same symbols (have the same names of extern functions for native methods): the JRE and JVM together avoid name clashes.
Native Function Lookup
That ensures that the multiple versions of the shared libraries do not generate name conflicts. But how does OpenJDK ensure that the correct JNI code is used for the native method calls?
The procedure followed by the JVM to call a native method is rather lengthy, but it is all contained within one function, SharedRuntime::generate_native_wrapper. Ultimately, however, that needs to know the address of the JNI function to be called.
That wrapper function makes use of a methodHandle C++ object, getting the address of the JNI function from either the methodHandle::critical_native_function() or methodHandle::native_function(), as appropriate.
The address of the JNI function is recorded in the methodHandle by a call to methodHandle::set_native_function from NativeLookup::lookup.
NativeLookup::lookup delegates, indirectly, to NativeLookup::lookup_style.
NativeLookup::lookup_style delegates to the Java package-private static method ClassLoader.findNative.
ClassLoader.findNative iterates through the list (ClassLoader.nativeLibraries) of ClassLoader.NativeLibrary objects set up by ClassLoader.loadLibrary0, in the order that the libraries were loaded. For each library, it delegates to NativeLibrary.find to try to find the native method of interest. Although this list of objects is not public, the JNI specification requires that the JVM "maintains a list of loaded native libraries for each class loader", so all implementations must have something similar to this list.
NativeLibrary.find is a native method. It simply delegates to JVM_FindLibraryEntry.
JVM_FindLibraryEntry delegates to the system dependent method os::dll_lookup.
The Linux implementation of os::dll_lookup delegates to the dlsym library function to look up the address of the function in the shared library.
Because each class-loader maintains its own list of loaded libraries, it is guaranteed that the JNI code called for a native method will be the correct version, even if a different class loader loads a different version of the shared library.

If you try to load the same library in different class loaders you will get an UnsatisfiedLinkError with the message "Native Library: ... already loaded in another class loader". This might have something to do with the VM calling the library's unload method when the class loader is garbage collected (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#compiling_loading_and_linking_native_methods).
But if you - as you say - "use a different temporary file to store the copy of the shared library" the two are effectively different libraries regardless of the files' contents (might be binary identical, doesn't matter). So there isn't a problem.
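The extract-to-a-temporary-file step that the question describes could look roughly like the sketch below. The class NativeLibExtractor, its method names and the resource path passed to loadBundled are all hypothetical; only Files.createTempFile, Files.copy and System.load are real JDK calls. Because every extraction gets a freshly named file, each class loader ends up loading what is, as far as the JVM is concerned, a different library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper class; not taken from the answers above.
class NativeLibExtractor {

    // Copies the library bytes to a fresh, uniquely named temporary file.
    static Path extractToTempFile(InputStream library, String suffix) {
        try {
            Path tmp = Files.createTempFile("nativelib-", suffix);
            tmp.toFile().deleteOnExit();
            Files.copy(library, tmp, StandardCopyOption.REPLACE_EXISTING);
            return tmp;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Extracts a library bundled next to 'owner' in its JAR and loads it.
    // The library is registered with the class loader of the class calling System.load.
    static void loadBundled(Class<?> owner, String resourcePath, String suffix) {
        try (InputStream in = owner.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IllegalArgumentException("No such resource: " + resourcePath);
            }
            System.load(extractToTempFile(in, suffix).toAbsolutePath().toString());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes stand in for a real shared library, so nothing is loaded here.
        byte[] fakeLibrary = {1, 2, 3};
        Path first = extractToTempFile(new ByteArrayInputStream(fakeLibrary), ".so");
        Path second = extractToTempFile(new ByteArrayInputStream(fakeLibrary), ".so");
        System.out.println(first + " differs from " + second + ": " + !first.equals(second));
    }
}
```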

What is Causing a Java Class to be Loaded?

I am trying to open a Java class file, instrument the bytecode and save the class file before the class is loaded into the JVM. My problem is that a class is being loaded "too soon" into the JVM. The bytecode is instrumented after the class is loaded into the JVM.
-verbose:class prints when each class is loaded but it doesn't tell me what triggered the JVM to load the class. How do I get a call stack which shows the class being loaded?
Putting a breakpoint in the following code shows the call stack when the class is initialized, not when it is loaded.
static
{
    System.out.println("Initialized!");
}
Note: I know I could use a Java agent to do this and guarantee the bytecode is instrumented. But I chose this route for various reasons.

I opened java.lang.ClassLoader and set a conditional breakpoint in loadClass(String name, boolean resolve). The condition is arg0.endsWith("MyClass"), where arg0 is the name parameter. When the breakpoint is triggered, the IDE displays the call stack. A frame several levels down the call stack shows me why the class is being loaded.
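Where a debugger is not available, a similar effect can be approximated in code. The sketch below is an assumption-laden stand-in for the breakpoint, not what the answer itself did: a hypothetical TracingClassLoader records the call stack whenever a class whose name ends with a given suffix is requested. It only sees requests that pass through this particular loader, so classes resolved entirely by other loaders go unnoticed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class loader that mimics the conditional breakpoint in code.
class TracingClassLoader extends ClassLoader {
    private final String suffix;
    private final List<StackTraceElement[]> traces = new ArrayList<>();

    TracingClassLoader(ClassLoader parent, String suffix) {
        super(parent);
        this.suffix = suffix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.endsWith(suffix)) {
            // Same condition as the breakpoint: capture who asked for the class.
            StackTraceElement[] stack = new Throwable().getStackTrace();
            synchronized (traces) {
                traces.add(stack);
            }
        }
        // Normal parent-first delegation does the actual loading.
        return super.loadClass(name, resolve);
    }

    // Returns a snapshot of the stacks captured so far.
    List<StackTraceElement[]> traces() {
        synchronized (traces) {
            return new ArrayList<>(traces);
        }
    }

    public static void main(String[] args) throws Exception {
        TracingClassLoader loader =
                new TracingClassLoader(ClassLoader.getSystemClassLoader(), "ArrayList");
        loader.loadClass("java.util.ArrayList");
        for (StackTraceElement frame : loader.traces().get(0)) {
            System.out.println("  at " + frame);
        }
    }
}
```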
Note: This condition works in Eclipse IDE and may need a little tweaking in other IDEs.

On the HotSpot JVM, which is what is used with the Oracle and OpenJDK Java distributions, classes are loaded when they are first referenced. Here are the relevant snippets from https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html
The Java Virtual Machine starts up by creating an initial class, which is specified in an implementation-dependent manner, using the bootstrap class loader (§5.3.1). The Java Virtual Machine then links the initial class, initializes it, and invokes the public class method void main(String[]). The invocation of this method drives all further execution. Execution of the Java Virtual Machine instructions constituting the main method may cause linking (and consequently creation) of additional classes and interfaces, as well as invocation of additional methods.
Creation of a class or interface C denoted by the name N consists of the construction in the method area of the Java Virtual Machine (§2.5.4) of an implementation-specific internal representation of C. Class or interface creation is triggered by another class or interface D, which references C through its run-time constant pool. Class or interface creation may also be triggered by D invoking methods in certain Java SE platform class libraries (§2.12) such as reflection.
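The difference between loading and initialization, which is why the static-block breakpoint in the question fires later than expected, can be seen in a small experiment. This is only a sketch with made-up names (LoadVsInit, Lazy): Class.forName with initialize set to false creates the class without running its static initializer, and a second call with true triggers it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo; the class names are invented for illustration.
class LoadVsInit {
    // Events in the order they happen; Lazy's static initializer appends to it.
    static final List<String> EVENTS = new ArrayList<>();

    static class Lazy {
        static {
            EVENTS.add("Lazy initialized");
        }
    }

    // Loads Lazy without initializing it, then forces initialization.
    static List<String> run() {
        try {
            ClassLoader loader = LoadVsInit.class.getClassLoader();
            String lazyName = LoadVsInit.class.getName() + "$Lazy";
            Class.forName(lazyName, false, loader); // load only, no static initializer
            EVENTS.add("Lazy loaded");
            Class.forName(lazyName, true, loader);  // now the static initializer runs
            EVENTS.add("done");
            return new ArrayList<>(EVENTS);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // On a first run this prints: [Lazy loaded, Lazy initialized, done]
        System.out.println(run());
    }
}
```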
If you wish to actually perform the bytecode manipulation yourself, and at runtime, then there are at least two models that you may choose to follow:
Use something akin to AspectJ's load-time weaving, which involves using a separate classloader and an agent to respond to class loading.
Follow JRebel's approach of monitoring known .class files for updated timestamps: http://zeroturnaround.com/software/jrebel/learn/faq/
Unless you have a very strong need to do this yourself however, consider using one of the above mentioned tools. Better yet, determine if Java annotations and reflection can solve what you are attempting to do instead.
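For reference, the agent side of the load-time approach mentioned above can be sketched with the standard java.lang.instrument API. Everything specific here is an assumption: the class name LoggingAgent, the target name com/example/MyClass, and the fact that this transformer only logs instead of rewriting bytes. A real agent class would be declared public, packaged in a JAR whose manifest names it as Premain-Class, and started with the -javaagent option.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

// Hypothetical agent skeleton; it observes classes before they are defined.
class LoggingAgent {
    // Internal (slash-separated) name of the class of interest; a made-up example.
    static final String TARGET = "com/example/MyClass";

    static class Transformer implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (!TARGET.equals(className)) {
                return null; // null tells the JVM to keep the bytes unchanged
            }
            System.err.println("About to define " + className
                    + " (" + classfileBuffer.length + " bytes)");
            // A real agent would return rewritten bytes here, for example
            // produced with a bytecode library; this sketch changes nothing.
            return null;
        }
    }

    // Called by the JVM before main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new Transformer());
    }

    public static void main(String[] args) {
        // Exercises the transformer directly, without an agent being attached.
        byte[] result = new Transformer().transform(null, TARGET, null, null, new byte[4]);
        System.out.println("Bytes replaced? " + (result != null));
    }
}
```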

Modify already loaded class with Java agent?

Currently I'm trying to modify method bodies residing in classes already loaded by the JVM. I'm aware that the JVM does not actually allow changing the definition of classes that have already been loaded. But my research led me to implementations like JRebel or the Java Instrumentation API, both using an agent-based approach. I know how to do it right before a class is loaded with the help of Javassist. But considering e.g. JRebel in an EJB environment where class definitions are loaded on application startup, shouldn't bytecode modification be possible on JVM-loaded classes?

Well, you learned that the Instrumentation API exists and that it offers redefinition of classes as an operation. So it is time to rethink your initial premise of “the JVM actually not allowing to change the definition of classes that have already been loaded”.
You should note that
as the links show, the Instrumentation API is part of the standard API
the support of redefinition of classes is, however, optional. You may ask whether the current JVM supports this feature
it might be limited to not support every class; you may ask whether it’s possible for a particular class
Even if it is supported, the changes may be limited, to cite the documentation:
The redefinition may change method bodies, the constant pool and attributes. The redefinition must not add, remove or rename fields or methods, change the signatures of methods, or change inheritance. These restrictions maybe be lifted in future versions.
at the time you perform the redefinition, there might be threads executing code of methods of these classes; these executions will complete using the old code then
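The two capability checks mentioned in the list correspond to Instrumentation.isRedefineClassesSupported and Instrumentation.isModifiableClass. A minimal sketch of how they might guard a redefinition follows; the class Redefiner and its method are invented for illustration, and the Instrumentation instance is assumed to come from an agent's premain or agentmain method (whose JAR manifest would also need Can-Redefine-Classes: true).

```java
import java.lang.instrument.ClassDefinition;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;

// Hypothetical helper; 'inst' must be supplied by a Java agent.
class Redefiner {

    // Attempts to swap in new bytecode for an already loaded class.
    // Returns false if this JVM or this particular class does not allow redefinition.
    static boolean tryRedefine(Instrumentation inst, Class<?> target, byte[] newClassFile)
            throws ClassNotFoundException, UnmodifiableClassException {
        if (!inst.isRedefineClassesSupported()) {
            return false; // optional feature, not available in this JVM/agent setup
        }
        if (!inst.isModifiableClass(target)) {
            return false; // e.g. primitive or array classes
        }
        // Only method bodies, the constant pool and attributes may differ;
        // otherwise the JVM rejects the new definition with an exception.
        inst.redefineClasses(new ClassDefinition(target, newClassFile));
        return true;
    }
}
```

Threads that are already executing a redefined method finish with the old code, as noted above; only new invocations see the new body.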
So the Instrumentation API is mainly useful for debugging, profiling and similar tasks.
But other frameworks offering class reloading in production code, like EJB containers, usually resort to creating new ClassLoaders, which create different versions of the classes that are then entirely independent of the older versions.
In a Java runtime environment, the identity of a class consists of a pair of <ClassLoader, Qualified Name> rather than just a qualified name…
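That pairing of class loader and name can be observed directly. The sketch below is a self-contained experiment under a few assumptions: it needs a JDK (it compiles a tiny, made-up class called Greeter at runtime via javax.tools), and the names ClassIdentityDemo and Greeter are hypothetical. Two class loaders pointed at the same directory each produce their own Class object for the same name.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical demo of <ClassLoader, name> identity.
class ClassIdentityDemo {

    // Compiles a tiny class named Greeter into a fresh temporary directory
    // (left on disk afterwards; this sketch does no cleanup).
    static Path compileGreeter() throws IOException {
        Path dir = Files.createTempDirectory("identity-demo");
        Path source = dir.resolve("Greeter.java");
        Files.write(source, "public class Greeter {}".getBytes(StandardCharsets.UTF_8));
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler available; a JDK is required");
        }
        int status = compiler.run(null, null, null, "-d", dir.toString(), source.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with status " + status);
        }
        return dir;
    }

    // Loads Greeter through two independent loaders and compares the results.
    static boolean sameNameDifferentClass() {
        try {
            URL[] urls = {compileGreeter().toUri().toURL()};
            // Parent null: only the bootstrap loader is consulted before the URLs.
            try (URLClassLoader a = new URLClassLoader(urls, null);
                 URLClassLoader b = new URLClassLoader(urls, null)) {
                Class<?> fromA = a.loadClass("Greeter");
                Class<?> fromB = b.loadClass("Greeter");
                return fromA.getName().equals(fromB.getName()) && fromA != fromB;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Same name, different Class objects: " + sameNameDifferentClass());
    }
}
```

An instance created from one loader's Greeter cannot be cast to the other loader's Greeter, which is why reloading frameworks typically share interfaces through a common parent loader.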

I wasn't aware that you can use the Instrumentation API to redefine classes (see @Holger's answer). However, as he notes, there are some significant limitations on that approach. Furthermore, the javadoc says:
"This method is intended for use in instrumentation, as described in the class specification."
Using it to materially change the semantics of a class is ... all sorts of bad from the perspective of the Java type system.
