I am trying to open a Java class file, instrument the bytecode and save the class file before the class is loaded into the JVM. My problem is that a class is being loaded "too soon" into the JVM, so the bytecode is instrumented only after the class has already been loaded.
-verbose:class prints when each class is loaded, but it doesn't tell me what triggered the JVM to load the class. How do I get a call stack that shows the class being loaded?
Putting a breakpoint in the following code shows the call stack when the class is initialized, not when it is loaded.
static
{
    System.out.println("Initialized!");
}
Note: I know I could use a Java agent to do this and guarantee that the bytecode is instrumented, but I chose this route for various reasons.
I opened java.lang.ClassLoader and set a conditional breakpoint in loadClass(String name, boolean resolve). The condition is arg0.endsWith("MyClass") where arg0 is the name parameter. When the breakpoint is triggered, the IDE displays the call stack. Several frames down on the call stack shows me why the class is being loaded.
Note: This condition works in Eclipse IDE and may need a little tweaking in other IDEs.
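If you would rather get the same information without an IDE, the breakpoint trick can be approximated in code. The sketch below is a hypothetical debugging aid, not part of any library: a class loader that captures and prints a stack trace whenever it is asked for a class whose name ends with a given suffix. It only sees load requests that are routed through this particular loader; classes resolved by other loaders (including the bootstrap loader) never reach it.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical debugging aid: records who asked this loader for a matching class.
public class TracingClassLoader extends ClassLoader {
    private final String suffix;
    private final List<StackTraceElement[]> traces = new ArrayList<>();

    public TracingClassLoader(ClassLoader parent, String suffix) {
        super(parent);
        this.suffix = suffix;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.endsWith(suffix)) {
            // The frames below this one show which code requested the class.
            StackTraceElement[] stack = new Throwable().getStackTrace();
            synchronized (traces) {
                traces.add(stack);
            }
            new Throwable("Loading " + name).printStackTrace();
        }
        // Normal parent-first delegation; behaviour is otherwise unchanged.
        return super.loadClass(name, resolve);
    }

    public List<StackTraceElement[]> traces() {
        synchronized (traces) {
            return new ArrayList<>(traces);
        }
    }

    public static void main(String[] args) throws Exception {
        TracingClassLoader loader =
                new TracingClassLoader(TracingClassLoader.class.getClassLoader(), "ArrayDeque");
        Class<?> c = loader.loadClass("java.util.ArrayDeque");
        System.out.println(c.getName() + " traced " + loader.traces().size() + " time(s)");
    }
}
```

To use it against your own code, you would have to load your application through this loader (for example, by loading your main class from it), which is why the conditional breakpoint on ClassLoader.loadClass is usually the less intrusive option.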
On the HotSpot JVM, which is the JVM used by the Oracle and OpenJDK Java distributions, classes are loaded lazily, when they are first referenced. Here are the relevant snippets from https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html:
The Java Virtual Machine starts up by creating an initial class, which is specified in an implementation-dependent manner, using the bootstrap class loader (§5.3.1). The Java Virtual Machine then links the initial class, initializes it, and invokes the public class method void main(String[]). The invocation of this method drives all further execution. Execution of the Java Virtual Machine instructions constituting the main method may cause linking (and consequently creation) of additional classes and interfaces, as well as invocation of additional methods.
Creation of a class or interface C denoted by the name N consists of the construction in the method area of the Java Virtual Machine (§2.5.4) of an implementation-specific internal representation of C. Class or interface creation is triggered by another class or interface D, which references C through its run-time constant pool. Class or interface creation may also be triggered by D invoking methods in certain Java SE platform class libraries (§2.12) such as reflection.
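The distinction between loading and initialization is also why the breakpoint in the static block fires at the wrong moment. A minimal sketch of the difference, using the standard Class.forName(String, boolean, ClassLoader) overload (the class names here are made up for illustration): with initialize set to false the class is loaded but its static block does not run; with true it does.

```java
// Loading vs. initialization: the static block runs on initialization only.
public class LoadVsInit {
    // Written by Lazy's static initializer. It lives in the outer class,
    // so reading it does not trigger initialization of Lazy.
    static volatile boolean lazyInitialized = false;

    static class Lazy {
        static {
            LoadVsInit.lazyInitialized = true;
            System.out.println("Initialized!");
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        ClassLoader loader = LoadVsInit.class.getClassLoader();
        String name = LoadVsInit.class.getName() + "$Lazy";

        // Load without initializing: the static block is not executed.
        Class<?> c = Class.forName(name, false, loader);
        System.out.println("loaded " + c.getName() + ", initialized = " + lazyInitialized);

        // Now request initialization: the static block runs.
        Class.forName(name, true, loader);
        System.out.println("initialized = " + lazyInitialized);
    }
}
```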
If you wish to actually perform the bytecode manipulation yourself, and at runtime, then there are at least two models that you may choose to follow:
Use something akin to AspectJ's load-time weaving, which involves using a separate classloader and an agent to respond to class loading.
Do it the way JRebel does, by monitoring known .class files for updated timestamps: http://zeroturnaround.com/software/jrebel/learn/faq/
Unless you have a very strong need to do this yourself however, consider using one of the above mentioned tools. Better yet, determine if Java annotations and reflection can solve what you are attempting to do instead.
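For the first model, the core of a separate class loader that changes the bytes before the JVM ever defines the class might look like the sketch below. It is an assumption-laden outline, not how AspectJ implements it: classes whose names start with a chosen prefix are read as raw .class bytes from the parent's resources, passed through transform() (a hypothetical hook that is the identity here; real code would rewrite the bytes with a bytecode library), and defined child-first by this loader. Everything else is delegated normally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

// Sketch: child-first loader that lets you rewrite class bytes before definition.
public class TransformingClassLoader extends ClassLoader {
    private final String prefix; // must not start with "java." (defineClass rejects those)

    public TransformingClassLoader(ClassLoader parent, String prefix) {
        super(Objects.requireNonNull(parent, "parent"));
        this.prefix = prefix;
    }

    /** Hypothetical hook: return modified class-file bytes (identity in this sketch). */
    protected byte[] transform(String className, byte[] classFile) {
        return classFile;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.startsWith(prefix)) {
            return super.loadClass(name, resolve); // ordinary parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                byte[] patched = transform(name, readClassBytes(name));
                // Defined by this loader, so the JVM only ever sees the patched bytes.
                c = defineClass(name, patched, 0, patched.length);
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }

    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

The prefix check matters: if the same class is also loaded by the parent, you end up with two distinct classes of the same name, which is a common source of ClassCastExceptions.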
I'm interested in the subject of language interop.
For the Kotlin/JVM target, we're able to call Java code directly without having to define any binding interfaces or use tools such as SWIG, JNI and others. How was that achieved?
Short answer: the Java runtime handles it all; no bindings are needed.
All classes and interfaces that the JVM runs (except for native calls using JNI or similar) are provided to it as Java bytecode (typically in the form of .class files); it's the same whether that bytecode was compiled from Java, or Kotlin, or Groovy, or Scala, or any other JVM language.
The bytecode contains details of all the constructors, fields, and methods of a class or interface. When that refers to other classes, it does so via their fully-qualified names (e.g. java.lang.String) — and the JVM knows how to find (e.g. by searching the classpath) and load any class given its fully-qualified name. (Specifically, it uses a ClassLoader — usually the system one, though custom ones can be used where appropriate.) Having loaded the class, instances can be constructed and their methods called directly.
So the JVM doesn't need any secondary means of identifying or accessing classes/interfaces or their methods; it's all specified in the bytecode and accessible directly.
(If you want more details, the Java Virtual Machine Specification is probably the ultimate reference.)
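To make the "fully-qualified name plus ClassLoader" point concrete, here is a small sketch that does explicitly, via reflection, what compiled Java or Kotlin bytecode does implicitly: ask a class loader for a class by its fully-qualified name, construct an instance, and call a method on it. Note that compiled code does not use reflection; it stores the names symbolically in the constant pool and the JVM resolves them. The example only illustrates that the name is all that is needed.

```java
import java.lang.reflect.Method;

// Look up a class by its fully-qualified name and use it, with no bindings involved.
public class LoadByName {
    static String appendViaName() throws ReflectiveOperationException {
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        Class<?> sbClass = loader.loadClass("java.lang.StringBuilder");
        Object sb = sbClass.getConstructor(String.class).newInstance("Kotlin");
        Method append = sbClass.getMethod("append", String.class);
        append.invoke(sb, " calls Java");
        return sb.toString();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        System.out.println(appendViaName());
    }
}
```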
If you want to load multiple versions of a class, you can do so if they implement a shared interface and are in separate JARs, using a separate class loader for each version.
If you have a JAR that calls native code, you can store the shared library (DLL) for that native code inside the JAR: at runtime, extract the shared library to a temporary file and then use System.load to load the library from that temporary file.
But if you do both, will it work? What happens if both versions of the JAR call native code, and both contain a different version of the shared library?
Let us assume that both JARs use a different temporary file to store the copy of the shared library. But the two versions of the shared library have native code that call native (C) functions that have identical declarations (but the implementations of those functions are different). Will the JVM/class loader/System.load delegate from the Java code to the correct native code? Or will the JVM complain about name conflicts?
If that scheme does fail, how do I use multiple versions of a class that uses native code?
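The extract-then-load step described in the question can be sketched as follows. The resource path (for example "/native/libexample.so") is hypothetical. Files.createTempFile gives every call a distinct file name, which is what keeps two versions of the library apart. System.load associates the library with the class loader of the class that calls it (here NativeLibLoader), so in the multi-version setup this helper would need to live inside each versioned JAR.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class NativeLibLoader {
    /** Copies the stream to a fresh temporary file; each call gets a distinct path. */
    static Path extractToTempFile(InputStream in, String suffix) throws IOException {
        Path tmp = Files.createTempFile("native-", suffix);
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    /** Extracts a library bundled in the JAR (resourcePath is hypothetical) and loads it. */
    static void loadBundledLibrary(String resourcePath, String suffix) throws IOException {
        try (InputStream in = NativeLibLoader.class.getResourceAsStream(resourcePath)) {
            if (in == null) {
                throw new IOException("resource not found: " + resourcePath);
            }
            Path lib = extractToTempFile(in, suffix);
            // System.load requires an absolute path.
            System.load(lib.toAbsolutePath().toString());
        }
    }
}
```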
Examining the OpenJDK 7 implementation, it seems that, yes, loading multiple versions of Java classes that use native code will work:
Library Loading
The crucial question is how System.load behaves. Its implementation is system dependent, but the semantics of the various implementations should be the same.
System.load delegates to the package-private method Runtime.load0.
Runtime.load0 delegates to the package-private static method ClassLoader.loadLibrary.
ClassLoader.loadLibrary delegates to the private static method ClassLoader.loadLibrary0.
ClassLoader.loadLibrary0 creates an object of the package-private inner class ClassLoader.NativeLibrary and delegates to its load method.
ClassLoader.NativeLibrary.load is a native method, which delegates to the function JVM_LoadLibrary.
JVM_LoadLibrary delegates to os::dll_load.
os::dll_load is system dependent.
The Linux variant of os::dll_load delegates to the dlopen function, passing the RTLD_LAZY flag.
On Linux, dlopen uses RTLD_LOCAL behaviour unless RTLD_GLOBAL is specified, so the shared library is loaded with RTLD_LOCAL semantics.
RTLD_LOCAL semantics are that the symbols in the loaded library are not made available for (automatic) symbol resolution of subsequently loaded libraries. That is, the symbols do not enter the global namespace, and different libraries may define the same symbols without generating conflicts. The shared libraries could even have identical content without problems.
Hence it does not matter if different shared libraries, loaded by different class loaders, define the same symbols (have the same names of extern functions for native methods): the JRE and JVM together avoid name clashes.
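Why would two versions of a library define the same symbols at all? Because the exported name of a JNI function is derived only from the class name and the method name, so the same native method in two versions of a class maps to the same symbol. The sketch below computes the JNI "short name" using the escape rules from the JNI specification ('.' or '/' becomes '_', '_' becomes "_1", ';' becomes "_2", '[' becomes "_3", any other non-alphanumeric character becomes "_0" plus four hex digits). Overloaded methods additionally use a "long name" with the mangled signature appended, which this sketch omits.

```java
// Computes the JNI short symbol name for a native method (overload suffix omitted).
public class JniNames {
    static String mangle(String name) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < name.length(); i++) {
            char ch = name.charAt(i);
            if ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || (ch >= '0' && ch <= '9')) {
                sb.append(ch);
            } else if (ch == '.' || ch == '/') {
                sb.append('_');          // package separator
            } else if (ch == '_') {
                sb.append("_1");
            } else if (ch == ';') {
                sb.append("_2");
            } else if (ch == '[') {
                sb.append("_3");
            } else {
                sb.append("_0").append(String.format("%04x", (int) ch)); // e.g. '$' -> _00024
            }
        }
        return sb.toString();
    }

    static String shortName(String className, String methodName) {
        return "Java_" + mangle(className) + "_" + mangle(methodName);
    }

    public static void main(String[] args) {
        System.out.println(shortName("com.example.Foo", "bar"));
    }
}
```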
Native Function Lookup
That ensures that the multiple versions of the shared libraries do not generate name conflicts. But how does OpenJDK ensure that the correct JNI code is used for the native method calls?
The procedure followed by the JVM to call a native method is rather lengthy, but it is all contained within one function, SharedRuntime::generate_native_wrapper. Ultimately, however, that needs to know the address of the JNI function to be called.
That wrapper function makes use of a methodHandle C++ object, getting the address of the JNI function from either the methodHandle::critical_native_function() or methodHandle::native_function(), as appropriate.
The address of the JNI function is recorded in the methodHandle by a call to methodHandle::set_native_function from NativeLookup::lookup.
NativeLookup::lookup delegates, indirectly, to NativeLookup::lookup_style
NativeLookup::lookup_style delegates to the Java package-private static method ClassLoader.findNative.
ClassLoader.findNative iterates through the list (ClassLoader.nativeLibraries) of ClassLoader.NativeLibrary objects set up by ClassLoader.loadLibrary0, in the order that the libraries were loaded. For each library, it delegates to NativeLibrary.find to try to find the native method of interest. Although this list of objects is not public, the JNI specification requires that the JVM "maintains a list of loaded native libraries for each class loader", so all implementations must have something similar to this list.
NativeLibrary.find is a native method. It simply delegates to JVM_FindLibraryEntry.
JVM_FindLibraryEntry delegates to the system dependent method os::dll_lookup.
The Linux implementation of os::dll_lookup delegates to the dlsym function to look up the address of the function in the shared library.
Because each class-loader maintains its own list of loaded libraries, it is guaranteed that the JNI code called for a native method will be the correct version, even if a different class loader loads a different version of the shared library.
If you try to load the same library file in different class loaders, you will get an UnsatisfiedLinkError with the message "Native Library: ... already loaded in another class loader". This might have something to do with the VM calling the library's unload function (JNI_OnUnload) when the class loader is garbage collected (https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#compiling_loading_and_linking_native_methods).
But if, as you say, you "use a different temporary file to store the copy of the shared library", the two are effectively different libraries regardless of the files' contents (they might even be binary identical; it doesn't matter). So there isn't a problem.
I want to redefine the bytecode of the StackOverflowError constructor so I have a "hook" for when a stack overflow occurs. All I want to do is insert a single method call to a static method of my choosing at the start of the constructor. Is it possible to do this?
You should be able to do it in one of two ways (unless something changed in the last 1-2 years, in which case I'd love some links to changelogs/docs):
As mentioned in a comment (not very feasible, I guess): modify the classes you are interested in, put them in a JAR and then use the -Xbootclasspath/p: option (available up to Java 8) to load them instead of the default ones. As was mentioned before, this can have some legal issues (and is a pain to do in general).
You should be able to (or at least you used to be able to) instrument almost all core classes (IIRC, Class was the only exception I've seen). One of the many problems you might have is that many core classes are loaded before your agent (or, more precisely, its premain method) is consulted. To overcome this, you will have to add the Can-Retransform-Classes attribute to your agent JAR's manifest and then retransform the classes you are interested in. Be aware that retransformation is less powerful and doesn't give you all the options you'd normally have with instrumentation (for example, it cannot add or remove fields or methods); you can read more about it in the documentation.
I am assuming you know how to do instrumentation?
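A minimal sketch of the retransformation route, assuming the agent JAR's manifest contains "Premain-Class: RetransformAgent" and "Can-Retransform-Classes: true". The transformer below only selects StackOverflowError and returns null (meaning "unchanged"); the actual bytecode rewrite is left as a hypothetical step for a bytecode library, and it must respect the retransformation limits (method bodies may change, but fields and methods cannot be added or removed).

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;

public class RetransformAgent {
    static final String TARGET = "java/lang/StackOverflowError"; // internal (slash) form

    static boolean isTarget(String internalName) {
        return TARGET.equals(internalName);
    }

    public static void premain(String args, Instrumentation inst) throws UnmodifiableClassException {
        ClassFileTransformer transformer = new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className, Class<?> redefined,
                                    ProtectionDomain domain, byte[] classfileBuffer) {
                if (!isTarget(className)) {
                    return null; // leave every other class untouched
                }
                // Hypothetical: rewrite classfileBuffer here and return the new bytes.
                return null;
            }
        };
        // 'true' registers the transformer for retransformation as well.
        inst.addTransformer(transformer, true);
        // StackOverflowError is typically loaded before premain runs, so retransform it.
        if (inst.isRetransformClassesSupported() && inst.isModifiableClass(StackOverflowError.class)) {
            inst.retransformClasses(StackOverflowError.class);
        }
    }
}
```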
There are several things to consider.
It is possible to redefine java.lang.StackOverflowError. I tried it successfully on 1.7.0_40: isModifiableClass(java.lang.StackOverflowError.class) returns true, and I successfully redefined it, inserting a method invocation into all of its constructors.
You should be aware that when you insert a method call into a class via Instrumentation, you still have to obey the visibility imposed by the ClassLoader relationships. Since StackOverflowError is loaded by the bootstrap loader, it can only invoke methods of classes loaded by the bootstrap loader. You would have to add the target method’s class(es) to the bootstrap loader, for example with Instrumentation.appendToBootstrapClassLoaderSearch.
This works if the application’s code throws a StackOverflowError manually. However, when a real stack overflow occurs, the last thing the JVM will do is invoke additional methods (keep in mind what the error says: the stack is full). Consequently, it creates an instance of StackOverflowError without calling its constructor (a JVM can do that). So your instrumentation is pointless in this situation.
As already pointed out by others, a “Pure Java Application” must not rely on modified JRE classes. It is only valid to use Instrumentation as an add-on, i.e. in a development or JVM management tool. Keep in mind that the fact that Oracle’s JVM 1.7.0_40 supports the redefinition of StackOverflowError does not imply that other versions or other JVMs do as well.
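The visibility point can be checked and addressed with standard APIs. Bootstrap-defined classes report a null class loader, and Instrumentation.appendToBootstrapClassLoaderSearch(JarFile) makes additional classes visible to them. In the sketch below, the hook JAR path is a hypothetical parameter: it is assumed to contain only the class whose static method the instrumented constructor calls, and it should be appended from premain before any transformer injects calls to it.

```java
import java.io.IOException;
import java.lang.instrument.Instrumentation;
import java.util.jar.JarFile;

public class BootstrapHookSetup {
    /** Bootstrap-defined classes (such as StackOverflowError) report a null class loader. */
    static boolean isBootstrapClass(Class<?> c) {
        return c.getClassLoader() == null;
    }

    /**
     * Makes the classes in a (hypothetical) hook JAR visible to bootstrap classes.
     * Intended to be called from premain before installing the transformer.
     */
    static void exposeHooksToBootstrap(Instrumentation inst, String hookJarPath) throws IOException {
        inst.appendToBootstrapClassLoaderSearch(new JarFile(hookJarPath));
    }

    public static void main(String[] args) {
        System.out.println("StackOverflowError from bootstrap: "
                + isBootstrapClass(StackOverflowError.class));
        System.out.println("this class from bootstrap: "
                + isBootstrapClass(BootstrapHookSetup.class));
    }
}
```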
In the HotSpot JVM, the java.lang.ClassLoader class has a Vector of all classes loaded by that class loader, so all of those classes are held in memory as long as their class loader is alive. IBM J9's java.lang.ClassLoader has no such field; at least, I was unable to find one. So my questions are:
Where does IBM JVM's classloaders hold class cache?
If that differs from the point above: what holds hard references to classes in the IBM JVM, preventing them from being unloaded?
Looking at the code of my IBM JVM, java.lang.ClassLoader is an abstract class, so a concrete subclass must be used at runtime. Using the debugger, I found that it is a nested class called sun.misc.Launcher$AppClassLoader.
Then, to retrieve a class there is a native method
private native Class findLoadedClassImpl(String className);
so it seems the caching is done outside Java, in a native method.
At the beginning of loadClass method I see:
// Ask the VM to look in its cache.
Class loadedClass = findLoadedClass(className);
It then checks whether loadedClass is null and, if so, tries to use the parent class loader.
So, I'd say that, unless the method is overridden by the inheriting classloader, caching happens outside Java, in some native component of IBM VM.
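The pattern described above (check the loader's cache, then the parent, then the loader itself) is the standard delegation algorithm documented for java.lang.ClassLoader.loadClass. Written out explicitly as a sketch, it shows that the only cache the Java code consults is findLoadedClass, which is answered by the VM rather than by a Java field. This sketch requires a non-null parent to keep the bootstrap case out of the picture.

```java
import java.util.Objects;

// Explicit version of the default parent-delegation algorithm.
public class DelegatingLoader extends ClassLoader {
    public DelegatingLoader(ClassLoader parent) {
        super(Objects.requireNonNull(parent, "parent"));
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            // 1. Ask the VM whether this loader already knows the class.
            Class<?> c = findLoadedClass(name);
            if (c == null) {
                try {
                    // 2. Delegate to the parent loader.
                    c = getParent().loadClass(name);
                } catch (ClassNotFoundException e) {
                    // 3. Fall back to this loader's own lookup (throws by default).
                    c = findClass(name);
                }
            }
            if (resolve) {
                resolveClass(c);
            }
            return c;
        }
    }
}
```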
The IBM J9 JVM has no PermGen on the heap and stores classes in native memory. You can use -Xdump to generate a javacore.* file; it will contain a list of all class loaders and classes.
BTW: Java 8 will do a similar thing (PermGen is replaced by Metaspace, which lives in native memory).
My Java agent, run via -javaagent, instruments classes with callbacks to static methods on one of my classes. This works great, except for system classes (e.g. java/lang, java/util), which throw NoClassDefFoundError at the point when the method is called (with INVOKESTATIC). So it appears they are instrumented, because the method call is attempted, but they have an access or visibility issue that my user classes don't have. My callback class and its methods are all public.
I've tried adding my class to the classpath (instead of just loading via -javaagent) but that didn't help. Is there some protection of system classes I need to override?
It sounds like you're explicitly looking for classes to instrument. Why aren't you using java.lang.instrument to intercept classes that are being loaded when the target JVM executes? See this example
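As a starting point for the java.lang.instrument approach, here is a minimal transformer sketch that just records every class the JVM hands it and returns null (no change). The Premain-Class manifest entry pointing at this class is an assumption. It also flags classes whose loader is null: those are bootstrap classes, and any call you inject into them can only reach classes that the bootstrap loader can see, which matches the NoClassDefFoundError symptom described in the question.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Records each class as it is loaded; a null loader means the bootstrap loader.
public class LoadLoggingAgent implements ClassFileTransformer {
    final List<String> seen = new CopyOnWriteArrayList<>();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        seen.add(className + (loader == null ? " (bootstrap)" : ""));
        return null; // null = keep the original bytes
    }

    public static void premain(String args, Instrumentation inst) {
        inst.addTransformer(new LoadLoggingAgent());
    }
}
```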
Can you paste your code, or the relevant parts?