Java ClassLoader and Dependency Resolution

Can someone clarify that the role of a ClassLoader is not only to load an individual class, but also its dependencies? And if so, what exactly does the entire process entail? I'm looking for implementation detail if at all possible.
For example, at some point, bytes are going to have to be read in from somewhere (a network or filesystem location), and file system locations are going to have to be calculated on the basis of a class's canonical name and a foreknowledge of the class paths available to the JVM. How does an individual ClassLoader try to locate a file over potentially multiple class-paths? Where does it get this information from? Also, at what point are a class file's bytes verified and its dependencies examined for availability?
As much detail as possible would be appreciated :)

ClassLoading is a very complex subject. The ClassLoader and Java security model are inextricably tied together. Essentially the JVM loads classes on demand. When there is a hierarchy of classloaders, the JVM attempts to resolve the class as far up the parent chain as possible. In short, if the class is defined in the "boot" classloader and in an application-defined class loader, it will always use the version in the boot classloader.
Within a classloader, such as the URLClassLoader, the search order is the order in which you've told it to look. Essentially the array of URLs you told it had classes will be searched from the first entry to the last.
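To make the search order a bit more concrete, here is a minimal sketch of how a binary class name turns into a resource name, and which locations would be tried in order. It is only an illustration, not URLClassLoader's actual code: the class ClassPathSearch, its methods and the /tmp paths are made up for this example, and it only covers directory roots (for jar entries the real loader opens the archive instead).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ClassPathSearch {

    /** Maps a binary class name to the resource name a loader looks for. */
    static String resourceNameFor(String binaryClassName) {
        // Nested classes keep their '$': com.acme.Outer$Inner -> com/acme/Outer$Inner.class
        return binaryClassName.replace('.', '/') + ".class";
    }

    /** Lists, in search order, where the class file would sit under each directory root. */
    static List<URI> candidateLocations(List<URI> directoryRoots, String binaryClassName) {
        List<URI> candidates = new ArrayList<>();
        String resourceName = resourceNameFor(binaryClassName);
        for (URI root : directoryRoots) {
            // Directory roots must end in '/' for resolve() to append instead of replace.
            candidates.add(root.resolve(resourceName));
        }
        return candidates;
    }

    public static void main(String[] args) {
        // Hypothetical search path: two directories, searched first to last.
        List<URI> roots = Arrays.asList(URI.create("file:/tmp/dira/"), URI.create("file:/tmp/dirc/"));
        candidateLocations(roots, "com.acme.SomeCoolThing").forEach(System.out::println);
    }
}
```

The first candidate that actually exists wins, which is why the order of the URLs matters.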
When the class that you defined references another class, that class is also resolved using the same algorithm. But here's the catch: it only resolves it relative to where it was found. Let's take the scenario where the class SomeCoolThing is in the boot classloader, but depends on SomeLameThing, which is in an application defined classloader. The process would look like this:
App-ClassLoader: resolveClass("SomeCoolThing")
parent->resolveClass("SomeCoolThing")
Boot-ClassLoader (the ultimate parent): resolveClass("SomeCoolThing")
SomeCoolThing needs SomeLameThing
resolveClass("SomeLameThing") // Can't find SomeLameThing!!!!
Even though SomeLameThing is in the classloader where you requested SomeCoolThing, SomeCoolThing was resolved in a different classloader. That other classloader has no knowledge of the child classloader, and tries to resolve it itself and fails.
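If you want to see the delegation for yourself, something like the following sketch may help. It walks the parent chain of a loader and reports which loader actually defined a class that was requested through a child loader. DelegationDemo and its method names are invented for this example; the child loader deliberately has no URLs of its own, so every request has to be answered by a parent.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class DelegationDemo {

    /** Walks from the given loader up towards the bootstrap loader (which shows up as null). */
    static List<ClassLoader> parentChain(ClassLoader start) {
        List<ClassLoader> chain = new ArrayList<>();
        for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
            chain.add(cl);
        }
        return chain;
    }

    /** Requests a class through an empty child loader and returns the loader that defined it. */
    static ClassLoader definingLoaderOf(String className, ClassLoader parent) throws Exception {
        try (URLClassLoader child = new URLClassLoader(new URL[0], parent)) {
            // The request starts at 'child', but the class belongs to whoever found the bytes.
            return child.loadClass(className).getClassLoader();
        }
    }

    public static void main(String[] args) throws Exception {
        ClassLoader app = DelegationDemo.class.getClassLoader();
        System.out.println("chain: " + parentChain(app));
        // Core classes are defined by the bootstrap loader, reported as null.
        System.out.println("String defined by: " + definingLoaderOf("java.lang.String", app));
        System.out.println("Demo defined by:   " + definingLoaderOf(DelegationDemo.class.getName(), app));
    }
}
```

Any further class that java.lang.String references is then resolved from the bootstrap loader, never from the child where the request started, which is exactly the SomeCoolThing/SomeLameThing situation.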
I had a book a long time ago that covered the Java ClassLoaders in really good depth, and I recommend it. It's Java Security by O'Reilly Media. It will answer every question you never wanted to know, but still need to, when dealing with ClassLoaders and how they work.

I can answer some of your questions:
how does an individual ClassLoader try to locate a file over potentially multiple class-paths?
If you mean that different class loaders have different classpaths: a child class loader delegates to its parent first, so in effect it sees everything on the parent's classpath plus whatever locations it was given itself. All things being equal, the loaders in one chain resolve a given name the same way (I believe; not sure if the JVM does anything weird internally). So MyClass.class is the same for a class loader and all its child class loaders, as long as they delegate to the parent. If you have multiple MyClass.class files on the same class path then the JVM picks up the first one. In the past I've created my own class loader and prepended a custom classpath onto the existing classpath to load classes at runtime that were not on the classpath when launched.
To get to the nuts and bolts of it, I'm sure there is a spec out there that describes this, or you could download the JVM code (the assembly/C/C++ code) and go through that, but I've had to do that and "it ain't pretty".
Of course "they" are changing the classpath stuff in 1.7 so I'm not sure how that is going to work...
Hope that helps a bit...

Related

Custom classloader trouble with getResources for names ENDING in slash

I am desperate for help but was unable to find anything on the web about this particular subject (many related ones that leave my particular problem unanswered).
Specifically, I need to be able to download code (jars) from a central and external code repository. This is done by the bootstrap code that needs to add this to the classpath of a class loader to be used thereafter. This is when we enter the subject that has been discussed so many times. I don't like hacks, so I tried the following:
Attempt #1: Create an instance of URLClassLoader configured for this purpose, then invoke the "rest" of the code through it.
Failure: There are 1.5 problems here (one may be the cause of another). One is that URLClassLoader, normally, prefers to load stuff from its parent. Some code has to exist in both, possibly different versions. If the code from the parent is used, it continues using the "outer" class loader for the rest of loading, which is not what we want, even when the initial loading is OK. Secondly, some third party libraries seem to access the system class loader directly, either by design or accidentally (may get it from one of the classes loaded by it).
Attempt #2: Create my subclass of the URLClassLoader that prefers self over the parent. Overrode loadClass, getResource, getResources, getPackage, getPackages... later other methods too to make sure of this.
Failure: Didn't help (enough). That third party code still couldn't load some resources.
Attempt #3: Create another custom subclass of the URLClassLoader and set it as the system class loader using -Djava.system.class.loader=...
Failure: This worked better - went further, but still failed trying to get resources. This time it was different resources, though. I added logging to all the overridden methods to log their calls and resource names. Regular resources were MOSTLY found. Some still weren't, even though they are there (confirmed). But something I don't know about even though I tried hard to learn is about many calls with resource names that end with a slash. Some also have slashes where a dollar sign would normally appear (nested/inner class resources). Some examples that were requested but NOT found:
com/acme/foo/bar/ClassName/
com/acme/foo/bar/ClassName/InnerClassName/
When I run the downloaded code with all content on the initial/boot classpath (and do not use my classloader), everything works fine - thus my class loader breaks things, but I need it to work.
My closest guesses are:
Third party code gets hold of the true system class loader somehow, perhaps via some class that was loaded by it, then uses that. I don't see requests to it and they are bound to fail because it does not have the entire class path.
This business with resource names ending in slashes is the cause: such names are supported by the true system class loader but not by the URLClassLoader I am subclassing. I can only guess that the expected return URL somehow locates the collection of resources with that name as prefix. That would be tough to match, although possible. Furthermore, it appears that some slashes are in positions where a dollar sign separating the inner class name should be, i.e. in the above example (spaces added for clarity):
com/acme/foo/bar/ClassName / InnerClassName/
com/acme/foo/bar/ClassName $ InnerClassName/
Please note that I cannot rely on hacking the actual system classloader by assuming that it is a subclass of the URLClassLoader and using reflection to call its addURL(URL) method.
Is there a way to make this work? Please help!
UPDATE
I just made an additional attempt. I created a dummy wrapper classloader (extending ClassLoader, not URLClassLoader) that only logs requests, then passes them on to the parent (public methods) or superclass (protected methods). I set this to be the system class loader and manually added the entire "inner" class path to the actual outer one, then tried to run the code. That works correctly, just as it does without the custom system class loader. The log also showed that even the system class loader returns null for MOST of these resources ending in slashes, but not all. I did not check whether these also work in my real code, but I am guessing they may, as they were not the stumbling block. Somehow the custom system classloader is still being bypassed. How?
UPDATE 2
In my custom system class loaders I have let some classes come from the outer/true system class loader, e.g. those in java.lang. I am now suspecting that I should not have and that the inner "world" must be completely isolated. That would make it problematic, though, to communicate with it and all I would have left is reflection... but not sure whether that would even work - i.e. can there be more than one java.lang.Class and/or java.lang.Object?
Given all constraints this does not appear entirely possible in a rock solid fashion as I wanted it:
a) Third party libraries may always "misbehave" and get hold of classloaders they are not supposed to use one way or another. I looked at OneJar as suggested by fge but they have the same issue - they only detect a possibility of it.
b) There is no way to completely replace/hide the system class loader.
c) Casting the system class loader to a URLClassLoader may stop working at any moment.
It seems, you didn’t understand the class loader structure. Within an ordinary JVM of the current version, there are at least three class loaders:
The bootstrap loader which is responsible for loading the core classes. Since this involves classes like Class and ClassLoader itself, it can’t be represented by a ClassLoader instance. All classes whose getClassLoader() returns null were loaded by the bootstrap loader
The extension loader. It is responsible for loading classes within the ext/ directory of the JRE. Afaik, it may vanish in future versions. Its parent loader is the bootstrap loader
The application loader. This is the one which will be returned by ClassLoader.getSystemClassLoader() and which will be used if no other parent was specified. In the current configurations, its parent is the extension loader, but maybe it will have the bootstrap loader as its direct parent in future versions
The conclusion is, if you want to reload your application’s classes without the delegation to the parent loader destroying your effort, you don’t need to manipulate the class loader’s implementation. You just have to specify the right parent. It’s as simple as
URLClassLoader cl=new URLClassLoader(urls, ClassLoader.getSystemClassLoader().getParent());
That way, the new class loader’s parent will be the original application class loader’s parent, thus the already loaded application classes are not in the scope of the new loader while everything else works as usual.
Regarding the resources ending with a slash, they are rather uncommon. They may get resolved when they actually refer to a directory but that depends on the protocol of the URL and the actual handler for that protocol. E.g. it might work for file: URLs but usually doesn’t for jar: URLs unless the jar file contains pseudo-entries for directories. I’ve also seen it working for ftp: URLs.
Another thing to understand is that if one class directly refers to another class, its original defining class loader will be queried, not necessarily the application class loader. E.g. when the class java.lang.String contains a reference to java.lang.Object (it does), this reference will be directly resolved using the bootstrap loader as this is the defining loader of java.lang.String.
This implies that if you manipulate a loader’s parent lookup so that it no longer follows standard parent delegation, you risk a name resolving to one runtime class in your loader and to a different runtime class when the same name is referenced by classes loaded by the parent loader. You avoid such problems by following the standard procedure as in the solution above. The JRE classes will never contain references to your application classes, and the new loader, not having the original application loader as its parent, will never interfere with the classes loaded by the original application loader.
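As a rough check of what such a loader can and cannot see, here is a small sketch. IsolatedLoaderDemo and visibleThrough are names made up for this example, and the loader is given no URLs of its own, so whatever it finds must come from the parent chain above the application loader.

```java
import java.net.URL;
import java.net.URLClassLoader;

public class IsolatedLoaderDemo {

    /** Returns true if the loader (or one of its parents) can load the named class. */
    static boolean visibleThrough(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Parent is the application loader's parent, so the application class path is skipped.
        ClassLoader parent = ClassLoader.getSystemClassLoader().getParent();
        try (URLClassLoader isolated = new URLClassLoader(new URL[0], parent)) {
            System.out.println("core class visible: "
                    + visibleThrough(isolated, "java.lang.String"));
            System.out.println("application class visible: "
                    + visibleThrough(isolated, IsolatedLoaderDemo.class.getName()));
        }
    }
}
```

When the demo class itself is on the ordinary application class path, the second line should report false: the isolated loader never asks the application loader, so it has to find its own copy of any application class among its URLs.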

Java classloader delegation

I have a question about java ClassLoaders. I am trying to use different ClassLoaders to be able to run different versions of a JAR from within the same program.
I have heard somewhere that if you load one class using one ClassLoader all classes called (being loaded) from within that class would use the same ClassLoader. Is this correct?
If not, is there a neat way to set the context of a ClassLoader (let's say, everything being called from a specific class/library should use the same ClassLoader).
This is not a simple subject and i would advise doing more research online as no answer here will be nearly in depth enough. but, as a quick synopsis:
classes loaded via normal class references (i.e. a line of code in Class A which uses a variable of static type B) will be loaded using the same classloader as the initial class.
however, due to classloader delegation, a class may not actually be loaded by the ClassLoader from which the search originally started. example, i have Class A loaded by classloader LA with parent classloader LP. Class B is referenced by A, so the search for Class B will start with LA. however, the class bytes for B are actually found in LP, so LP loads the class and hands it to LA which returns it. ultimately, however, B is owned by LP, not LA.
with utilities which load classes via reflection (e.g. serialization, JAXB, Hibernate, etc.) or frameworks which are typically used with nested classloaders (e.g. Java EE appservers), all bets are off. typically utilities/frameworks like this load classes using the context classloader, but that is not always the case. each utility may have different priorities and fallbacks regarding which classloader is used. additionally, many have ways of explicitly providing a classloader at runtime.
as a rule of thumb, while executing code which you know is from a nested classloader (probably because you set it up), you should set the current context classloader appropriately.
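a minimal sketch of that rule of thumb: set the context classloader before calling into the nested code and always restore the previous one afterwards. ContextLoaderScope and callWith are hypothetical names for this example, not part of any library.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.concurrent.Callable;

public class ContextLoaderScope {

    /** Runs the task with the given context classloader, then restores the previous one. */
    static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            // Restore even if the task throws, so later code on this thread is unaffected.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for a nested loader you set up yourself (no URLs here, purely illustrative).
        try (URLClassLoader nested = new URLClassLoader(new URL[0],
                ContextLoaderScope.class.getClassLoader())) {
            ClassLoader seen = callWith(nested,
                    () -> Thread.currentThread().getContextClassLoader());
            System.out.println("inside the call: " + (seen == nested));
        }
    }
}
```

note that this only helps with code that actually consults the context classloader; plain class references are still resolved through the defining loader of the referencing class.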

if I have a classloader instance, can I find where it looks for the class bytes?

if I have dira,jarb and dirc in the classpath in that order, and I have a java app with 3 classloaders with a parent/child/grandchild relationship, will they all read the same directory ?
I guess I am trying to figure out where each classloader looks... is there a way to find this path given an instance of the classloader ?
In general no, a classloader is permitted to construct bytes however it likes. E.g. the JSP classloader might invoke the JSP compiler dynamically if the JSP file has a recent timestamp.
Running the JVM with the -verbose:class flag will enable a lot of logging which should help you if you're just using the standard bootstrap classloaders.
If there's some custom classloader, you could supply your own URLStreamHandlerFactory and see what URLs are being fetched.
You have actually several questions here.
The classes in the classpath directories and jars will usually be loaded by one classloader (the application classloader), not by several ones for each entry.
If you have classloaders in a parent-child relationship, the child one should first ask its parent to load the class and only look up the bytecode itself when the parent did not find anything. (There are special-purpose classloaders in some frameworks which do this the other way around. If each class exists only once, then this should not make a difference.)
If you have an URLClassLoader, then you can ask its getURLs() method to find out from where it loads. For other classloaders, there may or may not be a way to find this.
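For the URLClassLoader case, a small helper along these lines shows the idea. LoaderLocations and searchPath are hypothetical names, and the /tmp entries only mirror the dira/jarb/dirc ordering from the question. For any other kind of loader the sketch simply returns an empty list, since there is nothing generic to ask.

```java
import java.net.URI;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class LoaderLocations {

    /** Returns the URLs a URLClassLoader searches, in order, or an empty list for other loaders. */
    static List<URL> searchPath(ClassLoader loader) {
        if (loader instanceof URLClassLoader) {
            return Arrays.asList(((URLClassLoader) loader).getURLs());
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical entries; they do not need to exist just to be listed.
        URL[] urls = {
            URI.create("file:/tmp/dira/").toURL(),
            URI.create("file:/tmp/jarb.jar").toURL(),
            URI.create("file:/tmp/dirc/").toURL()
        };
        try (URLClassLoader loader = new URLClassLoader(urls)) {
            // Walk the chain: each loader has its own locations, the parents have theirs.
            for (ClassLoader cl = loader; cl != null; cl = cl.getParent()) {
                System.out.println(cl + " -> " + searchPath(cl));
            }
        }
    }
}
```

So the three loaders in a parent/child/grandchild chain do not each read the same directories; each one has its own list (if any) and otherwise relies on delegation to its parent.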
Take a look at the ClassLoader API and you will realise there is a method that takes a name, and eventually the class loader passes a byte[] to define the class. Because it is an ordinary class, it can grab those bytes from anywhere it wants to. ClassLoader is just another public class; anyone can write their own implementation and do their own thing. ClassLoaders are everywhere: we have the version that reads the classpath system property, in Tomcat we have another that reads from a war file, in OSGi it reads from a jar file. Each does a few extra things besides simply reading some file, and that's the beauty and flexibility of classloading.
There is no method on ClassLoader that returns a String, because what would it return given the above mentioned ClassLoaders? A file path, a jar file path? etc
In general no, but in practice you often want to find out where some class, resource is being loaded from and you can do,
System.out.println(someClassLoader.getResource("someResource.txt"));
Even more useful, if you are looking to find which .class file a Class is from, do
Class c = SomeClass.class;
System.out.println(c.getResource(c.getSimpleName() + ".class"));
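The above is not guaranteed to work if the .class file is generated dynamically, and it returns null for nested classes (getSimpleName() drops the outer class name, while the file is named Outer$Inner.class), but it works in most situations.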

How to remove a loaded class from classloader? [duplicate]

I have a custom class loader so that a desktop application can dynamically start loading classes from an AppServer I need to talk to. We did this since the amount of jars that are required to do this are ridiculous (if we wanted to ship them). We also have version problems if we don't load the classes dynamically at run time from the AppServer library.
Now, I just hit a problem where I need to talk to two different AppServers and found that depending on whose classes I load first I might break badly... Is there any way to force the unloading of the class without actually killing the JVM?
Hope this makes sense
The only way that a Class can be unloaded is if the Classloader used is garbage collected. This means, references to every single class and to the classloader itself need to go the way of the dodo.
One possible solution to your problem is to have a Classloader for every jar file, and a Classloader for each of the AppServers that delegates the actual loading of classes to specific Jar classloaders. That way, you can point to different versions of the jar file for every App server.
This is not trivial, though. The OSGi platform strives to do just this, as each bundle has a different classloader and dependencies are resolved by the platform. Maybe a good solution would be to take a look at it.
If you don't want to use OSGI, one possible implementation could be to use one instance of JarClassloader class for every JAR file.
And create a new MultiClassloader class that extends ClassLoader. This class internally would have an array (or List) of JarClassloaders, and in the findClass() method would iterate through all the internal classloaders until a definition can be found, or a ClassNotFoundException is thrown. A couple of accessor methods can be provided to add new JarClassloaders to the class. There are several possible implementations on the net for a MultiClassLoader, so you might not even need to write your own.
If you instantiate a MultiClassloader for every connection to the server, in principle it is possible that every server uses a different version of the same class.
I've used the MultiClassloader idea in a project, where classes that contained user-defined scripts had to be loaded and unloaded from memory and it worked quite well.
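To give an idea of the shape of such a class, here is a minimal sketch. It is not the implementation from that project: MultiClassLoaderSketch and addDelegate are names invented for this example, and any ClassLoader can stand in for the per-jar loaders (in practice each could be a URLClassLoader pointing at a single jar).

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class MultiClassLoaderSketch extends ClassLoader {

    // The per-jar loaders, asked in the order they were added.
    private final List<ClassLoader> delegates = new CopyOnWriteArrayList<>();

    public MultiClassLoaderSketch(ClassLoader parent) {
        super(parent);
    }

    public void addDelegate(ClassLoader loader) {
        delegates.add(loader);
    }

    /** Called by loadClass() only after the parent chain failed to find the class. */
    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        for (ClassLoader delegate : delegates) {
            try {
                return delegate.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // Not in this jar; try the next delegate.
            }
        }
        throw new ClassNotFoundException(name);
    }

    public static void main(String[] args) throws Exception {
        // Parent null means "bootstrap only", so application classes must come from a delegate.
        MultiClassLoaderSketch multi = new MultiClassLoaderSketch(null);
        multi.addDelegate(MultiClassLoaderSketch.class.getClassLoader());
        Class<?> found = multi.loadClass(MultiClassLoaderSketch.class.getName());
        System.out.println(found + " defined by " + found.getClassLoader());
    }
}
```

One thing to keep in mind with this design: a class found this way is still defined by the delegate that loaded it, not by the multi loader, so it becomes unloadable only once that delegate and everything it loaded are no longer referenced.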
Yes there are ways to load classes and to "unload" them later on. The trick is to implement your own classloader which resides between high level class loader (the System class loader) and the class loaders of the app server(s), and to hope that the app server's class loaders do delegate the classloading to the upper loaders.
A class is defined by its package, its name, and the class loader that originally loaded it. Program a "proxy" classloader which is the first that is loaded when starting the JVM. Workflow:
The program starts and the real "main"-class is loaded by this proxy classloader.
Every class that then is normally loaded (i.e. not through another classloader implementation which could break the hierarchy) will be delegated to this class loader.
The proxy classloader delegates java.* and sun.* to the system classloader (these must not be loaded through any other classloader than the system classloader).
For every class that is replaceable, instantiate a classloader (which really loads the class and does not delegate it to the parent classloader) and load it through this.
Store the package/name of the classes as keys and the classloader as values in a data structure (e.g. a HashMap).
Every time the proxy classloader gets a request for a class that was loaded before, it returns the class from the class loader stored before.
It should be enough to locate the byte array of a class by your class loader (or to "delete" the key/value pair from your data structure) and reload the class in case you want to change it.
Done right, there should be no ClassCastException, LinkageError, etc.
For more information about class loader hierarchies (yes, that's exactly what you are implementing here ;-) ) look at "Server-Based Java Programming" by Ted Neward - that book helped me implement something very similar to what you want.
I wrote a custom classloader, from which it is possible to unload individual classes without GCing the classloader. Jar Class Loader
Classloaders can be a tricky problem. You can especially run into problems if you're using multiple classloaders and don't have their interactions clearly and rigorously defined. I think in order to actually be able to unload a class you're going to have to remove all references to any classes (and their instances) you're trying to unload.
Most people needing to do this type of thing end up using OSGi. OSGi is really powerful and surprisingly lightweight and easy to use.
You can unload a ClassLoader but you cannot unload specific classes. More specifically you cannot unload classes created in a ClassLoader that's not under your control.
If possible, I suggest using your own ClassLoader so you can unload.
Classes have an implicit strong reference to their ClassLoader instance, and vice versa. They are garbage collected like any other Java objects. Without hitting the tools interface or similar, you can't remove individual classes.
As ever you can get memory leaks. Any strong reference to one of your classes or class loader will leak the whole thing. This occurs with the Sun implementations of ThreadLocal, java.sql.DriverManager and java.beans, for instance.
If you are watching live in JConsole or a similar tool to see whether class unloading worked, also try calling java.lang.System.gc() at the end of your class unloading logic. It asks the JVM to run the garbage collector, although the JVM is not obliged to do so immediately.
