Java Classloader complexity - java

In trying to learn about Java class loaders from Wikipedia, I think I can see why they have the three major class loaders:
1) Bootstrap class loader
2) Extensions class loader
3) System class loader
They go on to say you can define your own classloader. I'm not sure I see the value in defining your own, but the following quote from Wikipedia really makes me wonder:
The most complex JAR hell problems arise in circumstances that take
advantage of the full complexity of the classloading system. A Java
program is not required to use only a single "flat" classloader, but
instead may be composed of several (potentially very many) nested,
cooperating classloaders. Classes loaded by different classloaders may
interact in complex ways not fully comprehended by a developer,
leading to errors or bugs that are difficult to analyze, explain, and
resolve.
If it's so complex, why bother with it? Shouldn't the three already-defined classloaders be enough?
(And yes, for those curious, I did run into a ClassCastException that I didn't think should have happened, much like the graphic labelled Figure 2. Class identity crisis. I'm trying to understand the background is all.)

Certain use cases require custom classloaders.
A few examples:
Dynamically adding new folders/jars to be loadable. (Without restarting the whole application).
Dynamically removing folder/jars from being loadable.
Runtime bytecode generation with javassist.
Multiple (actually used at the same time) versions of the same classes in the same application/jvm

Related

Using different versions of dependencies in separated Java platform modules

I expected it's possible to use i.e. Guava-19 in myModuleA and guava-20 in myModuleB, since jigsaw modules have their own classpath.
Let's say myModuleA uses Iterators.emptyIterator(); - which is removed in guava-20 and myModuleB uses the new static method FluentIterable.of(); - which wasn't available in guava-19. Unfortunately, my test is negative. At compile-time, it looks fine. In contrast to runtime the result is a NoSuchMethodError. Means that, the class which was the first on the classloader decides which one fails.
The encapsulation with the underlying coupling? I found a reason for myself. It couldn't be supported because of transitive dependencies would have the same problem as before. If a guava class which has version conflicts occurred in the signature in ModuleA and ModuleB depends on it. Which class should be used?
But why all over the internet we can read "jigsaw - the module system stops the classpath hell"? We have now multiple smaller "similar-to-classpaths" with the same problems. It's more an uncertainty than a question.
Version Conflicts
First a correction: You say that modules have their own class path, which is not correct. The application's class path remains as it is. Parallel to it the module path was introduced but it essentially works in the same way. Particularly, all application classes are loaded by the same class loader (by default at least).
That there is only a single class loader for all application classes also explains why there can't be two versions of the same class: The entire class loading infrastructure is built on the assumption that a fully qualified class name suffices to identify a class with a class loader.
This also opens the path to the solution for multiple versions. Like before you can achieve that by using different class loaders. The module system native way to do that would be to create additional layers (each layer has its own loader).
Module Hell?
So does the module system replace class path hell with module hell? Well, multiple versions of the same library are still not possible without creating new class loaders, so this fundamental problem remains.
On the other hand, now you at least get an error at compile or launch due to split packages. This prevents the program from subtly misbehaving, which is not that bad, either.
Theoretically it is possible to use different versions of the same library within your application. The concept that enables this: layering!
When you study Jigsaw under the hood you find a whole section dedicated to this topic.
The idea is basically that you can further group modules using these layers. Layers are constructed at runtime; and they have their own classloader. Meaning: it should be absolutely possible to use modules in different versions within one application - they just need to go into different layers. And as shown - this kind of "multiple version support" is actively discussed by the people working on java/jigsaw. It is not an obscure feature - it is meant to support different module versions under one hood.
The only disclaimer at this point: unfortunately there are no "complete" source code examples out there (of which I know), thus I can only link to that Oracle presentation.
In other words: there is some sort of solution to this versioning problem on the horizon - but it will take more time until to make experiences in real world code with this new idea. And to be precise: you can have different layers that are isolated by different class loaders. There is no support that would allow you that "the same object" uses modV1 and modV2 at the same time. You can only have two objects, one using modV1 and the other modV2.
( German readers might want to have a look here - that publication contain another introduction to the topic of layers ).
Java 9 doesn't solve such problems. In a nutshell what was done in java 9 is to extend classic access modifiers (public, protected, package-private, private) to the jar levels.
Prior to java 9, if a module A depends on module B, then all public classes from B will be visible for A.
With Java 9, visibility could be configured, so it could be limited only to a subset of classes, each module could define which packages exports and which packages requires.
Most of those checks are done by the compiler.
From a run time perspective(classloader architecture), there is no big change, all application modules are loaded by the same classloader, so it's not possible to have the same class with different versions in the same jvm unless you use a modular framework like OSGI or manipulate classloaders by yourself.
As others have hinted, JPMS layers can help with that. You can use them just manually, but Layrry might be helpful to you, which is a fluent API and configuration-based launcher for running layered applications. It allows you to define the layer structure by means of configuration and it will fire up the layer graph for you. It also supports the dynamic addition/removal of layers at runtime.
Disclaimer: I'm the initial creator of Layrry

How to use the Java Instrumentation API to reload classes when they change on the the file system?

I don't want to use the URL Classloader to load classes.
I want to implement this myself.
I don't want to use a solution like JRebel (although it's great).
I've got prior experience of JavaAssist, bytecode generation, implementing javaagent class transformers etc.
I would like to write a javaagent which hooks into the classloader or defines it's own system classloader.
I'll store the class files in an in memory cache, and for particular files, periodically reload them from disk.
I'd prefer to do this in a way which doesn't involve continuously polling the file system and manually invalidating specific classes. I'd much rather intercept class loading events.
I last messed around with this stuff 4 years ago, and I'm sure, although my memory may deceive me that it was possible to do, but 8 hours of searching google doesn't present an obvious solution beyond building a patched JVM.
Is this actually possible?
I've created a stub implementation at https://github.com/packetops/poc_agent if anyone's interested in a simple example of javaagent use.
update
Just found this post - I may have been using the wrong approach, I'll investigate further.
It depends on what you want to do. If you want to reload your classes and define new ones, then you are fine with implementing your own classloader, as you already found.
If you want to replace existing classes, things become more "envolved". You can do this by implementing your own tiny Java agent. See the Java documentation, how to do this: http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html
With the instrumentation mechanism you can not freely redefine classes, quote from Instrumentation.redefineClass:
The redefinition may change method bodies, the constant pool and attributes. The redefinition must not add, remove or rename fields or methods, change the signatures of methods, or change inheritance. These restrictions maybe be lifted in future versions. The class file bytes are not checked, verified and installed until after the transformations have been applied, if the resultant bytes are in error this method will throw an exception.
If you want to do more, you need to load it again. This can be done under the same name, by using a different classloader. The previous class definition will be unloaded, if no one else is using it any more. So, you need to reload any class that uses your previous class also. Utlimatly, you end up reinventing something like OSGi. Take a look at: Unloading classes in java?

Replacing java class?

I'm working on a sandbox feature for my java antivirus, and I've come into a question: Does the specified package on a class matter for compilation?
Example:
I'm running a program that wants to use Runtime.getRuntime().exec(), when the classloader attempts to load that to run a method, does it check the package qualified in the file, if they exist? I would prefer not to try and change files in the JVM, but to simply load ones from a different package. I can accomplish the loading and such, but my only dilemma, will it crash and burn? Inside the java, it would be registered as say, java.lang.Runtime, but the compiled code will say for example pkg.pkg.Runtime and will it need to extend the old runtime? My guess is that extending the old runtime would just break it. Does anyone know anything about this? I'm working on making a testable example, but I'm still a bit away and wanted to get some answers, as well as this might benefit some people.
Does the specified package on a class matter for compilation?
Yes it does matter. A class called pkg.pkg.Runtime() cannot be loaded as if it was java.lang.Runtime.
Furthermore, if my memory is correct, the JVM has some additional security measures in it to prevent normal applications from injecting classes into core packages such as java.lang.
If you need to change the behaviour of the java.lang.Runtime class (for experimental purposes!) then I think you will need to put your modified version on the boot classpath, ahead of the "rt.jar" file.
However:
This level of tinkering can easily result in JVM instability; i.e. hard JVM crashes that are difficult to diagnose.
If your aim is to produce a "production quality" tool, then you will find that things that involve tinkering with the JVM are not considered acceptable. People are going to be very suspicious of installation instructions that say things like "add this to your installed JVM's bootclasspath".
Distributing a "tinkered with" JVM may fall foul of Oracle's Java licensing agreement.
My advice would be to look for a less intrusive way of doing what you are trying to do. For instance, if you are trying to do virus checking, either do it outside of the JVM, or in a custom application classloader.
You commented:
I have a custom classloader, my question is: If I compile a class that is labelled as say, pkg.pkg.Runtime, can I register in my classloader as java.lang.Runtime?
As I said above, no you can't. A bytecode file has the classname embedded in it. If you attempt to "pull a swifty" by loading a class with a different name, the JVM will throw an Error.
And:
If not, then how can I replace the class? If the compiled package name has to equal the request referenced naming, then can I modify the .class file to to match, or perhaps compile it as if it were in the java.lang package?
That's what you would have to do. You need to name the class java.lang.Runtime in the source code and compile it as such.
But what I meant by my advice above is that you should use do the virus checking in the class loader. Forget about trying to replace / modify the behaviour of Runtime. It is a bad idea for the reasons I listed above.

Class as Hashtable key -- is it a good idea?

After some consideration, i implemented caching in my application, basically, using a hashtable that contains Class as a key (which is the class that corresponds to a particular cached entity and inherits from an abstract AbstractCache) and the concrete cache object that is made from that class, which seems quite convenient.
But is it a good idea to have Class as a Hashtable key, or should i probably use just the .canonicalName() or something like that instead?
If you never need to unload classes, it's probably not an issue. But since requirements may vary based on how your code is run (stand-alone Java app or deployed in web container, enterprise server etc.) it might be wiser not to do this. Classloader leaks are very nasty bugs to track down and solve.
Using the canonical name would seem to be a much better solution. Do keep in mind the other challenge, though: if you have multiple classloaders that might load the same class, those classes will still be considered incompatible. You wouldn't be able to differentiate them by their canonical name, while two identical classes loaded by different classloaders would yield two distinct Class instances and could be used as keys in the same map.
If you don't have very specific classloader constraints or requirements, go with the canonical name. If you're deploying as anything else than a stand-alone Java application, be vary wary of the implications.
Sounds fine to me. I have done this in the past. Classes are effectively immutable singletons so there's no chance of using the "wrong" instance of a Class.
The only time there would be an issue is if there were multiple ClassLoaders involved but this is rare in most apps.

Loading Java Classes which arent needed

I'm currently wondering what the actual overhead, in the JVM, is for loading extra classes which are never used.
We have code which iterates all the classes in the class path to find classes which implement a certain interface, we then load them.
This allows custom classes to be simply dropped in a directory and they get loaded and registered.
The side affect is that we hit every class in the class path, causing the classes to load. What would be the affect on the JVMs memory?
Does simply loading classes affect the memory much at all?
As usual, I would advise measuring this for your particular scenario.
Having said that, I'm not sure I'd advise scanning the whole classpath. If you don't control the classpath (it's your customer's, or similar), potentially they could add anything to it, and your process is going to scan anything they drop into their classpath (possibly unrelated to your app).
I'd suggest that you nominate only certain directories/repositories that classes can be uploaded to, and that way you'll restrict the classpath scanning and reduce the chances of inadvertently picking up stuff you don't intend to.
If you use a separate ClassLoader to load those classes and are very careful not to create any references to those classes or instances of them, then when the ClassLoader becomes eligible for garbage collection, so do the classes.
Thus, you could avoid unnecessarily clogging your PermGen space by doing 2 passes with separate ClassLoaders: one to load all the classes and identify those you want to keep, and another to actually use them.
Won't using ClassLoaders in this way have unintended side-effects? Like running static initialisers and so on.
You could use the ServiceLoader mechanism, but if that doesn't suit, you can inspect classes without using ClassLoaders - byte manipulation libraries like BCEL and ASM can be used to just inspect classes.
Yes, this forces the VM to load the class file and examine it (which can be a performance problem). Moreover, if you're using a Sun VM, then these classes will stay in memory forever. the Sun VM puts classes in the so called "PermGen" space which is never garbage collected unless you specify a special option.
So this is generally a bad idea but there are two simple workarounds:
Check the name of the class (the filename). Repeat the name of the interface in the name, so you can easily notice what you have to load and what not.
Use two directories. One contains normal classes, the other the one all those which you want to always load.
A "far out there" suggestion:
could you do the same thing but without actually using the VM? the class file spec is documented, couldn't you write your own app to just read the class files and figure out if they implement your interface/whatever without actually loading them?
That would get you the ability to scan any directories but without any worry of loading classes or static intializers or anything like that.

Categories

Resources