I am using Soot to instrument the classes of an application, but I've found no way to instrument classes dynamically with it. Soot only detects static links, which causes failures for programs that use dynamic loading. So I have to detect which classes a program loads dynamically. Suppose I don't have the option to instrument all classes, for practical reasons. For example, I would have to instrument the whole JDK, which could take hours, because any JDK class might be loaded at run time.
My ultimate goal is a tool or method that gives me the fully qualified names of all classes that a program uses.
People usually use TamiFlex in combination with Soot for such issues:
https://code.google.com/p/tamiflex/
TamiFlex lets you record dynamic loading with very little overhead.
Related
Is there a way to automatically find out which Java classes are actually loaded (either during compile time, as far as that's possible, or during the runtime of an application), and to throw out all other classes from a JAR to create a smaller JAR? Does that actually make sense in practice?
I am talking about the application classes for an application JAR. Usually there are lots of libraries in an application, and an application rarely needs all features of those libraries. So I suspect that would make a considerably smaller application. In theory that might be done, for example, via a Java agent that logs which classes and resources are read by one or several runs of an application (or even just by java -verbose:class), and a Maven plugin that throws out all other classes from a jar-with-dependencies. Is there already something like that?
Clarification: I am not talking about unused dependencies (JARs that are not used at all), but about removing unused parts of each included JAR.
Well, the Maven Shade Plugin has an option minimizeJar when creating an Uber-JAR for your application:
https://maven.apache.org/plugins/maven-shade-plugin/
But, as others already pointed out, this is quite dangerous, as it regularly fails to detect class accesses which are done via Reflection or other dynamic references.
It may not be a good idea to automate this, as an application can use reflection to initialise objects, or one JAR may depend on another JAR.
The only way I can think of is to remove the JARs one by one and check whether the application still runs as expected. Even then, all modules of the application have to be tested, since one module may work without a particular dependency while another may not.
A better solution is to take care while developing. The application developer must be careful when adding a dependency, and should remove unwanted dependencies once his/her piece of code is done.
Global strategy.
1) Find all the classes that are loaded during runtime.
2) List of all the classes available in the classpath.
3) Reduce your class path by creating copies of jars containing only classes you need.
I have done parts 1 and 2, so I can help you.
1) Find out all the classes that are loaded. You need 100 % code coverage (I am not talking about tests, but production). So run all possible scenarios, so all the classes your app needs will be loaded and logged.
To log loaded classes, try several approaches: reflection, the -verbose:class flag, or a Java agent, which lets you modify methods at run time. This is an example of some java agent code or another java agent example
2) To find all the classes available in a jar, you can write a program. You need to know all the places where the application jars are located. Loop through these jars (you can use ZipFile), loop through the ZipEntry entries, and collect all the classes.
3) After that write a script or program that reassembles your application. For example, now you can create a new jar file for each library and put there only needed classes.
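Step 2, plus the comparison that feeds step 3, could be sketched roughly as follows. This is only an illustration under assumptions: the class name JarClassLister and its methods are made up for this example, and the set of loaded classes is assumed to come from whatever logging you used in step 1.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any library.
public class JarClassLister {

    /** Converts "com/example/Foo.class" to "com.example.Foo"; returns null for non-class entries. */
    public static String toClassName(String entryName) {
        if (!entryName.endsWith(".class")) {
            return null;
        }
        String withoutSuffix = entryName.substring(0, entryName.length() - ".class".length());
        return withoutSuffix.replace('/', '.');
    }

    /** Collects the names of all classes stored in one jar. */
    public static Set<String> listClasses(Path jar) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.isDirectory() ? null : toClassName(entry.getName());
                if (name != null) {
                    names.add(name);
                }
            }
        }
        return names;
    }

    /** Classes that are available on the classpath but were never seen loaded. */
    public static Set<String> unused(Set<String> available, Set<String> loaded) {
        Set<String> result = new TreeSet<>(available);
        result.removeAll(loaded);
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Each argument is the path of one jar to inspect.
        for (String arg : args) {
            listClasses(Paths.get(arg)).forEach(System.out::println);
        }
    }
}
```

The result of unused(...) is only a list of candidates for removal; classes reached through reflection in scenarios you did not run will show up there too.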
You may also use a tool (again, you are a programmer, so write a program) that checks the code for class dependencies. You do not want to remove classes that are needed for compilation. When I was a student, I wrote a code analyzer that builds a directed graph of class dependencies.
As @Gokul Nath KP notes, it can be done by hand; I did this before. I manually changed the Gradle and Maven dependencies, removing them one by one, and then ran a full regression test. It took me a week (our application was small compared to modern enterprise systems created by hundreds of developers).
So, be creative, and in case of success, your project will be used by millions!
In trying to learn about Java class loaders from Wikipedia, I think I can see why they have the three major class loaders:
1) Bootstrap class loader
2) Extensions class loader
3) System class loader
They go on to say you can define your own classloader. I'm not sure I see the value in defining your own, but the following quote from Wikipedia really makes me wonder:
The most complex JAR hell problems arise in circumstances that take
advantage of the full complexity of the classloading system. A Java
program is not required to use only a single "flat" classloader, but
instead may be composed of several (potentially very many) nested,
cooperating classloaders. Classes loaded by different classloaders may
interact in complex ways not fully comprehended by a developer,
leading to errors or bugs that are difficult to analyze, explain, and
resolve.
If it's so complex, why bother with it? Shouldn't the three already-defined classloaders be enough?
(And yes, for those curious, I did run into a ClassCastException that I didn't think should have happened, much like the graphic labelled Figure 2. Class identity crisis. I'm trying to understand the background is all.)
Certain use cases require custom classloaders.
A few examples:
Dynamically adding new folders/jars to be loadable. (Without restarting the whole application).
Dynamically removing folder/jars from being loadable.
Runtime bytecode generation with javassist.
Multiple (actually used at the same time) versions of the same classes in the same application/jvm
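The last use case is also where the ClassCastException from the question comes from: a class loaded by two different loaders is two different classes to the JVM. A minimal sketch of that effect, assuming the compiled classes are readable as resources from the classpath and Java 9+ (for readAllBytes); ClassIdentityDemo, Payload and IsolatingLoader are names invented for this example:

```java
import java.io.IOException;
import java.io.InputStream;

public class ClassIdentityDemo {

    /** Trivial class that ends up loaded twice. */
    public static class Payload {}

    /** Defines one named class itself instead of delegating to its parent. */
    static class IsolatingLoader extends ClassLoader {
        private final String isolatedName;

        IsolatingLoader(ClassLoader parent, String isolatedName) {
            super(parent);
            this.isolatedName = isolatedName;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(isolatedName)) {
                return super.loadClass(name, resolve); // normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    byte[] bytes = readBytes(name);
                    c = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }

        /** Reads the class file through the parent loader's resource lookup. */
        private byte[] readBytes(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                return in.readAllBytes();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Loads a second copy of Payload through a fresh IsolatingLoader. */
    public static Class<?> loadIsolated() {
        String name = Payload.class.getName();
        ClassLoader parent = ClassIdentityDemo.class.getClassLoader();
        try {
            return new IsolatingLoader(parent, name).loadClass(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        Class<?> isolated = loadIsolated();
        Object instance = isolated.getDeclaredConstructor().newInstance();
        System.out.println(isolated.getName().equals(Payload.class.getName())); // names match
        System.out.println(isolated == Payload.class);   // but the Class objects differ
        System.out.println(instance instanceof Payload); // so a cast to Payload would fail
    }
}
```

The same mechanism is what makes hot reloading and side-by-side versions possible: each loader gets its own copy of the class, at the price that instances cannot be passed between the copies as if they were the same type.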
I don't want to use the URL Classloader to load classes.
I want to implement this myself.
I don't want to use a solution like JRebel (although it's great).
I've got prior experience of JavaAssist, bytecode generation, implementing javaagent class transformers etc.
I would like to write a javaagent which hooks into the classloader or defines its own system classloader.
I'll store the class files in an in memory cache, and for particular files, periodically reload them from disk.
I'd prefer to do this in a way which doesn't involve continuously polling the file system and manually invalidating specific classes. I'd much rather intercept class loading events.
I last messed around with this stuff 4 years ago, and I'm sure (although my memory may deceive me) that it was possible to do, but 8 hours of searching Google hasn't turned up an obvious solution beyond building a patched JVM.
Is this actually possible?
I've created a stub implementation at https://github.com/packetops/poc_agent if anyone's interested in a simple example of javaagent use.
update
Just found this post - I may have been using the wrong approach, I'll investigate further.
It depends on what you want to do. If you want to reload your classes and define new ones, then you are fine with implementing your own classloader, as you already found.
If you want to replace existing classes, things become more "involved". You can do this by implementing your own tiny Java agent. See the Java documentation for how to do this: http://docs.oracle.com/javase/7/docs/api/java/lang/instrument/package-summary.html
With the instrumentation mechanism you cannot freely redefine classes. To quote from Instrumentation.redefineClasses:
The redefinition may change method bodies, the constant pool and attributes. The redefinition must not add, remove or rename fields or methods, change the signatures of methods, or change inheritance. These restrictions maybe be lifted in future versions. The class file bytes are not checked, verified and installed until after the transformations have been applied, if the resultant bytes are in error this method will throw an exception.
If you want to do more, you need to load the class again. This can be done under the same name by using a different classloader. The previous class definition will be unloaded if no one else is using it any more. So you also need to reload any class that uses your previous class. Ultimately, you end up reinventing something like OSGi. Take a look at: Unloading classes in java?
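For the "intercept class loading events" part, a tiny agent might look like the sketch below. It only observes: the transformer records each class name it is handed and returns null, which leaves the class bytes unchanged. The names LoadLoggingAgent and Recorder are made up for this example, and printing the names from a shutdown hook is just one possible choice.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentSkipListSet;

public class LoadLoggingAgent {

    /** Records the name of every class passed to the transformer. */
    public static class Recorder implements ClassFileTransformer {
        private final Set<String> seen = new ConcurrentSkipListSet<>();

        @Override
        public byte[] transform(ClassLoader loader, String className,
                                Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain,
                                byte[] classfileBuffer) {
            if (className != null) { // the name may be null for some generated classes
                seen.add(className.replace('/', '.'));
            }
            return null; // null means: no transformation, keep the original bytes
        }

        public Set<String> seen() {
            return seen;
        }
    }

    /** Entry point called by the JVM before main() when started with -javaagent. */
    public static void premain(String agentArgs, Instrumentation inst) {
        Recorder recorder = new Recorder();
        inst.addTransformer(recorder);
        // Dump what was seen when the JVM exits.
        Runtime.getRuntime().addShutdownHook(
                new Thread(() -> recorder.seen().forEach(System.err::println)));
    }
}
```

To try it, package the class in a jar whose manifest contains Premain-Class: LoadLoggingAgent and start the application with something like java -javaagent:agent.jar -jar app.jar (both jar names are placeholders). Classes that were loaded before the agent registered its transformer are not reported.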
I'm working on a sandbox feature for my java antivirus, and I've come into a question: Does the specified package on a class matter for compilation?
Example:
I'm running a program that wants to use Runtime.getRuntime().exec(), when the classloader attempts to load that to run a method, does it check the package qualified in the file, if they exist? I would prefer not to try and change files in the JVM, but to simply load ones from a different package. I can accomplish the loading and such, but my only dilemma, will it crash and burn? Inside the java, it would be registered as say, java.lang.Runtime, but the compiled code will say for example pkg.pkg.Runtime and will it need to extend the old runtime? My guess is that extending the old runtime would just break it. Does anyone know anything about this? I'm working on making a testable example, but I'm still a bit away and wanted to get some answers, as well as this might benefit some people.
Does the specified package on a class matter for compilation?
Yes, it does matter. A class called pkg.pkg.Runtime cannot be loaded as if it were java.lang.Runtime.
Furthermore, if my memory is correct, the JVM has some additional security measures in it to prevent normal applications from injecting classes into core packages such as java.lang.
If you need to change the behaviour of the java.lang.Runtime class (for experimental purposes!) then I think you will need to put your modified version on the boot classpath, ahead of the "rt.jar" file.
However:
This level of tinkering can easily result in JVM instability; i.e. hard JVM crashes that are difficult to diagnose.
If your aim is to produce a "production quality" tool, then you will find that things that involve tinkering with the JVM are not considered acceptable. People are going to be very suspicious of installation instructions that say things like "add this to your installed JVM's bootclasspath".
Distributing a "tinkered with" JVM may fall foul of Oracle's Java licensing agreement.
My advice would be to look for a less intrusive way of doing what you are trying to do. For instance, if you are trying to do virus checking, either do it outside of the JVM, or in a custom application classloader.
You commented:
I have a custom classloader, my question is: If I compile a class that is labelled as say, pkg.pkg.Runtime, can I register in my classloader as java.lang.Runtime?
As I said above, no you can't. A bytecode file has the classname embedded in it. If you attempt to "pull a swifty" by loading a class with a different name, the JVM will throw an Error.
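If you want to see this for yourself, here is a small experiment, offered as a sketch rather than a definitive test. RenameAttempt and ProbeLoader are hypothetical names; the example assumes Java 9+ (for readAllBytes) and that the compiled class file can be read as a resource. It tries to define a class's own bytes under two other names and reports which kind of Throwable comes back.

```java
import java.io.IOException;
import java.io.InputStream;

public class RenameAttempt {

    /** Exposes the protected defineClass so the outcome can be observed. */
    static class ProbeLoader extends ClassLoader {
        Class<?> define(String name, byte[] bytes) {
            return defineClass(name, bytes, 0, bytes.length);
        }
    }

    /** Returns "defined <name>" on success, else the type of the Throwable that rejected it. */
    public static String tryDefine(String name, byte[] bytes) {
        try {
            return "defined " + new ProbeLoader().define(name, bytes).getName();
        } catch (LinkageError | SecurityException e) {
            return e.getClass().getName();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] own;
        try (InputStream in = RenameAttempt.class.getResourceAsStream("RenameAttempt.class")) {
            own = in.readAllBytes(); // bytes of a class whose embedded name is RenameAttempt
        }
        // Name differs from the one embedded in the bytes:
        // defineClass is documented to throw NoClassDefFoundError.
        System.out.println(tryDefine("pkg.pkg.Runtime", own));
        // Name starts with "java.": defineClass is documented to throw SecurityException.
        System.out.println(tryDefine("java.lang.Runtime", own));
    }
}
```

The second case is the additional protection for core packages: the name check happens before the bytes are even looked at.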
And:
If not, then how can I replace the class? If the compiled package name has to equal the request referenced naming, then can I modify the .class file to match, or perhaps compile it as if it were in the java.lang package?
That's what you would have to do. You need to name the class java.lang.Runtime in the source code and compile it as such.
But what I meant by my advice above is that you should do the virus checking in the class loader. Forget about trying to replace / modify the behaviour of Runtime. It is a bad idea for the reasons I listed above.
How would you determine the classes (non Sun JDK classes) loaded / unused by a Java application?
Background:
I have a legacy Java Web Start application that has gone through a lot of code changes and now has a lot of classes, most of which are not used. I would like to reduce the download size of the application by deploying only the classes that will actually be used, instead of jarring all the packages.
I will also use the same process to completely delete these unused classes.
Use java -verbose:class to see what classes are loaded, then use grep (or any other tool) to keep only the lines from your packages.
A small limitation: it will only tell you which classes are really used when they are used, so you must cover all use cases of your application.
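If grep is not at hand, the filtering can also be done in Java. The sketch below rests on assumptions: VerboseClassFilter is a made-up name, and the two line formats it recognises ("[Loaded X from ...]" on older JVMs and "[class,load] X source: ..." with unified logging on Java 9+) are taken from typical output and may differ on your JVM.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class VerboseClassFilter {
    // Older format, e.g. "[Loaded com.example.Foo from file:/tmp/app.jar]"
    private static final Pattern LEGACY = Pattern.compile("^\\[Loaded (\\S+) from .*");
    // Unified logging, e.g. "[0.123s][info][class,load] com.example.Foo source: ..."
    private static final Pattern UNIFIED =
            Pattern.compile("\\[class,load[^\\]]*\\]\\s+(\\S+)\\s+source:");

    /** Extracts the class name from one output line, or returns null if the line is not a load event. */
    public static String classNameOf(String line) {
        Matcher m = LEGACY.matcher(line);
        if (m.find()) {
            return m.group(1);
        }
        m = UNIFIED.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    /** Keeps only the loaded classes whose name starts with the given package prefix. */
    public static Set<String> loadedClasses(List<String> lines, String packagePrefix) {
        Set<String> result = new TreeSet<>();
        for (String line : lines) {
            String name = classNameOf(line);
            if (name != null && name.startsWith(packagePrefix)) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        String prefix = args.length > 0 ? args[0] : "";
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            loadedClasses(lines, prefix).forEach(System.out::println);
        }
    }
}
```

Usage would be along the lines of java -verbose:class -jar app.jar | java VerboseClassFilter com.mycompany. where app.jar and the package prefix are placeholders for your own.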
You can use a good IDE for that.
For instance, IntelliJ IDEA analyzes the source code for dependencies and allows you to safely delete a class/method/attribute that is not being used by any other.
That way you can get rid of all your dead code.