How do I strip the fluff out of a third party library? - java

It may not be best practice, but are there ways of removing unused classes from a third party's jar files? I'm after something that looks at the way my classes use the library, does some kind of coverage analysis, then spits out another jar with all of the untouched classes removed.
Obviously there are issues with this. Specifically, the usage scenario I put it through may not use all classes all the time.
But neglecting these problems, can it be done in principle?

There is a way.
The JarJar project does this AFAIR. The first goal of the JarJar project is to let you embed third party libraries in your own jar, changing the package structure if necessary. In doing so, it can strip out the classes that are not needed.
Check it out at http://code.google.com/p/jarjar/.
Here is a link about shrinking jars: http://sixlegs.com/blog/java/jarjar-keep.html

There is a tool in Ant called a classfileset. You specify the list of root classes that you know you need, and then the classfileset recursively analyzes their code to find all dependencies.
Alternatively, you could develop a good test suite that exercises all of the functions that you need, then run your tests under a test coverage tool. The tool will tell you which classes (and statements in them) were actually used. This could give you an even smaller set of code than what you'd find with static analysis.

I use ProGuard for this. As well as being an excellent obfuscator, it has a code shrinking phase which can combine multiple JARs and then strip out any unused classes or class members. It does an excellent job at shrinking.

At a previous job, I used a Java obfuscator that, as well as obfuscating the code, also removed classes and methods that weren't being used. If you were doing "Class.forName" or any other type of reflection stuff, you needed to tell the obfuscator, because it couldn't tell by inspecting the code which classes or methods were called by reflection.
The problem, of course, is that you don't know if other parts of the third party library are doing any reflection, and so removing an "unused" class might cause things to break in an obscure case that you haven't tested.
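To make the reflection problem concrete, here is a minimal sketch (the class and method names are made up for illustration). The class name is assembled at run time, so no static analysis of the bytecode sees a reference to it, and stripping that class only shows up as a failure when this code path actually runs.

```java
// Hypothetical demo: a class name built at run time never appears as a
// class reference in the caller's bytecode, so a shrinker cannot see it.
final class ReflectionBlindSpot {

    // Returns true if the class can still be found and instantiated,
    // false if it is missing (for example because a shrinker removed it).
    static boolean canInstantiate(String packageName, String simpleName) {
        String className = packageName + "." + simpleName;
        try {
            Class.forName(className).getDeclaredConstructor().newInstance();
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList is always present in the JDK.
        System.out.println(canInstantiate("java.util", "ArrayList"));
        // com.example.stripped.Gone is an invented name standing in for a removed class.
        System.out.println(canInstantiate("com.example.stripped", "Gone"));
    }
}
```

This is why such tools usually need an explicit "keep" list for anything loaded by name.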

A jar is just a zip file, so I guess you can. If you could get to the source, it would be cleaner. Maybe try disassembling the classes?

Adding to this question: can that improve performance? Would startup time improve because the unused classes are never JIT compiled, or does Java automatically detect unused code when compiling to bytecode and not even deal with it?

This would be an interesting project (has anyone done it already?)
I presume you'd give the tool your jar(s) as a starting point, and the library jar to clean up. It could use reflection to determine which classes your jar(s) reference directly, and which are used indirectly down the call tree (this is not trivial at all, but doable). If it encounters any reflection code in either of the two places, it should give a very loud warning.
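As a rough idea of how the "referenced directly" step could work without any library, the sketch below reads the constant pool of a compiled class and collects every class entry in it. This is only one possible approach and not what any of the tools mentioned here are confirmed to do; ClassRefScanner is a made-up name, and a real tool would repeat this for each discovered class to follow the tree.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: lists the classes a single .class file refers to.
final class ClassRefScanner {

    // Returns internal names such as "java/util/List"; array types show up
    // as descriptors such as "[Ljava/lang/String;".
    static Set<String> referencedClasses(byte[] classFile) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(classFile))) {
            if (in.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            in.readUnsignedShort(); // minor version
            in.readUnsignedShort(); // major version
            int count = in.readUnsignedShort();
            String[] utf8 = new String[count];
            int[] classNameIndex = new int[count];
            for (int i = 1; i < count; i++) {
                int tag = in.readUnsignedByte();
                switch (tag) {
                    case 1: utf8[i] = in.readUTF(); break;                     // Utf8
                    case 7: classNameIndex[i] = in.readUnsignedShort(); break;  // Class
                    case 5: case 6: in.readFully(new byte[8]); i++; break;      // Long/Double use two slots
                    case 15: in.readFully(new byte[3]); break;                  // MethodHandle
                    case 8: case 16: case 19: case 20:
                        in.readFully(new byte[2]); break;
                    case 3: case 4: case 9: case 10: case 11: case 12: case 17: case 18:
                        in.readFully(new byte[4]); break;
                    default:
                        throw new IllegalArgumentException("unknown constant pool tag " + tag);
                }
            }
            Set<String> names = new TreeSet<>();
            for (int index : classNameIndex) {
                if (index != 0) names.add(utf8[index]);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java ClassRefScanner path/to/Some.class
        for (String name : referencedClasses(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(name.replace('/', '.'));
        }
    }
}
```

Note that this only finds static references. Classes named in strings and loaded by reflection are invisible to it, which is exactly the case that deserves the loud warning.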

Related

Find out which Java classes are actually loaded and reduce jar

Is there a way to automatically find out which Java classes are actually loaded (either during compile time, as far as that's possible, or during the runtime of an application), and to throw out all other classes from a JAR to create a smaller JAR? Does that actually make sense in practice?
I am talking about the application classes for an application JAR. Usually there are lots of libraries in an application, and an application rarely needs all features of those libraries. So I suspect that would make a considerably smaller application. In theory that might be done, for example, via a Java agent that logs which classes and resources are read by one or several runs of an application (or even just by java -verbose:class), and a maven plugin that throws out all other classes from a jar-with-dependencies. Is there already something like that?
Clarification: I am not talking about unused dependencies (JARs that are not used at all), but about removing unused parts of each included JAR.
Well, the Maven Shade Plugin has an option minimizeJar when creating an Uber-JAR for your application:
https://maven.apache.org/plugins/maven-shade-plugin/
But, as others already pointed out, this is quite dangerous, as it regularly fails to detect class accesses which are done via Reflection or other dynamic references.
It may not be a good approach to automate, as the application can use reflection to initialise objects, or one JAR may depend on another JAR.
The only way that I can think of is to remove the JARs one by one and check whether the application still runs as expected. Then again, with this approach all modules of the application have to be tested, since one module may work without a particular dependency while another may not.
A better solution is to take care while developing. The application developer must be careful when adding a dependency and should remove unwanted dependencies once his/her piece of code is done.
Global strategy.
1) Find all the classes that are loaded during runtime.
2) List of all the classes available in the classpath.
3) Reduce your class path by creating copies of jars containing only classes you need.
I have done parts 1 and 2, so I can help you.
1) Find out all the classes that are loaded. You need 100% code coverage (I am not talking about tests, but production). So run all possible scenarios, so that all the classes your app needs will be loaded and logged.
To log loaded classes, you can try several approaches: reflection, the -verbose:class flag, or a Java agent, which allows you to modify methods at runtime. This is an example of some java agent code or another java agent example
2) To find all the classes available in a jar, you can write a program. You need to know all the places where the application jars are located. Loop through these jars (you can use ZipFile), loop through the ZipEntry entries, and collect all classes.
3) After that, write a script or program that reassembles your application. For example, you can now create a new jar file for each library and put only the needed classes there.
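Steps 2 and 3 can be sketched with nothing but java.util.zip. This is a minimal illustration, not a finished tool: JarSlimmer and its method names are made up, the set of classes to keep is assumed to come from step 1, and signed jars or manifest Class-Path entries are not handled.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical helper for steps 2 and 3.
final class JarSlimmer {

    // "com/acme/Foo.class" -> "com.acme.Foo"
    private static String toClassName(String entryName) {
        return entryName.substring(0, entryName.length() - ".class".length()).replace('/', '.');
    }

    // Step 2: collect the fully qualified names of all classes in a jar.
    static Set<String> listClasses(Path jar) {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) names.add(toClassName(name));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    // Step 3: copy the jar, keeping every non-class entry but only the classes in 'keep'.
    static void copyKeeping(Path source, Path target, Set<String> keep) {
        try (ZipFile zip = new ZipFile(source.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.endsWith(".class") && !keep.contains(toClassName(name))) {
                    continue; // class was never loaded, so leave it out
                }
                out.putNextEntry(new ZipEntry(name));
                try (InputStream in = zip.getInputStream(entry)) {
                    for (int n = in.read(buffer); n != -1; n = in.read(buffer)) {
                        out.write(buffer, 0, n);
                    }
                }
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Comparing listClasses before and after gives a quick report of what was dropped, which is handy when a later run fails with a missing class.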
You may also use a tool (again, you are a programmer, so write a program) that checks the code for class dependencies. You do not want to remove classes if they are needed for compilation. When I was a student, I wrote a code analyzer that builds a directed graph of class dependencies.
As @Gokul Nath KP notes, I did this before. I manually changed gradle and maven dependencies, removing them one by one, and then ran a full regression test. It took me a week (our application was small compared to modern enterprise systems created by hundreds of developers).
So, be creative, and in case of success, your project will be used by millions!
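For step 1, one low-effort option is to run the application with the -verbose:class flag and parse the captured output. The sketch below assumes the two line shapes I have seen from HotSpot ("[Loaded X from ...]" on Java 8 and "[class,load] X source: ..." on Java 9 and later); treat both patterns as assumptions and check them against your own JVM's output. VerboseClassLog is a made-up name.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for captured -verbose:class output.
final class VerboseClassLog {

    // Assumed Java 8 shape: "[Loaded java.lang.Object from /path/rt.jar]"
    private static final Pattern LEGACY = Pattern.compile("^\\[Loaded (\\S+) from ");
    // Assumed Java 9+ shape: "[0.012s][info][class,load] java.lang.Object source: ..."
    private static final Pattern UNIFIED = Pattern.compile("\\[class,load\\s*\\]\\s+(\\S+) source: ");

    static Set<String> loadedClasses(List<String> lines) {
        Set<String> names = new TreeSet<>();
        for (String line : lines) {
            Matcher legacy = LEGACY.matcher(line);
            Matcher unified = UNIFIED.matcher(line);
            String name = legacy.find() ? legacy.group(1)
                        : unified.find() ? unified.group(1)
                        : null;
            // Names containing '/' are runtime-generated classes (lambdas and
            // similar) that have no entry of their own in any jar, so skip them.
            if (name != null && !name.contains("/")) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Illustrative input lines, not real captured output.
        List<String> sample = Arrays.asList(
                "[Loaded java.lang.Object from /opt/jdk8/jre/lib/rt.jar]",
                "[0.012s][info][class,load] com.example.Used source: file:/app/lib/demo.jar");
        System.out.println(loadedClasses(sample));
    }
}
```

The resulting set is only as complete as the scenarios you ran while capturing the log, which is the 100% coverage caveat from step 1.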

Java Imports, Assembly (Krakatau), and Source Code

So here's my situation:
I am running a Java Client/Server architecture that has high CPU usage and I'm trying to reduce the lag time on the main "server" thread. I have been profiling the server with YourKit to identify the CPU-hogging code.
The problem is:
I am using someone else's code, and because of the way it is written, it is impossible to decompile, then recompile without using a special obfuscator which I do not have access to (no I am not violating any copyrights or anything).
What I am currently doing:
To modify the class files without worrying about obfuscation, I have been using Storyyeller's amazing Krakatau decompiler (https://github.com/Storyyeller/Krakatau) to disassemble class files into assembly files.
I manually edit the .j assembly files while looking at a Jasmin reference page (which takes FOREVER and I often mess up), then reassemble them into class files and run them again.
What I want to do:
Instead of painstakingly editing the assembly, I was wondering, does anyone know of a way to convert .java Source Code to .j Assembly code?
Also, if I simply decompile the .class files, is it possible for me to simply recompile them even though the packages for the imports do not exist?
import com.bazinga.*;
public class MainThread{}//compile this even though package com.bazinga doesn't exist?
If anyone knows ANY WAY I could do this, I would really appreciate it!
Instead of painstakingly editing the assembly, I was wondering, does anyone know of a way to convert .java Source Code to .j Assembly code?
Yes and no. The obvious answer is that you can just compile your code and then disassemble the resulting classes. However, this is not always helpful, because compilation can be context dependent (such as inlining static final constants, or handling of nested classes). Additionally, if you plan to add your code into an existing method, you have to be careful to not use existing local variable slots or clobber the operand stack.
My best advice is to try to isolate your modifications as much as possible. For example, if you want to add code to the jar, instead of inserting it into an existing class, just write the code you want to add in Java, compile it and add the classfiles in. Then modify the target class to call into your own class.
As for imports, you can compile against stubs. Just create a dummy class with the name you want, and optionally dummy methods for anything you need to call. The implementations can just be {throw null;} or similar, since you won't actually be executing them ever, they just need to exist to satisfy the compiler during compilation.
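A minimal sketch of such a stub, with made-up names (BazingaClient and latency are not from any real library). In a real project the stub would declare the same package as the original class, for example com.bazinga, live in its own source folder, and be left out of the jar you ship so the real class is picked up at run time.

```java
// Hypothetical stub standing in for a class from the missing library.
// It only has to match the name and signatures the compiler looks for.
final class BazingaClient {
    static int latency(String host) {
        throw null; // never meant to run; throws NullPointerException if it does
    }
}

// Your own code compiles against the stub as if the library were present.
final class MainThread {
    static int measure() {
        return BazingaClient.latency("localhost");
    }
}
```

If the stub ever ends up on the run-time classpath by mistake, the NullPointerException from throw null makes that obvious straight away.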

Java - make a library and import optional

I have a library that I'm using in a Java application - it's important for certain functionality, but it's optional. Meaning that if the JAR file is not there, the program continues on without issue. I'd like to open source my program, but I cannot include this library, which is necessary to compile the source code as I have numerous import statements to use the API. I don't want to maintain two code sets. What is the best way to remove the physical jar file from the open source release, but still maintain the code to support it so that other people could still compile it?
The typical approach is to define a wrapper API (i.e. interfaces), include those interfaces in the open sourced code, and then provide configuration options where one can specify the names of the classes that implement those interfaces.
You will import API interfaces instead of importing classes directly into your open sourced code. This way, you are open sourcing the API but not the implementation of the parts that you do not want to open source or you cannot open source.
There are many examples, but take a look at JDBC API (interfaces) and JDBC drivers (implementation classes) for starters.
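A small sketch of that pattern, with invented names (Exporter, PlainTextExporter and Exporters are illustrative only). The open-sourced code depends on the interface, and the implementation class name comes from configuration; if the class is not on the classpath, a built-in fallback is used instead.

```java
// Open-sourced API: the rest of the application depends only on this interface.
interface Exporter {
    String export(String document);
}

// Fallback that ships with the open-source code.
final class PlainTextExporter implements Exporter {
    @Override
    public String export(String document) {
        return document;
    }
}

final class Exporters {
    // 'implementationClassName' would normally come from a config file or
    // system property; the proprietary jar provides the class if installed.
    static Exporter load(String implementationClassName) {
        try {
            Class<?> type = Class.forName(implementationClassName);
            return type.asSubclass(Exporter.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            // Optional library missing or unusable: carry on without it.
            return new PlainTextExporter();
        }
    }
}
```

Because only the interface is imported, the project compiles with or without the optional jar, which is the same split JDBC makes between java.sql and the drivers.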
I was pretty much typing the same thing as smallworld, with one addition. If this API is necessary, you can use a project build tool like Maven to handle the dependencies of your project. If someone checks it out from source control with the pom, they can download the dependencies for themselves and you don't have to include them in a source repo.
There's probably a number of ways to fix this, here's a couple I can think of:
If you have only a couple of methods you need to invoke in the 3rd party library, you could use reflection to invoke those methods. It creates really verbose code, that is hard to read though.
If you don't have too much of the API in the 3rd party library you use, you could also create a separate JAR file, containing just a non-functional shell of the classes in the library (just types with the same names and methods with the same signatures). You can then use this JAR to distribute and compile against. At run-time you'd replace it with the real JAR if available.
The most common way is probably to just create a wrapper API in a separate module/project for the code that is dependent on the 3rd party library, and possibly distribute a pre-built JAR. This might go against your wish to not maintain two code sets, but may prove to be the best and least painful solution in the long run.
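For the first option (reflection), a sketch could look like the following. OptionalCall is a made-up helper, and java.lang.Integer is used here only as a stand-in for the optional library so the example runs anywhere.

```java
import java.lang.reflect.Method;
import java.util.Optional;

// Hypothetical helper: call a static method of a class that may not be present.
final class OptionalCall {

    static Optional<Object> invokeStatic(String className, String methodName,
                                         Class<?>[] parameterTypes, Object... args) {
        try {
            Method method = Class.forName(className).getMethod(methodName, parameterTypes);
            return Optional.ofNullable(method.invoke(null, args));
        } catch (ReflectiveOperationException | LinkageError e) {
            // Library absent, method missing, or the call itself failed.
            // A real implementation would probably tell these cases apart.
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Present class: calls Integer.parseInt("42") without importing anything.
        System.out.println(invokeStatic("java.lang.Integer", "parseInt",
                new Class<?>[] {String.class}, "42"));
        // Invented class name standing in for the optional library.
        System.out.println(invokeStatic("com.example.optional.Api", "run", new Class<?>[0]));
    }
}
```

Even this tiny helper shows why the approach gets verbose fast: every call site needs the class name, method name, and parameter types spelled out as data.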

Java: Locate reflection code usage

We have a huge codebase and some classes are often used via reflection all over the code. We can remove classes and the compiler stays happy, but some of them are used dynamically via reflection, so I can't locate them other than by searching strings...
Is there some reflection explorer for Java code?
There is no simple tool to do this. However, you can use code coverage instead. This gives you a report of all the lines of code executed, which can be even more useful for either improving test code or removing dead code.
Reflection is by definition very dynamic, and you have to run the right code to see what it would do, i.e. you have to have reasonable tests. You can add logging to everything Reflection does if you can access this code, or perhaps you can use instrumentation of these libraries (or change them directly).
I suggest, using appropriately licensed source for your JRE, modifying the reflection classes to log when classes are used by reflection (use a map/WeakHashMap to ignore duplicates). Your modified system classes can replace those in rt.jar with -Xbootclasspath/p: on the command line (on Oracle "Sun" JRE, others will presumably have something similar). Run your program and tests and see what comes up.
(Possibly you might have to hack around issues with class loading order in the system classes.)
I doubt any such utility is readily available, but I could be wrong.
This is quite complex, considering that dynamically loaded classes (via reflection) can themselves load other classes dynamically and that the names of loaded classes may come from variables or some runtime input.
Your codebase probably does neither of these. If this is a one-time effort, searching strings might be a good option. Or you can look for calls to reflection methods.
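Looking for calls to reflection methods can be as simple as a text scan. The sketch below is a crude illustration (ReflectionGrep is a made-up name, and the list of method names is only a starting point, not a complete catalogue of reflection entry points). It flags lines for a human to review rather than proving anything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical scanner: flags source lines that call common reflection entry points.
final class ReflectionGrep {

    private static final Pattern REFLECTION_CALL = Pattern.compile(
            "Class\\.forName\\s*\\(|\\.loadClass\\s*\\(|\\.getDeclaredMethod\\s*\\("
            + "|\\.getMethod\\s*\\(|\\.newInstance\\s*\\(");

    // Returns "file:line: text" for every line that looks like a reflection call.
    static List<String> findCandidates(String fileName, List<String> lines) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (REFLECTION_CALL.matcher(lines.get(i)).find()) {
                hits.add(fileName + ":" + (i + 1) + ": " + lines.get(i).trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Illustrative input; in practice read each file with Files.readAllLines.
        List<String> source = Arrays.asList(
                "int x = 1;",
                "  Object o = Class.forName(name).newInstance();");
        findCandidates("Foo.java", source).forEach(System.out::println);
    }
}
```

From each hit you still have to trace where the class name string comes from, which is the part that no simple scan can do for you.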
As the other posters have mentioned, this cannot be done with static analysis due to the dynamic nature of Reflection. If you are using Eclipse, you might find this coverage tool to be useful, and it's very easy to work with. It's called EclEmma

Compile Java class with missing code parts

I'm looking for some ideas on how to compile Java code when some other pieces of code (method calls) are missing. I am fully aware that javac will not let you compile Java files if it cannot find all dependencies. But maybe there is some way to bypass it, something like a forced compile.
My bytecode knowledge is not so good, but I think a method invocation is just the fully qualified class name plus the method name and parameters. So the compiler could just put this data into the class file and assume the dependency will be available at run time (if not, simply a NoSuchMethodError).
The only workaround I have found so far is to create empty stand-ins for the missing classes, with empty methods, to "cheat" the compiler. It works perfectly, but there should be an easier way :)
Any ideas?
Use Interfaces.
Create the interfaces that have the methods you need. At runtime, inject (Spring, Guice, etc.) or generate (cglib ...) classes that implement the interface.
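As a sketch of the "generate at runtime" half, the example below uses the JDK's own java.lang.reflect.Proxy instead of cglib, Spring, or Guice, simply because it needs no extra dependency. PriceService and GeneratedImplementation are invented names. Your code compiles against the interface alone, and the implementation only comes into existence when the program runs.

```java
import java.lang.reflect.Proxy;

// Hypothetical interface describing the methods your code needs.
interface PriceService {
    int priceOf(String item);
}

final class GeneratedImplementation {

    // Builds an implementation of PriceService at run time.
    // Here it just returns a fixed price; a real handler would delegate
    // to whatever code becomes available later.
    static PriceService fixedPrice(int price) {
        return (PriceService) Proxy.newProxyInstance(
                PriceService.class.getClassLoader(),
                new Class<?>[] {PriceService.class},
                (proxy, method, args) -> {
                    if (method.getName().equals("priceOf")) {
                        return price;
                    }
                    // toString/hashCode/equals also arrive here in this sketch.
                    throw new UnsupportedOperationException(method.getName());
                });
    }

    public static void main(String[] args) {
        PriceService service = fixedPrice(7);
        System.out.println(service.priceOf("tea"));
    }
}
```

The trade-off is that the interface has to be designed up front, so this suits new code better than patching an existing jar.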
If you're modifying a jar, you can extract the class files you are not modifying to another directory and include that in the classpath. That way they will be available to the compiler.
Bad luck! Probably all you can do is to create mock objects for missing parts of code just to compile your code (empty methods, so the compiler can find it).
Another question - if you miss some classes, how will you execute that code?
UPDATED according to information provided:
Well, there is another option for modifying classes in a jar: you can use AOP. To get it done, read about AspectJ. For me this is actually the easiest option (typically you need to spend time mocking objects and writing empty methods, so I would rather spend that time studying a new technology, which will help you many times over) ;)
And btw the easiest way to implement it, if you use Eclipse, is:
install AJDT
create aspect project
create aspect which modifies code (depending on what you need to change)
add jar file you want to modify
immediately get the modified code in another, already packed jar file
Sounds magical :)
In this case you don't need any dependencies in classpath, except for libraries which are needed for new code you add!
Methods aren't dependencies. They are part of the class definition. The only places the Java runtime looks for method definitions are the class definition that was compiled at compile time and its parent classes. If your problem is that a superclass is incomplete, I don't think I can help you.
If not, you could define some of these methods as abstract and then have a child class implement them.
What kind of code is missing? Normally this happens if you refer to libraries your compiler can't find. Maybe you simply need to extend the classpath the compiler is searching for classes.
If you really refer to code that is not available yet you need to implement at least those methods you refer to. But that sounds strange... maybe you can clear things up.
