How to know which classes inside a .jar file are referenced? - java

I need to deploy only the referenced classes in a very limited environment, such as a data carousel for interactive TV. Bandwidth is expensive and .jar files are not supported.

Check out ProGuard, an obfuscator that can also list code and classes that are not used. Obfuscation itself usually results in a smaller footprint.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.

Sounds like you need a dependency analyzer. This one might do the trick.
ProGuard might be even better, since it can also shrink existing .class files.

Perhaps you could use a custom class loader that does support jar files or, ideally, pack200 files.

Related

Are there any Java Class Library "header files" containing all method descriptors in the standard library?

In order to create a valid .class file, every method has to have a full internal name and type descriptors associated with it. When procedurally creating these, is there some sort of lookup table one can use (outside of Java, where a ClassLoader can be used) to get these type descriptors from a method name? For example, how would one go from Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte() (or even from java.util.Scanner.hasNextByte to boolean java.util.Scanner.hasNextByte(int) / boolean java.util.Scanner.hasNextByte())? The above example has overloading in it, which is another problem a human- but mostly computer-readable declarations file would hopefully address.
I've found many sources of human-readable documentation like https://docs.oracle.com/javase/8/docs/api/index.html containing uses of each method, hyperlinks to other places, etc. but never a simple text file or collection of files containing just declarations in any format. If there's no such file(s) don't worry about it, I can try and scrape some annoying HTML files, but if there is it would save a lot of time. Thanks!
The short answer is No.
There isn't a "header file" containing the class and method signatures for the Java class libraries. The Java tool chain has no need for such a thing. Nor do 3rd-party Java compilers, or compilers for other languages that rely on the Java SE class libraries.
AFAIK, there isn't a 3rd-party tool that builds such a file or an equivalent database or in-memory data structures.
You could create one though.
You could choose an existing Java parsing library, use it to build parse trees for all of the source files in the class library, and emit the information that you need.
You could potentially create a custom Javadoc "doclet" plugin to emit the information.
Having said that, I don't understand why you would need such a mapping. Surely your IDE does this already ... and exposes the information via some internal API. And if this is not for an IDE plugin, what is it for?
You commented:
I'm making a compiler for a JVM-based programming language ....
Ah ... so your compiler should do what other compilers do. Get the information from the ".class" file. You can either load the class using a standard or custom class loader, or you can use a library like asm or bcel or javassist ... which can read a ".class" file without loading it.
(I haven't checked, but I think the standard javac compiler uses an internal API to do this.)
Note that your proposed approaches won't work for interfacing with 3rd-party Java libraries where the source code is not available and/or the javadoc is not scrapable.
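The class-loader route mentioned above can be sketched in a few lines. This is only an illustration, not how javac or any particular compiler does it: it loads a class through the standard class loader, finds every public method with a given name, and prints the JVM descriptor of each overload using java.lang.invoke.MethodType. The class name DescriptorLookup and the helper descriptorsOf are made up for this example.

```java
import java.lang.invoke.MethodType;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class DescriptorLookup {

    /**
     * Returns the JVM method descriptors of every public method called
     * methodName on the named class, sorted alphabetically.
     * (Hypothetical helper, written for this example.)
     */
    static List<String> descriptorsOf(String className, String methodName) {
        Class<?> type;
        try {
            // Load the class through the standard class loader.
            type = Class.forName(className);
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("No such class: " + className, e);
        }
        List<String> result = new ArrayList<>();
        for (Method m : type.getMethods()) {
            if (m.getName().equals(methodName)) {
                // MethodType knows how to render (params)return in descriptor form.
                result.add(MethodType
                        .methodType(m.getReturnType(), m.getParameterTypes())
                        .toMethodDescriptorString());
            }
        }
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        // Lists both overloads of Scanner.hasNextByte from the question.
        for (String descriptor : descriptorsOf("java.util.Scanner", "hasNextByte")) {
            System.out.println("java/util/Scanner.hasNextByte" + descriptor);
        }
    }
}
```

For the Scanner example this yields the descriptors ()Z and (I)Z, one per overload. A compiler would more likely read the ".class" bytes with a library such as asm so that the classes being compiled against are never initialized, but the information obtained is the same.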
What about building it from the source files for the standard library?
The Oracle Java 8 API web pages you referenced were created by Javadoc processing of the source files for the Java standard library.
If you use an IDE with a debugger, there is a good chance you already have much of the standard library source code downloaded. After all, if you set a break point, and then follow the program step-by-step with "Step into", you can trace the execution of the program into standard library methods. The source files would be part of the JDK.
However, some parts of the standard library source might not be available, due to licensing restrictions.

Is it possible to get all the Java classes that are called at runtime?

After finishing my project, I want to remove all the unused classes to reduce the size of the jar file when packaging.
I am using IntelliJ. It can help me detect unused classes, but its results include some classes that are only called by reflection (at runtime only). Moreover, it cannot detect unused classes in external libraries.
One important thing: I want to remove unused classes in external libraries. For example, when I use BiMap from Google Guava, I have to include the Guava lib, but I only want to use BiMap; including the whole of Guava makes my jar big.
So I thought about it in reverse: instead of finding unused classes, I want to know all the classes that are used/called when the program runs (I will remove unused classes/packages manually). How can I do that?
Consider using a tool like ProGuard (http://proguard.sourceforge.net/) to do this.
I am unsure how you can limit the contents of the jar file to only the referenced Java classes. You may also run into issues when a class is loaded dynamically.
Guava explains on their site how you can include a subset of Guava in your build, by using ProGuard: https://github.com/google/guava/wiki/UsingProGuardWithGuava

Java Imports, Assembly (Krakatau), and Source Code

So here's my situation:
I am running a Java Client/Server architecture that has high CPU usage and I'm trying to reduce the lag time on the main "server" thread. I have been profiling the server with YourKit to identify the CPU-hogging code.
The problem is:
I am using someone else's code, and because of the way it is written, it is impossible to decompile, then recompile without using a special obfuscator which I do not have access to (no I am not violating any copyrights or anything).
What I am currently doing:
To modify the class files without worrying about obfuscation, I have been using Storyyeller's amazing Krakatau decompiler (https://github.com/Storyyeller/Krakatau) to disassemble class files into assembly files.
I manually edit the .j assembly files while looking at a Jasmin reference page (which takes FOREVER and I often mess up), then reassemble them into class files and run them again.
What I want to do:
Instead of painstakingly editing the assembly, I was wondering, does anyone know of a way to convert .java Source Code to .j Assembly code?
Also, if I simply decompile the .class files, is it possible for me to simply recompile them even though the packages for the imports do not exist?
import com.bazinga.*;
public class MainThread{}//compile this even though package com.bazinga doesn't exist?
If anyone knows ANY WAY I could do this, I would really appreciate it!
Instead of painstakingly editing the assembly, I was wondering, does anyone know of a way to convert .java Source Code to .j Assembly code?
Yes and no. The obvious answer is that you can just compile your code and then disassemble the resulting classes. However, this is not always helpful, because compilation can be context dependent (such as inlining static final constants, or handling of nested classes). Additionally, if you plan to add your code into an existing method, you have to be careful to not use existing local variable slots or clobber the operand stack.
My best advice is to try to isolate your modifications as much as possible. For example, if you want to add code to the jar, instead of inserting it into an existing class, just write the code you want to add in Java, compile it and add the classfiles in. Then modify the target class to call into your own class.
As for imports, you can compile against stubs. Just create a dummy class with the name you want, and optionally dummy methods for anything you need to call. The implementations can just be {throw null;} or similar, since you won't actually be executing them ever, they just need to exist to satisfy the compiler during compilation.
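A minimal sketch of the stub idea follows. All names are hypothetical: BazingaHelper stands in for whatever class from the missing package the decompiled code calls, and compute is an invented method. In a real setup the stub would sit in its own file under the matching package directory (for example com/bazinga/...), and only the recompiled target class would be copied back into the jar; everything is kept in one file here so the example compiles on its own.

```java
// The class being recompiled. It only needs the stub to exist at compile time.
public class MainThread {

    // Calls into the (stubbed) library class, as the decompiled source would.
    static int callLibrary(int x) {
        return BazingaHelper.compute(x) + 1;
    }

    public static void main(String[] args) {
        try {
            callLibrary(1);
        } catch (NullPointerException e) {
            // Reaching this branch means the stub, not the real class, was on the classpath.
            System.out.println("Stub was executed; deploy against the real library instead.");
        }
    }
}

// Hypothetical stub: same class and method signature as the real library class,
// but the body just throws. "throw null;" compiles and raises a
// NullPointerException if the stub is ever executed by mistake.
class BazingaHelper {
    static int compute(int x) {
        throw null;
    }
}
```

The stub class files are discarded after compilation. At run time the JVM resolves the call against the real class in the original jar, provided the stub's class name, method name, and descriptor match the real ones exactly.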

Hibernate: is it possible to reduce file size of jar?

I am working on a desktop application that uses Hibernate and HSQLDB. When I package my application as a runnable jar file, the file is bigger than I expected. I see that the biggest part comes from Hibernate and its dependencies. I am not sure I need all of the Hibernate features. Is there a way to get rid of the parts of Hibernate and its dependency libraries that I don't use?
Under the /lib/ folder in the Hibernate zip you will see a folder called /required/. For very basic Hibernate apps that's all you will need, though you may need additional JARs for things such as JPA. I would start by including only the JARs in the lib/required/ directory, see if your project works, and if it doesn't, add what you need to get your project working again.
Perhaps you could use a tool to analyse your classes and dependencies (e.g. http://www.dependency-analyzer.org/). Here is another post about it: How do I find out what jar files are actually used when compiling a java project.
The other way is to remove some jars (or even single class files) and check whether your application still works, but I don't think this is a very good way...
I can't think of a better tool for this than ProGuard.
ProGuard is a free Java class file shrinker, optimizer, obfuscator, and preverifier. It detects and removes unused classes, fields, methods, and attributes. It optimizes bytecode and removes unused instructions. It renames the remaining classes, fields, and methods using short meaningless names. Finally, it preverifies the processed code for Java 6 or for Java Micro Edition.

How do I strip the fluff out of a third party library?

It may not be best practice, but are there ways of removing unused classes from a third party's jar files? Something that looks at the way in which my classes are using the library and does some kind of coverage analysis, then spits out another jar with all of the untouched classes removed.
Obviously there are issues with this. Specifically, the usage scenario I put it through may not use all classes all the time.
But neglecting these problems, can it be done in principle?
There is a way.
The JarJar project does this AFAIR. The first goal of the JarJar project is to allow one to embed third party libraries in your own jar, changing the package structure if necessary. Doing so it can strip out the classes that are not needed.
Check it out at http://code.google.com/p/jarjar/.
Here is a link about shrinking jars: http://sixlegs.com/blog/java/jarjar-keep.html
There is a tool in Ant called a classfileset. You specify the list of root classes that you know you need, and then the classfileset recursively analyzes their code to find all dependencies.
Alternatively, you could develop a good test suite that exercises all of the functions that you need, then run your tests under a test coverage tool. The tool will tell you which classes (and statement in them) were actually utilized. This could give you an even smaller set of code than what you'd find with static analysis.
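To give a feel for the static approach (start from root classes and follow what they reference), here is a rough sketch that reads the constant pool of a single class file and lists the classes it names. It is not how Ant's classfileset is implemented; the class ClassRefs and its methods are invented for this example, it looks one level deep only, and it cannot see classes that are loaded reflectively by name.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ClassRefs {

    /**
     * Returns the internal names (e.g. java/util/List) held in the
     * CONSTANT_Class entries of a class file. Array types show up in
     * descriptor form, e.g. [Ljava/lang/Object;
     */
    static Set<String> referencedClasses(InputStream classFile) throws IOException {
        DataInputStream in = new DataInputStream(classFile);
        if (in.readInt() != 0xCAFEBABE) {
            throw new IOException("Not a class file");
        }
        in.readUnsignedShort(); // minor version
        in.readUnsignedShort(); // major version
        int count = in.readUnsignedShort();
        String[] utf8 = new String[count];
        List<Integer> classNameIndexes = new ArrayList<>();
        for (int i = 1; i < count; i++) {
            int tag = in.readUnsignedByte();
            switch (tag) {
                case 1: utf8[i] = in.readUTF(); break;                        // Utf8
                case 7: classNameIndexes.add(in.readUnsignedShort()); break;  // Class
                case 3: case 4: in.readInt(); break;                          // Integer, Float
                case 5: case 6: in.readLong(); i++; break;                    // Long, Double use two slots
                case 8: case 16: case 19: case 20:                            // String, MethodType, Module, Package
                    in.readUnsignedShort(); break;
                case 15: in.readUnsignedByte(); in.readUnsignedShort(); break; // MethodHandle
                case 9: case 10: case 11: case 12: case 17: case 18:          // refs, NameAndType, (Invoke)Dynamic
                    in.readInt(); break;
                default: throw new IOException("Unknown constant pool tag " + tag);
            }
        }
        // Resolve the name indexes only after the whole pool has been read,
        // because a Class entry may point at a Utf8 entry that comes later.
        Set<String> names = new TreeSet<>();
        for (int index : classNameIndexes) {
            names.add(utf8[index]);
        }
        return names;
    }

    /** Convenience wrapper: finds the class file on the class path by name. */
    static Set<String> referencedClassesOf(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = ClassLoader.getSystemResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalArgumentException("Class file not found: " + resource);
            }
            return referencedClasses(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String name : referencedClassesOf("java.util.ArrayList")) {
            System.out.println(name);
        }
    }
}
```

Repeating this for every class found, until no new names appear, gives the transitive set that a tool of this kind would keep. Tools such as ProGuard do this far more thoroughly, down to individual methods and fields.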
I use ProGuard for this. As well as being an excellent obfuscator, it has a code shrinking phase which can combine multiple JARs and then strip out any unused classes or class members. It does an excellent job at shrinking.
At a previous job, I used a Java obfuscator that, as well as obfuscating the code, also removed classes and methods that weren't being used. If you were doing "Class.forName" or any other kind of reflection, you needed to tell the obfuscator, because it couldn't tell by inspecting the code which classes or methods were called by reflection.
The problem, of course, is that you don't know if other parts of the third party library are doing any reflection, and so removing an "unused" class might cause things to break in an obscure case that you haven't tested.
A jar is just a zip file, so I guess you can. If you could get to the source, it would be cleaner. Maybe try disassembling the class?
Adding to this question: can that improve performance? Would startup time improve because the unused classes are never JIT compiled, or does Java automatically detect unused code when compiling to bytecode and never deal with it at all?
This would be an interesting project (has anyone done it already?)
I presume you'd give the tool your jar(s) as a starting point, and the library jar to clean up. It could use reflection to determine which classes your jar(s) reference directly, and which are used indirectly down the call tree (this is not trivial at all, but doable). If it encounters any reflection code in any of the two places, it should give a very loud warning.
