I've got a large Java library and I want to develop several smaller applications that interface with this library. The library will be present on the target device's class path as a JAR, but I would like to avoid the need to have the entire library (either JAR or source) present at compile time if possible. (If it matters, the JAR is quite huge and I want to protect intellectual property too, though that's not my primary concern.)
In C++, I would solve this issue by creating a DLL (.so), copying just the relevant class and function definition headers to my project, adding them to the include path at compile time, and letting the dynamic linker do the job at runtime.
How can I do this in Java? One idea I have would be to remove private methods/members and strip the methods of the relevant classes so that their bodies are empty, with the real classes and methods with the same signatures being loaded at runtime. However, this approach seems quite ugly and rudimentary, plus some tool would be needed to automate the process. Is there a tool to do this? Is there a better way?
I don't think it's a duplicate of this question. The point of that question is to minimize the size of the resulting JAR file at compile time by removing unnecessary classes. My point is not to remove unused definitions, but to avoid the need to have the complete library JAR at compile time at all. Though these are similar and there may be a way to achieve what I want using ProGuard, the linked question does not discuss it.
There's no exact equivalent for header files in Java, but compiling against the "header" (in the meaning of "contract") without having the actual implementation can be achieved using interfaces:
Create an interface for every class you want the "header" for, with the relevant methods
Make the actual classes implement the respective interfaces
Pack the interfaces into one JAR, and the implementations into another one
(if you're using a build tool like Maven, use two projects, and let the implementation project depend on the interface one)
Only provide the interface JAR at compile time, and both at run time
There of course will need to be some artifact that knows the actual implementations and can instantiate them, or you'll have to use some lookup that searches classpath for a suitable implementation.
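For the "lookup that searches the classpath" part, the JDK ships java.util.ServiceLoader. Below is a minimal sketch of that idea, not a definitive setup. The names TextEncoder and EncoderLookup are made up for illustration; in a real project the interface would be public and live in the interface JAR, and the implementation JAR would register its class in a file named META-INF/services/<fully qualified interface name>.

```java
import java.util.Iterator;
import java.util.Optional;
import java.util.ServiceLoader;

// Hypothetical "header": in a real project this is a public interface in the small API JAR.
interface TextEncoder {
    String encode(String input);
}

final class EncoderLookup {
    private EncoderLookup() {}

    // Returns the first implementation registered on the class path, if any.
    // Implementation JARs register themselves via META-INF/services/<interface name>.
    static Optional<TextEncoder> find() {
        Iterator<TextEncoder> providers = ServiceLoader.load(TextEncoder.class).iterator();
        return providers.hasNext() ? Optional.of(providers.next()) : Optional.empty();
    }

    public static void main(String[] args) {
        String result = find()
                .map(encoder -> encoder.encode("hello"))
                .orElse("no implementation on the class path");
        System.out.println(result);
    }
}
```

The application code only ever names the interface, so it compiles against the interface JAR alone; whichever implementation JAR is present at run time gets picked up.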
If your problem is only about minimizing the final jar and you use Apache Maven, you could try the "provided" scope when you declare a dependency in the pom.xml.
For example I use this dependency declaration:
<dependency>
    <groupId>org.bouncycastle</groupId>
    <artifactId>bcpkix-jdk15on</artifactId>
    <version>1.48</version>
    <scope>provided</scope>
</dependency>
This means that my Java compiler uses the Bouncy Castle library to compile, but the library is not included in the final jar. You have to provide it at run time.
Is there a way to automatically find out which Java classes are actually loaded (either during compile time, as far as that's possible, or during the runtime of an application), and to throw out all other classes from a JAR to create a smaller JAR? Does that actually make sense in practice?
I am talking about the application classes for an application JAR. Usually there are lots of libraries in an application, and an application rarely needs all features of those libraries. So I suspect that would make a considerably smaller application. In theory that might be done, for example, via a Java agent that logs which classes and resources are read by one or several runs of an application (or even just by java -verbose:class), and a Maven plugin that throws out all other classes from a jar-with-dependencies. Is there already something like that?
Clarification: I am not talking about unused dependencies (JARs that are not used at all), but about removing unused parts of each included JAR.
Well, the Maven Shade Plugin has an option minimizeJar when creating an Uber-JAR for your application:
https://maven.apache.org/plugins/maven-shade-plugin/
But, as others already pointed out, this is quite dangerous, as it regularly fails to detect class accesses which are done via Reflection or other dynamic references.
It may not be a good idea to automate this, as the application can use reflection to initialise objects, or one JAR may depend on another JAR.
The only way I can think of is to remove the JARs one by one and check whether the application still runs as expected. Then again, with this approach all modules of the application have to be tested, since one module may work without a particular dependency while another may not.
A better solution is to take care while developing. The application developer must be careful when adding a dependency, and must remove unwanted dependencies once his/her piece of code is done.
Global strategy.
1) Find all the classes that are loaded during runtime.
2) List of all the classes available in the classpath.
3) Reduce your class path by creating copies of jars containing only classes you need.
I have done parts 1 and 2, so I can help you.
1) Find out all the classes that are loaded. You need 100 % code coverage (I am not talking about tests, but production). So run all possible scenarios, so all the classes your app needs will be loaded and logged.
To log loaded classes, try one of several approaches: reflection, the -verbose:class flag, or a Java agent, which also allows modifying methods at run time. This is an example of some Java agent code or another Java agent example
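As a rough illustration of the Java agent approach, here is a minimal sketch built on the java.lang.instrument API. The class name LoadedClassLogger is made up. It assumes the agent is packaged in its own JAR with "Premain-Class: LoadedClassLogger" in the manifest and that the application is started with -javaagent:<path to that JAR>. The class is declared without "public" only so the sketch fits in one file; for a real agent, make it public in its own file.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical agent class that records the name of every class passed to it by the JVM.
final class LoadedClassLogger implements ClassFileTransformer {
    // Thread-safe set, because classes can be loaded from several threads at once.
    final Set<String> seen = ConcurrentHashMap.newKeySet();

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // className uses '/' separators and may be null for some generated classes.
        if (className != null) {
            String dotted = className.replace('/', '.');
            if (seen.add(dotted)) {
                System.err.println("loaded: " + dotted);
            }
        }
        return null; // null tells the JVM to keep the class bytes unchanged
    }

    // Called by the JVM before the application's main() when started with -javaagent.
    public static void premain(String agentArgs, Instrumentation instrumentation) {
        instrumentation.addTransformer(new LoadedClassLogger());
    }
}
```

Classes that were loaded before the agent was installed (mostly JDK internals) do not show up, which is fine here since only application and library classes matter.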
2) To find all the classes available in a jar, you can write a program. You need to know all the places where application jars are located. Loop through these jars (you can use ZipFile), loop through their ZipEntry entries, and collect all classes.
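A minimal sketch of that loop, using only java.util.zip from the JDK. The class and method names (JarClassLister, listClasses) are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.TreeSet;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class JarClassLister {
    private JarClassLister() {}

    // Returns the fully qualified names of all classes stored in the given JAR.
    static Set<String> listClasses(Path jar) throws IOException {
        Set<String> names = new TreeSet<>();
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String entryName = entries.nextElement().getName();
                if (entryName.endsWith(".class")) {
                    // "com/example/Foo.class" -> "com.example.Foo"
                    String withoutSuffix =
                            entryName.substring(0, entryName.length() - ".class".length());
                    names.add(withoutSuffix.replace('/', '.'));
                }
            }
        }
        return names;
    }
}
```

Running this over every jar on the class path and subtracting the set of classes logged in step 1 gives the candidates for removal.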
3) After that write a script or program that reassembles your application. For example, now you can create a new jar file for each library and put there only needed classes.
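One way such a reassembly program could look, as a sketch only: it copies a jar and drops every class that is not in the "keep" set collected in step 1. JarShrinker and shrink are made-up names, and non-class entries (manifest, resources) are simply copied, which is an assumption that may not suit every library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.Set;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

final class JarShrinker {
    private JarShrinker() {}

    // Copies source to target, skipping every .class entry whose class name is not in keep.
    static void shrink(Path source, Path target, Set<String> keep) throws IOException {
        try (ZipFile in = new ZipFile(source.toFile());
             ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(target))) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = in.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                String name = entry.getName();
                if (name.endsWith(".class")) {
                    String className = name.substring(0, name.length() - ".class".length())
                            .replace('/', '.');
                    if (!keep.contains(className)) {
                        continue; // never loaded in the recorded runs, so leave it out
                    }
                }
                out.putNextEntry(new ZipEntry(name)); // fresh entry, sizes are recomputed
                if (!entry.isDirectory()) {
                    try (InputStream data = in.getInputStream(entry)) {
                        int read;
                        while ((read = data.read(buffer)) != -1) {
                            out.write(buffer, 0, read);
                        }
                    }
                }
                out.closeEntry();
            }
        }
    }
}
```

Any class that is only reached in a scenario you did not run in step 1 will be missing from the result and will fail at run time, so keep the original jars around.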
Also, you may use a tool (again, you are a programmer, so write a program) which checks the code for class dependencies. You do not want to remove classes if they are needed for compilation. When I was a student, I wrote a code analyzer which builds a directed graph of class dependencies.
As @Gokul Nath KP notes, this can be done by hand, and I have done it before. I manually changed the Gradle and Maven dependencies, removing them one by one, and then ran a full regression test. It took me a week (our application was small compared to modern enterprise systems created by hundreds of developers).
So, be creative, and in case of success, your project will be used by millions!
After finishing my project, I want to remove all the unused classes to reduce the size of jar file when packaging.
I am using IntelliJ. It can help me detect unused classes, but its results include some classes that are only called by reflection (at runtime only). Moreover, it cannot detect unused classes in external libraries.
One important thing: I want to remove unused classes in external libraries. For example, when I use BiMap from Google Guava, I have to include the Guava lib, but I only want to use BiMap, and including the whole of Guava makes my jar big.
So I thought about it the other way round: instead of finding unused classes, I want to know all the classes that are used/called when the application runs (I will remove unused classes/packages manually). How can I do that?
Consider using a tool like Proguard (http://proguard.sourceforge.net/) to do this
I am unsure how you can limit the contents of the jar file to only the referenced Java classes. You may also run into issues when a class is loaded dynamically.
Guava explains on their site how you can include a subset of Guava in your build, by using ProGuard: https://github.com/google/guava/wiki/UsingProGuardWithGuava
I am building a tool from several different open source libraries. My buildpath is in the following order:
My first JAR file, stanford-corenlp-3.3.0.jar, contains a package called edu.stanford.nlp.process, which has the Morphology.class class.
My second JAR file, ark-tweet-nlp-0.3.2.jar, contains an identical package name (edu.stanford.nlp.process), and an identical class name Morphology.class.
In both JARs, the respective Morphology classes have a method called stem(). However, the signatures of these methods are different. I want to use the stem(String, String) method from my second JAR file, but since the import statement (import edu.stanford.nlp.process.Morphology;) does not specify which JAR to use, I get an error, because the compiler assumes the first JAR on the buildpath is the one I want to use.
I don't want to change the order of my buildpath since it would throw off my other method calls.
How can I specify which JAR's Morphology class to use? Is there an import statement that specifies the JAR, along with the package.class?
EDIT: What about a way to combine my two JARs so that the two Morphology classes merge, giving me both methods with their different signatures?
As several others pointed out above, it is possible to tweak Java's classloader mechanism to load classes from certain places… but this is not what you are looking for, believe me.
You hit a known problem. Instead of worrying how to tell Java to use a class from one JAR and not from the other, you should consider using a different version of ArkTweet.
Fetch the ArkTweet JAR from Maven Central. It does not contain Stanford classes.
When you notice that people package third-party classes in their JARs, I'd recommend pointing out to them that this is generally not a good idea and to encourage them to refrain from doing so. If a project provides a runnable fat-jar including all dependencies, that is fine. But, it should not be the only JAR they provide. A plain JAR or set of JARs without any third-party code should also be offered. In the rare cases that third-party code was modified and must be included, it should be done under the package namespace of the provider, not of the original third-party.
Finally, for real solutions to building modular Java applications and handling classloader isolation, check out one of the several OSGi implementations or project Jigsaw.
The default ClassLoader will only load the class from the first JAR on the class path that contains it, ignoring the copy in the second one, so this can't be done out of the box. Maybe a custom ClassLoader can help.
For more info about ClassLoaders start from here.
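To give an idea of what "a custom ClassLoader" could mean here, this is a minimal sketch using the JDK's URLClassLoader with no parent, so that it sees only one JAR plus the JDK's bootstrap classes. IsolatedJarLoader is a made-up name and the path in the usage comment is hypothetical. It is a sketch of the technique, not a recommendation.

```java
import java.net.URL;
import java.net.URLClassLoader;

final class IsolatedJarLoader {
    private IsolatedJarLoader() {}

    // Creates a loader that sees only the given JAR (or class directory) and the JDK's
    // bootstrap classes. Passing null as parent keeps the application class path out.
    static URLClassLoader forLocation(URL location) {
        return new URLClassLoader(new URL[] {location}, null);
    }

    // Usage sketch (the path is hypothetical):
    //   URL jar = java.nio.file.Paths.get("libs/ark-tweet-nlp-0.3.2.jar").toUri().toURL();
    //   try (URLClassLoader loader = IsolatedJarLoader.forLocation(jar)) {
    //       Class<?> morphology = loader.loadClass("edu.stanford.nlp.process.Morphology");
    //       // Members then have to be called reflectively, e.g. via morphology.getMethod(...)
    //   }
}
```

Because the class loaded this way is unrelated to the Morphology type your own code was compiled against, you cannot cast to it or call it directly, which is a large part of why this route gets painful quickly.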
Good luck!
EDIT: We are looking at some horrible packaging choices here, which cause this JAR hell as a side effect. The author of this "Ark Twitter" library decided it was a good idea to release a JAR artifact that includes a third-party library (the Stanford NLP library). This leads to unnecessarily tight coupling between Ark Twitter and the specific version of the Stanford NLP library it uses. This is a very bad practice that should be discouraged in any case: it violates the whole idea of transitive dependencies.
EDIT (continued): One possible (and hopefully working) solution is to rebuild the Ark Twitter JAR so that it does not include the aforementioned library but only its own code (basically the cmu.arktweetnlp package only), and to hope that the version of NLP required by your project works with Ark Twitter. Ideally you should submit a pull request to the author of the library, but in the meantime you can get away with un-jarring and re-jarring the existing JAR file.
EDIT 2: Looking at the JAR file again, it's much worse than I originally thought: ALL the dependencies are repackaged in the released JAR file. This is really the worst possible solution for releasing a library. Good luck.
I think your problem can be solved simply by using the lemma(String word, String tag) method in the current CoreNLP's Morphology class:
String word = ...;
String tag = ...;
String lemma = morphology.lemma(word, tag);
WordTag wt = new WordTag(lemma, tag);
When the class was revised a couple of years ago, the method you're looking for was deleted. The feeling was that with most of the Stanford NLP code moving to using CoreLabels, methods that return WordTag are less useful (though deleting all such methods is still a work in progress).
No, there isn't. This is a weakness of Java that cannot be solved easily. You should use only one of the libraries. With both on the classpath, Java will always pick the class from the first one.
This problem is known as JAR hell.
The order in the buildpath generally determines the order in which the classloader will search for the class. In general, though, you don't want duplicates of the same class in your build path--and it sure doesn't seem like ark-tweet-nlp-0.3.2.jar should have an edu.stanford package within it.
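To check which JAR a class was actually picked up from, you can ask the class for its code source. A minimal sketch (ClassOrigin is a made-up helper name; classes loaded by the JDK's bootstrap loader have no code source, hence the fallback text):

```java
import java.security.CodeSource;

final class ClassOrigin {
    private ClassOrigin() {}

    // Describes where the JVM loaded the given class from.
    static String of(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(JDK or unknown location)";
        }
        return source.getLocation().toString();
    }

    public static void main(String[] args) {
        System.out.println(of(String.class));
        System.out.println(of(ClassOrigin.class));
    }
}
```

Calling ClassOrigin.of(Morphology.class) in the project from the question would show which of the two JARs won.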
When you load a class, it's loaded at a given address, and that address is then placed in the header of objects created from the class, so that (among other things) the methods in the class can be located.
So if you somehow load ClassA, with method abc(String), from zip file XYZ.zip, that loads into address 12345. Then (using a class loader trick) you load another ClassA, with method abc(String, String), from zip file ZYX.zip, and that loads into address 67890.
Now create an instance of the first ClassA. In its header will be the class address 12345. If you could somehow attempt to invoke the method abc(String,String) on that class, that method would not be found in the class at 12345. (In actuality, you will not even be able to attempt the call, since the verifier will stop you because, to it, the two classes are entirely different and you're trying to use one where the other is called for, just as if their names were entirely different.)
I have a library that I'm using in a Java application - it's important for certain functionality, but it's optional, meaning that if the JAR file is not there, the program continues on without issue. I'd like to open source my program, but I cannot include this library, which is necessary to compile the source code, as I have numerous import statements to use its API. I don't want to maintain two code sets. What is the best way to remove the physical jar file from the open source release, but still maintain the code that supports it, so that other people can still compile it?
The typical approach is to define a wrapper API (i.e. interfaces), include those interfaces in the open-sourced code, and then provide configuration options where one can specify the names of the classes that implement certain interfaces.
You will import API interfaces instead of importing classes directly into your open sourced code. This way, you are open sourcing the API but not the implementation of the parts that you do not want to open source or you cannot open source.
There are many examples, but take a look at JDBC API (interfaces) and JDBC drivers (implementation classes) for starters.
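A minimal sketch of the "class name from configuration" idea. All names here (SpellChecker, AcceptAllSpellChecker, OptionalFeatures) are invented for illustration; the stand-in implementation is included only to keep the example self-contained and would normally live in the separate, non-open-sourced JAR.

```java
import java.util.Optional;

// Hypothetical wrapper API that ships with the open-sourced code.
interface SpellChecker {
    boolean isCorrect(String word);
}

// Stand-in for the real implementation, which would live in the separate JAR.
final class AcceptAllSpellChecker implements SpellChecker {
    @Override
    public boolean isCorrect(String word) {
        return true;
    }
}

final class OptionalFeatures {
    private OptionalFeatures() {}

    // Tries to instantiate the configured implementation.
    // Returns empty if the class is missing, cannot be linked, or is not a SpellChecker.
    static Optional<SpellChecker> loadSpellChecker(String implementationClassName) {
        try {
            Class<? extends SpellChecker> type =
                    Class.forName(implementationClassName).asSubclass(SpellChecker.class);
            return Optional.of(type.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            return Optional.empty();
        }
    }
}
```

The open-sourced code imports only SpellChecker, so it compiles without the optional JAR, and the feature simply stays off when loadSpellChecker returns an empty Optional.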
I was pretty much typing the same thing as smallworld, with one addition. If this API were necessary, you could use a project build tool like Maven to handle the dependencies of your project. If someone checks the project out from source control with the pom, they can download the dependencies for themselves and you don't have to include them in the source repo.
There are probably a number of ways to fix this; here are a few I can think of:
If you have only a couple of methods you need to invoke in the 3rd party library, you could use reflection to invoke those methods. It makes for really verbose code that is hard to read, though.
If you don't have too much of the API in the 3rd party library you use, you could also create a separate JAR file, containing just a non-functional shell of the classes in the library (just types with the same names and methods with the same signatures). You can then use this JAR to distribute and compile against. At run-time you'd replace it with the real JAR if available.
The most common way is probably to just create a wrapper API in a separate module/project for the code that depends on the 3rd party library, and possibly distribute a pre-built JAR. This might go against your wish to not maintain two code sets, but may prove to be the best and least painful solution in the long run.
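For the reflection option, a minimal sketch of what such a call could look like. ReflectiveCall and invokeStatic are made-up names, and java.lang.Integer merely stands in for the optional third-party class so that the example runs on its own.

```java
import java.lang.reflect.Method;
import java.util.Optional;

final class ReflectiveCall {
    private ReflectiveCall() {}

    // Invokes a public static method by name. Returns empty if the class or method is
    // unavailable (and also if the method returns null or is void).
    static Optional<Object> invokeStatic(String className, String methodName,
                                         Class<?>[] parameterTypes, Object... arguments) {
        try {
            Method method = Class.forName(className).getMethod(methodName, parameterTypes);
            return Optional.ofNullable(method.invoke(null, arguments));
        } catch (ClassNotFoundException | NoSuchMethodException | LinkageError e) {
            return Optional.empty(); // library not on the class path, or its API differs
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(
                    "Call to " + className + "." + methodName + " failed", e);
        }
    }

    public static void main(String[] args) {
        // java.lang.Integer stands in for the optional third-party class.
        Optional<Object> value = invokeStatic("java.lang.Integer", "parseInt",
                new Class<?>[] {String.class}, "42");
        System.out.println(value.orElse("library not available"));
    }
}
```

Since no third-party type appears in an import or a signature, the code compiles without the JAR, at the price of losing compile-time checking of names and argument types.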
Yes, I know one alternative for solving this problem is simply to create two source directories from the original. The class path for the GWT compiler would then be set up to see only the compatible source, while both would be used for the server portion of your app.
First, I find this kind of ugly, because it means I now have two source directories with potential duplicates of classes.
Refactoring and other structural features of the IDE can be problematic, as it will get confused.
Sometimes it's not possible to put some stuff in separate packages (think client and server packages), simply because one would then have to make something public which should really be package-private to limit its accessibility.
Is there a library that enables classes or methods to be marked as ignored by the GWT compiler?
Is there a better way?
You can exclude classes (files actually) from GWT's source path using Ant-like includes/excludes: http://code.google.com/webtoolkit/doc/latest/DevGuideOrganizingProjects.html#DevGuidePathFiltering
You cannot exclude methods or inner classes though, it really is file-based. See http://code.google.com/p/google-web-toolkit/issues/detail?id=3769
Make a shared directory that has the code that both the GWT side and server side can read. Any classes that would be duplicated instead go into this folder, to be accessed (without duplication!) from both client- and server-sides of your app.