Is it possible to optimize maven dependencies automatically? - java

I am working on a big project that consists of about 40 sub-projects with dependencies that are not optimized at all. There are declared dependencies that are not in use, as well as used but undeclared dependencies. The second case is possible when a dependency is pulled in transitively via another dependency.
I want to remove redundant and add required dependencies. I ran mvn dependency:analyze and got a long list of warnings I have to fix now.
I wonder whether there is a Maven plugin or any other utility that can update my pom.xml files automatically. I tried to do it manually but it takes a lot of time. It seems it will take a couple of days of copy/paste to complete the task.
In the worst case I can write such a script myself, but perhaps something ready-made already exists?
Here is how mvn dependency:analyze reports dependency warnings:
[WARNING] Used undeclared dependencies found:
[WARNING] org.apache.httpcomponents:httpcore:jar:4.1:compile
[WARNING] Unused declared dependencies found:
[WARNING] commons-lang:commons-lang:jar:2.4:compile
[WARNING] org.json:json:jar:20090211:compile

I would not describe the dependencies as merely "not optimized". Someone simply has not done their job well: declaring dependencies that are not used shows that someone did not understand what a build tool is and how it works. It can be compared with a Java file that contains many unused imports. Unused imports in Java sources can simply be handled by the IDE, but for dependencies in Maven no such simple way exists; as has already been said, things like DI etc. make this job hard. You can try to feed the output of dependency:analyze into a script (the goal has an option for that) and afterwards test the resulting build after cleaning up the dependencies.
It might be a good idea to run
mvn dependency:analyze -DscriptableOutput=true
which produces output that can easily be extracted and used for further processing, for example as input for the versions-maven-plugin (after some conversion).
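As a starting point for such a script, here is a minimal Java sketch that groups the coordinates from the plain `[WARNING]` lines shown in the question. The class name `AnalyzeReport` is made up for illustration, and the sketch assumes the log format quoted above rather than the `scriptableOutput` format, so treat it as a rough draft, not a finished tool.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: groups dependency coordinates printed by
// "mvn dependency:analyze" under the warning heading they follow.
class AnalyzeReport {
    static final String USED_UNDECLARED = "Used undeclared dependencies found:";
    static final String UNUSED_DECLARED = "Unused declared dependencies found:";

    static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        result.put(USED_UNDECLARED, new ArrayList<>());
        result.put(UNUSED_DECLARED, new ArrayList<>());
        List<String> current = null;
        for (String line : lines) {
            if (!line.startsWith("[WARNING]")) {
                current = null; // any other log line ends the current section
                continue;
            }
            String text = line.substring("[WARNING]".length()).trim();
            if (result.containsKey(text)) {
                current = result.get(text); // one of the two known headings
            } else if (text.endsWith(":")) {
                current = null; // some other warning heading we do not track
            } else if (current != null && text.split(":").length >= 4) {
                current.add(text); // groupId:artifactId:type:version[:scope]
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> log = List.of(
            "[WARNING] Used undeclared dependencies found:",
            "[WARNING] org.apache.httpcomponents:httpcore:jar:4.1:compile",
            "[WARNING] Unused declared dependencies found:",
            "[WARNING] commons-lang:commons-lang:jar:2.4:compile",
            "[WARNING] org.json:json:jar:20090211:compile");
        parse(log).forEach((heading, deps) -> System.out.println(heading + " " + deps));
    }
}
```

The resulting lists still have to be reviewed by hand before touching any pom.xml.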

I would not recommend cleaning up dependencies automatically.
Adding all 'Used undeclared...' dependencies duplicates most of the transitive dependencies, which means spending more time reading and managing them.
Removing all 'Unused declared...' dependencies can lead to errors at runtime, because they may be: called by reflection; declared specifically to override the version of an artifact that is already used by 3rd party dependencies (changing their compile scope to runtime is preferable, while test scope should be left untouched to avoid leaking them into the production package); or added to declare usage of an optional transitive dependency of some 3rd party library, etc.
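To illustrate the reflection case, here is a small Java sketch. `ReflectiveUse` is a made-up class, and `java.util.ArrayList` only stands in for a class that would normally live in a third-party jar. The type is named only as a string, which is why a bytecode-based analysis would probably report the jar providing it as unused.

```java
import java.util.List;

// Hypothetical example class, not taken from any real project.
class ReflectiveUse {
    // The class to load is known only as a string, so this class's
    // bytecode carries no direct type reference to it.
    static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // This is the runtime failure you get once the jar has been removed.
            throw new IllegalStateException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // java.util.ArrayList stands in for a class from a third-party jar.
        Object created = instantiate("java.util.ArrayList");
        System.out.println(created instanceof List);
    }
}
```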

Related

Meaning of Maven dependency scopes "compile" vs. "runtime"

From the viewpoint of a Gradle java library author, I understand that a dependency specified in the implementation configuration will be marked with the runtime scope in the resulting POM file that gets published (using the maven-publish Gradle plugin). This makes sense, as anyone consuming my published library doesn't need the dependency for compilation (it is an internal dependency), but instead only for runtime. If I specify a dependency in the api configuration, it will be marked with the compile scope in the resulting POM file, which again makes sense, as anyone consuming my library needs this for compilation (and runtime).
This makes me believe that the meaning of the Maven dependency scope is relative to anyone consuming the component, and not relative to the component itself. Consider a published Maven library (containing Java class files): a dependency marked with compile should mean:
If you compile against me, then use this dependency on the compilation classpath too!
However, according to the Maven docs, it seems that it means:
I was compiled with that dependency on my compilation classpath, and if you want to compile me again, do the same!
If this were true, then one could not distinguish between API-dependencies and implementation-dependencies like Gradle does. Also, it would only make sense if the published component actually contains the sources, not only the class files.
Did Gradle actually "misuse" the meaning of these scopes to make some improvements, or did I fundamentally misunderstand something?
Gradle cleverly "misuses" the scopes.
Maven has the design flaw that the build POM is published 1:1 as consumer POM (this will change with the upcoming Maven 4.x). So Maven does not have the chance to use something for compilation in the project, but for runtime when consumed by another project (at least not without applying tricks). The Maven docs therefore do not discuss the possibility of "implementation/api".

mvn dependency:analyze usage

Is it still good practice to handle your dependencies so that
mvn dependency:analyze
does not show any warnings?
It complains when code explicitly uses a dependency without it being declared, or when code does not use a declared dependency.
For the latter case I can think of more than a couple of scenarios where
we actually need to have "unused" dependencies.
But for the first case, should we always make sure we have no warnings?
There are exceptions where you need an artifact as a dependency, but it is not "used" in the classical sense by your source code. In this case, you can define an exception by setting the <ignoredDependencies> parameter for dependency:analyze.
It's used to find dependencies which are not used in your project. In other words, you may have added some dependencies to your project during development but eventually have no use for them in your code. This command helps you find and remove them to get a lighter jar file.

Analysis of unused transitive dependencies on the class level

Assume the following situation: my Maven project depends on a jar A, which depends on 10 other jars, which transitively depend on many more jars. I get a huge classpath, and if I am building a war/ear, I get a huge artifact.
Actually, I am using only the class foo in jar A. The class foo uses a few other classes, which are contained in three other jars. So I really only need jar A and three other jars to compile, not the whole bunch of dependencies (and their dependencies and so on).
Is there a way to (semi-)automatically analyse dependency trees on the class level? As far as I know Maven has no built-in functionality for this.
Just to make this clear: I know that such situations should not occur in a good software architecture. But if I get a jar A which is really just a collection of classes for different purposes, I potentially get a lot of unnecessary dependencies when I build the dependency tree with Maven. And changing A is not something I can do.
Some (long) time ago I started a Maven plugin for this:
https://github.com/highsource/storyteller-maven-plugin
How to find unneccesary dependencies in a maven multi-project?
It works but is in no way finished/documented etc. I also don't want to "sell" it here in any way.
But what you write is exactly what I was thinking back then. storyteller-maven-plugin basically analyzed the dependencies of classes and built a huge graph of them. Then it could tell whether you actually need the dependencies you've declared in your project or not. It could also export nice graphs of dependencies (using GraphViz).
I never had time to finish it, but maybe someone would be interested? The heavy lifting is done already.
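The general idea of a class-level analysis can be sketched in a few lines of Java. This is not the plugin's actual code: `ClassLevelClosure` and both input maps are hypothetical, and a real tool would have to extract the class references from bytecode first. The sketch only shows the graph walk that turns "class foo and whatever it reaches" into a set of jars.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: which jars are reachable from one root class?
class ClassLevelClosure {
    // refs:  class name -> classes it references (assumed to be precomputed)
    // jarOf: class name -> jar that contains it
    static Set<String> requiredJars(String root,
                                    Map<String, List<String>> refs,
                                    Map<String, String> jarOf) {
        Set<String> seen = new HashSet<>();
        Deque<String> queue = new ArrayDeque<>();
        Set<String> jars = new TreeSet<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            String cls = queue.poll();
            if (!seen.add(cls)) {
                continue; // already visited
            }
            String jar = jarOf.get(cls);
            if (jar != null) {
                jars.add(jar);
            }
            queue.addAll(refs.getOrDefault(cls, List.of()));
        }
        return jars;
    }

    public static void main(String[] args) {
        Map<String, List<String>> refs = Map.of(
            "a.Foo", List.of("b.Bar", "c.Baz"),
            "b.Bar", List.of("d.Qux"),
            "a.Unused", List.of("e.Heavy"));
        Map<String, String> jarOf = Map.of(
            "a.Foo", "A.jar", "a.Unused", "A.jar",
            "b.Bar", "B.jar", "c.Baz", "C.jar",
            "d.Qux", "D.jar", "e.Heavy", "E.jar");
        // E.jar is only reachable from a.Unused, so it is not reported.
        System.out.println(requiredJars("a.Foo", refs, jarOf));
    }
}
```

Classes that are only loaded reflectively would not show up in such a graph, so the result is a lower bound on what is needed at runtime.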

How do I check jar file dependencies

I am coming from a .NET background and I need to do some Java work these days. One thing I don't quite understand is how the Java runtime resolves its jar dependencies. For example, I want to use javax.jcr to add some nodes. So I know I need to add these two dependencies, because I need to use javax.jcr.Node and org.apache.jackrabbit.commons.JcrUtils.
<dependency>
    <groupId>javax.jcr</groupId>
    <artifactId>jcr</artifactId>
    <version>2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.jackrabbit</groupId>
    <artifactId>jackrabbit-jcr-commons</artifactId>
    <version>2.8.0</version>
</dependency>
Now compilation passes, but I get an exception at runtime. Then someone told me to add one more dependency, which solves the problem.
<dependency>
<groupId>org.apache.jackrabbit</groupId>
<artifactId>jackrabbit-jcr2dav</artifactId>
<version>2.6.0</version>
</dependency>
From my understanding, jackrabbit-jcr-commons needs jackrabbit-jcr2dav to run. If the jar is missing a dependency, how can compilation pass? And how would I know that I am missing this particular dependency of jcr-commons? This is a general question; it doesn't have to be specific to Java JCR.
Java doesn't have any built-in way to declare dependencies between libraries. At runtime, when a class is needed, the Java ClassLoader tries to load it from all the jars in the classpath, and if the class is missing, then you get an exception. All the jars you need must be explicitly listed in the classpath. You can't just add one jar and hope for Java to transitively load classes from this jar's dependencies, because jar dependencies are a Maven concept, and not a Java concept. Nothing, BTW, forbids a library writer from compiling 1000 interdependent classes at once, but putting the compiled classes in 3 different jars.
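A small Java sketch of that lookup behaviour: the helper below (a made-up `ClasspathCheck` class) asks the class loader for a class by name and reports whether it could be found. The second class name is deliberately fictional, so that the lookup fails the same way it would for a class from a jar that is missing from the classpath.

```java
// Hypothetical helper showing how a missing class surfaces at runtime.
class ClasspathCheck {
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate and load the class, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class (or something it needs while loading) is not available.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isOnClasspath("java.lang.String"));
        // Made-up name, assumed not to exist on the classpath.
        System.out.println(isOnClasspath("org.example.NotOnTheClasspath"));
    }
}
```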
So what's left is Maven. I know nothing about JCR. But if a jar A published on Maven depends on a jar B published on Maven, then it should list B in its list of dependencies, and Maven should download B when it downloads A (and put both jars in the classpath).
The problem, however, is that some libraries have a loose dependency on other libraries. For example, Spring has native support for Hibernate. If you choose to use Spring with Hibernate, then you will need to explicitly declare Hibernate in your dependencies. But you could also choose to use Spring without Hibernate, and in that case you don't need to put Hibernate in the dependencies. Spring thus chooses to not declare Hibernate as one of its own dependencies, because Hibernate is not always necessary when using Spring.
In the end, it boils down to reading the documentation of the libraries you're using, to know which dependencies you need to add based on the features you use from these libraries.
Maven resolves transitive dependencies at compile time, so compilation passes. The issue here is that, by default, Maven won't build a proper java -cp command line to launch your application with all of its dependencies (direct and transitive).
Two options to solve it:
Adjust your Maven project to build a "fat jar": a jar which includes all needed classes from all dependencies. See this SO answer with a pom.xml snippet to do this: https://stackoverflow.com/a/16222971/162634. Then you can launch with just java -cp myfatjar.jar my.app.MainClass
For a multi-module project with several result artifacts (that is, usually, different Java programs), it makes sense to build a custom assembly.xml which tells Maven how to package your artifacts and which dependencies to include. You'll need to provide some kind of script in the resulting package which contains the proper java -cp ... command. As far as I know, there's no "official" Maven plugin to build such a script during compilation/packaging.
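For the second option, the launch command itself is easy to assemble once you know which jars end up in the package. A minimal Java sketch, where `LaunchCommand` and the jar names in `main` are placeholders rather than anything Maven produces:

```java
import java.io.File;
import java.util.List;

// Hypothetical helper that assembles a "java -cp ..." line from a list of jars.
class LaunchCommand {
    static String build(List<String> jars, String mainClass) {
        // File.pathSeparator is ":" on Unix-like systems and ";" on Windows.
        return "java -cp " + String.join(File.pathSeparator, jars) + " " + mainClass;
    }

    public static void main(String[] args) {
        // Placeholder jar names; a real list would come from the packaged lib directory.
        System.out.println(build(List.of("myapp.jar", "lib/httpcore-4.1.jar"), "my.app.MainClass"));
    }
}
```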
There's a free Maven book which more or less explains how dependencies and assemblies work.
Your question mixes Maven (a java-centric dependency resolution tool) and Java compile-time and run-time class-resolution. Both are quite different.
A Java .jar is, in simplified terms, a .zip file of Java .class files. During compilation, each Java source file, say MyClass.java, results in a Java bytecode file with the same name (MyClass.class). For compilation to be successful, all classes mentioned in a Java file must be available in the class-path at compile-time (but note that use of reflection and run-time class-name resolution, à la Class.forName("MyOtherClass"), can avoid this entirely; also, you can use several class-loaders, which may be scoped independently of each other...).
However, after compilation, you do not need to place all your .class files together into the same jar. Developers can split up their .class files between jars however they see fit. As long as a program that uses those jars refers at compile-time, and loads at run-time, only classes whose own dependencies are all available at compile-time and run-time, you will not see any runtime errors. Classes in a .jar file are not recompiled when you compile a program that uses them; but if any of their dependencies is missing at run-time, you will get a run-time exception.
When using Maven, each Maven artifact (typically a jar file) declares (in its pom.xml file) the artifacts that it depends on. If it makes any sense to use my-company:my-library-core without needing my-company:my-library-random-extension, it is best practice to not make -core depend on -random-extension, although typically -random-extension will depend on -core. Any dependencies of an artifact that you depend on will be resolved and "brought in" when Maven runs.
Also, from your question, a word of warning: it is highly probable that jackrabbit-jcr2dav version 2.6.0 expects to run alongside jackrabbit-jcr-commons version 2.6.0, and not 2.8.0.
If I had to guess (without spending too much time delving into the Maven hierarchies of this particular project), I believe your problem is caused by the fact that jackrabbit-jcr-commons has an optional dependency on jackrabbit-api. That means that you will not automatically get that dependency (and its dependencies) unless you re-declare it in your POM.
Generally speaking, optional dependencies are a band-aid solution to structural problems within a project. To quote the maven documentation on the subject (http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html):
Optional dependencies are used when it's not really possible (for
whatever reason) to split a project up into sub-modules. The idea is
that some of the dependencies are only used for certain features in
the project, and will not be needed if that feature isn't used.
Ideally, such a feature would be split into a sub-module that depended
on the core functionality project...this new subproject would have
only non-optional dependencies, since you'd need them all if you
decided to use the subproject's functionality.
However, since the project cannot be split up (again, for whatever
reason), these dependencies are declared optional. If a user wants to
use functionality related to an optional dependency, they will have to
redeclare that optional dependency in their own project. This is not
the most clear way to handle this situation, but then again both
optional dependencies and dependency exclusions are stop-gap
solutions.
Generally speaking, exploring the POMs of your dependencies will reveal this kind of problem, though that process can be quite painful.

Maven and optional runtime dependencies

I'm starting to fix a Java project that uses Maven, and while I've got the project to build, at runtime it fails with missing dependencies. I've had a look, and the errors are caused by missing optional dependencies of included compile-time dependencies. I can go through and add these, but it seems to me that I could have everything building and running nicely, only for some piece of code that I missed to hit a missing dependency, and the whole thing falls apart.
What I really want to know is whether there is an automated way to find optional dependencies that I have chosen to not include. I have used mvn dependency:tree but this only shows the dependencies I have (not sure of the scope it checks) and I have tried mvn dependency:analyze but this seems to show dependencies it thinks I don't use and those that have been pulled down indirectly. What I cannot see is how to see a list of optionals I don't include.
Currently my method of working around this is to read the poms and try to work it out from there, but I don't see this as particularly robust.
For reference, I am fairly new to Maven-style dependency management and on the face of it I like it, but this optional thing is a bit of a stumbling block for me. I understand that optionals stop me pulling down dependencies I won't be using, but it hasn't clicked for me how I can work out which optionals are available and which ones I do need.
I am using Eclipse Juno, m2Eclipse (also have maven 3.0.5 cli), java 6/7.
Anyone got any ideas of how I can do this better, or what I am completely overlooking?
No, things are, somewhat, just this way. Maven does not do dependency management for you; it allows you to do dependency management by offering tools to use and analyze dependencies. So the work is still on the developer's side. People often mix that up.
This is mainly caused by the fact that projects often have different deployment targets. As a result, sometimes they collect one bunch of jar files which are copied once into Tomcat and a different set of files for WebLogic. So there might be a readme in your project that states what to copy prior to deployment of the Maven artifacts. Or it is implicit knowledge, and then you're doomed.
dependency:analyze works on bytecode, not on sources. Therefore it does not see what Maven knows.
Maybe mvn help:effective-pom gives a better basis to analyze the whole thing? Or you could try to modify the dependency plugin to show that information as well. Maven plugins are not so hard to work with.
I'm not aware of a plugin that displays all optional transitive dependencies. But since the pom.xml files of dependencies are downloaded into the local Maven repo, you could do a text search there.
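Such a search can also be done with a short Java program instead of a plain text search. The sketch below assumes the default local repository location (~/.m2/repository) and uses a made-up class name, `OptionalDependencyScan`. It is deliberately crude: it lists every `<dependency>` marked `<optional>true</optional>` in any .pom file, including entries under dependencyManagement or plugins, and it does not restrict the result to the artifacts your project actually uses.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: list dependencies marked optional in downloaded .pom files.
class OptionalDependencyScan {
    // Text of the first descendant element with the given tag, or "" if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? "" : nodes.item(0).getTextContent().trim();
    }

    static List<String> optionalDependencies(Path pom) {
        List<String> found = new ArrayList<>();
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(pom.toFile());
            NodeList deps = doc.getElementsByTagName("dependency");
            for (int i = 0; i < deps.getLength(); i++) {
                Element dep = (Element) deps.item(i);
                if ("true".equals(childText(dep, "optional"))) {
                    found.add(childText(dep, "groupId") + ":" + childText(dep, "artifactId"));
                }
            }
        } catch (Exception e) {
            // Unreadable or malformed pom: skip it in this sketch.
        }
        return found;
    }

    static List<String> scan(Path repoRoot) {
        try (Stream<Path> files = Files.walk(repoRoot)) {
            return files.filter(p -> p.toString().endsWith(".pom"))
                .sorted()
                .flatMap(p -> optionalDependencies(p).stream().map(d -> p.getFileName() + " -> " + d))
                .collect(Collectors.toList());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the local repository is in its default location.
        Path repo = Path.of(System.getProperty("user.home"), ".m2", "repository");
        if (Files.isDirectory(repo)) {
            scan(repo).forEach(System.out::println);
        }
    }
}
```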
A while ago there was a discussion on optional dependencies as well: Best strategy for dealing with optional dependencies - it might be helpful too.
