We are a large company with about 2000 separate Java projects. For historic reasons, we do not have multi-module projects, but we would like to introduce them.
Logically, we already have "groups" of projects, i.e. someone responsible for (say) 50 projects which are closely related. This someone regularly publishes a BOM which contains recent, coherent versions of these 50 projects.
Now it would make a lot of sense to grab these 50 projects and put them into one large multi-module project. Still, it would be necessary to publish a BOM because other projects (outside our group) should have coherent versions.
So, summarised, we need a BOM that contains the versions of all 50 projects that are part of the multi-module project. I wonder what would be the "Maven way" to create such a BOM. What I can think of:
The BOM is the 51st project of the multi-module project. The versions of the dependencies are set by properties in the parent POM.
The BOM is generated from the information present in the multi-module project and published as a side artifact (this would probably require us to write a Maven plugin).
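To illustrate option 1: the 51st module would be a pom-packaged BOM module whose dependencyManagement lists its 50 siblings. A rough sketch (all coordinates are made up):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <parent>
        <groupId>com.example.group</groupId>
        <artifactId>group-parent</artifactId>
        <version>1.2.3</version>
    </parent>
    <artifactId>group-bom</artifactId>
    <packaging>pom</packaging>
    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>com.example.group</groupId>
                <artifactId>project-a</artifactId>
                <version>${project.version}</version>
            </dependency>
            <!-- ...one entry per sibling module... -->
        </dependencies>
    </dependencyManagement>
</project>
```

Since such a BOM is built in the same reactor as its siblings, `${project.version}` keeps its entries coherent with each release automatically.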
What would be advisable?
We are using BOMs as well for our multi-module projects, but we are not tying their generation or update to the build of those modules.
A BOM is only updated when our release management process completes the delivery of a built module (or group of modules): once delivered, then the BOM is updated and pushed to Nexus (stored as a 1.0-SNAPSHOT version, constantly overridden after each delivery)
The BOM is then included in our POMs (for both single- and multi-module projects) and used for dependency management only, meaning our projects declare dependencies on artifacts without versions: the dependency management from the BOM supplies the latest delivered version of the other dependent modules.
In other words, we separate the build aspect (done here with Maven) from the release part: the "bills of materials" represent what has been delivered, and ensure all projects build with versions deemed to work well together (since they have been delivered into production together).
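As a sketch, a consuming project imports such a BOM in its dependencyManagement and then declares dependencies without versions (coordinates are made up):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.example.release</groupId>
            <artifactId>release-bom</artifactId>
            <version>1.0-SNAPSHOT</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>com.example.release</groupId>
        <artifactId>module-a</artifactId>
        <!-- version supplied by the imported BOM -->
    </dependency>
</dependencies>
```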
I've never seen 2K commercial Java projects, so I will base my answer on how open source works:
Libraries shouldn't be grouped by people - they should be grouped by the problems that they solve. Open source projects often have multiple libs, e.g. Jackson has jackson-databind, jackson-datatype-jsr310, etc. These libs tightly relate to each other and may depend on each other.
Such groups shouldn't be too big. Some groups may have 1 lib, others 5 or 10. 50 libs in a group sounds way too much.
It's easier if libs in a group are released all at the same time (even if only one is updated). This makes it straightforward to keep track of versions in the apps that use multiple libs from a group.
There should be no dependencies between groups! And this is probably the most important rule. Deep hierarchy of libraries that depend on each other is not acceptable because now you need to keep compatibility between many projects and libs. This just doesn't scale. Which means there will be occasional copy-paste code between libs - this is the lesser evil.
There could be some exceptions to the last rule (maybe a lib that is used everywhere), but those must keep backward compatibility of the public API until no projects depend on the old API. Such libs are very hard to maintain, and it's better to open-source them.
Standalone projects can now depend on libraries from the same or different groups, but because the version within a group is the same, it's easy to set it as a property just once. Alternatively:
You can look at <scope>import</scope>, which allows importing <dependencyManagement> sections from other POM files, like parent POMs within a group (for some reason it never worked for me).
Or at xxx-all modules - a module that depends on all other modules within group and thus when you depend on it, you also depend on others transitively.
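For illustration, an xxx-all module is just a pom-packaged module whose only content is dependencies on its siblings (names are hypothetical):

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.example.mygroup</groupId>
    <artifactId>mygroup-all</artifactId>
    <version>2.0.0</version>
    <packaging>pom</packaging>
    <dependencies>
        <dependency>
            <groupId>com.example.mygroup</groupId>
            <artifactId>mygroup-core</artifactId>
            <version>${project.version}</version>
        </dependency>
        <dependency>
            <groupId>com.example.mygroup</groupId>
            <artifactId>mygroup-extras</artifactId>
            <version>${project.version}</version>
        </dependency>
    </dependencies>
</project>
```

Depending on mygroup-all then pulls in the whole group transitively.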
So I have this big Spring Boot project with hundreds of APIs and corresponding models. I was asked to separate it into three different modules: Storefront, Order Management System (OMS) and Utilities. As per my basic plan, I sorted, filtered and moved the Storefront and OMS APIs to their corresponding projects. I moved all the model classes into Utilities, created a package, added this package to the local repository and included it as a dependency of Storefront and OMS. I then exported these two projects as runnable jars, copying the required libraries into a subfolder next to the generated jar. I did this because the subfolder will include the package for Utilities, and if in future I have to update something in Utilities, I can just replace this package and restart the server.
Everything is working fine; the problem is the size of the final package. The jar size of the original project is 175 MB. All three projects have similar POM files, so all three projects export to a size of almost 175 MB each. And as I said, I included the package for Utilities in the other two projects, so the size of the subfolder for Storefront and OMS became around 350 MB.
Finally, my question is: is there any way to split a Maven project into 3 different sub-projects which can be built and deployed independently? And is there any way these 3 projects can share a set of libraries, stored remotely and accessed by them independently, so as to decrease the size of the final runnable jar?
I think there are some deeper issues involved. If your artifact is 175 MB and most of this is dependencies, then you have a very large number of dependencies.
So first of all, you should ask yourself whether all of those dependencies are really necessary. It is not unusual for people to add a dependency just to use one simple class from it, and that dependency then brings a transitive burden with it. 175 MB really calls for a deeper analysis of this.
Next, if you see you cannot really reduce the dependencies any more, you can split the project into several ones (like you started to do). But, then most of the dependencies should be in just one of the resulting projects. If all of your resulting projects use all of the dependencies, then these projects are all doing similar, probably overlapping things, which is not good.
I started reading about good practices related to multi-module maven projects, specifically about the advice to use separate pom aggregators and parents. I found this site which contains the following paragraph:
What you can see in the module POMs example is that it inherits its version from the parent POM. This is quite natural for small to medium sized projects. The team makes code changes on every module for the next release.
You may use different versions for your modules, but this will not take you very far. If you release the project via its aggregate parent POM, all modules will get released, with their individual version. That’s fine – at first thought. If you look closer you will notice that the version will increase, even if some of the modules have not any changes at all. With this approach, having a multi-module aggregate parent POM using different versions for each submodule, you only have the flexibility to decide, if a new version is major, minor or micro. But is this worth the effort?
As far as I know, when you run the release plugin, it will ask for the new versions of each sub-module. If there were no changes made to a particular sub-module, you can decide not to bump its version.
What did the author mean?
By that paragraph, the author intended to highlight that you cannot decide not to build a sub-module; the only thing you have control over is the new development version.
EDIT: This is about doing Continuous Delivery with Maven, orchestrated with Jenkins. Maven is definitely not designed for that, and this question is part of our effort to get an efficient workflow without using Maven releases. Help is appreciated.
We use Maven -SNAPSHOTs within major versions to ensure customers always get the latest code for that given version, which works well. For technical reasons we have two independent Maven jobs - one for compiling sources to jars, and one for combining the appropriate jars to a given deployment. This also works well.
We then have Jenkins orchestrating when to invoke the various steps, and this is where it gets a bit tricky: if we do the normal mvn clean install in step one, all the snapshot artifacts get recompiled. That in turn makes Jenkins think that all the snapshots changed (as their fingerprint - aka MD5 checksum - changed), even if the sources used to generate the artifacts did not change, triggering all the downstream builds instead of just those whose dependencies did change.
I have so far identified these things as varying between builds:
META-INF/maven/.../pom.properties (as it contains a timestamp)
META-INF/MANIFEST.MF (contains JDK and user)
the timestamps of the entries in the jar file
I have found ways around the first two, but the last is a bit more difficult. It appears that AbstractZipArchiver (which does all the work in zipFile() and zipDir()) is not written to allow any kind of extension to how the archive is generated.
For now I can imagine four approaches (but more ideas are very welcome):
Create a derivative of the current maven-jar-plugin implementation allowing for a timestamp=<number> attribute which is then used for all entries inserted into the jar file. If not set, the current behavior is kept.
Revise the Jenkins fingerprinting scheme so it knows about jar files and only looks at the entries contents, not their metadata.
Attach a plugin to the prepare-package stage responsible for touching the files with a specific time stamp. This requires all files to be present at that time (meaning that the jar plugin cannot be allowed to touch the MANIFEST.MF file)
Attach an extra plugin to the "package" phase which rewrites the finished jar file, zeroing out all zip entry timestamps in the process.
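For what it's worth, later versions of the core Maven plugins added first-class support for reproducible archives: setting the project.build.outputTimestamp property (honoured by maven-jar-plugin 3.2.0 and later, and other archiver-based plugins) forces a fixed timestamp on all archive entries, which is essentially approach 1 without a custom plugin:

```xml
<properties>
    <!-- fixed timestamp applied to every entry in generated archives -->
    <project.build.outputTimestamp>2020-01-01T00:00:00Z</project.build.outputTimestamp>
</properties>
```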
Again, the goal is to make maven SNAPSHOT artifacts fully time independent so given the same source you get an artifact with the same MD5 checksum. I also believe, however, that this could be beneficial for release builds.
How should I approach this?
As per my comment, I still think the answer is to do none of the things you suggest, and instead use releases in preference to snapshots for artifacts which you are in fact releasing to customers.
The problems you describe are:
you have a multi-module project which takes a long time to build because you have more than 100 modules,
you have two snapshot artifacts which you think ought to be identical (because the source code and metadata were identical at build time), but they have different checksums.
My experience with Maven tells me that if you try and adhere to the "Maven Way", tools will work well for you out-of-the-box, but if you deviate then you'll have a bad time. Unfortunately, the Maven Way is sometimes elusive :-)
Multi-module projects in Maven are very useful when you have families of modules with code that varies in sympathy, e.g. you have a module containing a bunch of interfaces, and some sibling modules providing implementations. It would be unusual to have more than a dozen modules in a multi-module project. All the modules ought to share the version number of the parent (Maven doesn't enforce this, which in my opinion is confusing).
When you build a snapshot version of a multi-module project, snapshots of all modules are built, even if the code in a particular module hasn't changed. Therefore you can look at a family of modules in your repository and know that at compile time the inter-module code references were satisfied.
For example, in a domain model module you might have an interface:
public interface Student {
    void study();
}
and in some sibling modules, which would declare compile-scoped dependencies on the domain model in their POMs, you might have implementations.
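Each sibling's POM would declare something like this (coordinates are hypothetical):

```xml
<dependency>
    <groupId>com.example</groupId>
    <artifactId>domain-model</artifactId>
    <version>${project.version}</version>
</dependency>
```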
If you were then to change the interface in the domain model module:
public interface Student {
    void study();
    void drink(Beer beer);
}
and rebuild the multi-module project, the build will fail. The dependent modules will fail to build, even though their code and POMs have remained the same. In a multi-module project, you only install or deploy artifacts if all the child modules build successfully, so rebuilding snapshots is usually very desirable - it's telling you something about the inter-module dependencies.
If:
you have an excessive number of modules, and/or
those modules can't reasonably share the same version number, and/or
you don't need any guarantees about code references between modules,
then your modularisation is incorrect. Don't use multi-module projects as a build system (you have Jenkins for that), use it instead to express relationships between modules of your code.
In your comment, you say:
RELEASE artifacts behave the same way when being rebuilt by Jenkins.
The point of release artifacts is that you do not rebuild them - they are definitive! If you use something like Artifactory, you will find that you cannot deploy a release artifact more than once - your Jenkins job should fail if you attempt it.
This is a fundamental tenet of Maven. One of the aims of Maven is that if two developers on separate workstations were to attempt the same release, they would build artifacts which were functionally identical. If you build an artifact which expresses a dependency on another (maybe for compilation purposes, or because it's being assembled into a .war etc.), then:
if the dependency is a snapshot, Maven might seek a newer version from the repository.
if the dependency is a release, the version in your local repository is assumed to be definitive.
If you could rebuild a release artifact, you would create the possibility that two developers have dissimilar versions in their repository, and you'd have dissimilar builds depending on which workstation you used. Don't do it.
Another critical detail is that a release artifact cannot depend on snapshot artifacts, again, you would lose various guarantees.
Releases are definitive, and it sounds like you want your assembly to depend on definitive artifacts. Jenkins makes tagging and releasing multi-module projects very straightforward.
In summary:
Check your modularisation: one enormous multi-module project is not useful.
If you don't want to continually rebuild snapshots, you need to do releases.
Never release snapshots to your customer.
Follow the dependency graph of your assembly project and release any snapshots.
Release the assembly project, bumping your minor version.
Ensure your customer refers to the complete version number of your assembly in communications.
I have a project that has 3rd party dependencies, as well as dependencies on internal projects. I need to strip the version numbers from the dependent artifacts that are developed in-house.
For example: spring-2.5.6.jar should be in the final output as spring-2.5.6.jar but MyInternalProject-1.0.17.jar needs to be changed to MyInternalProject.jar.
I can identify the internal dependencies easily enough by their group ID (they are all something like com.mycompany.*). The maven-dependency-plugin has a stripVersion option, but it does not seem to be selective enough. Is there a way to do this, short of explicitly naming each dependency and what their final name should be?
Phrased another way:
I would like to have different outputFileNameMappings for the maven-assembly-plugin for artifacts based on group ID. Is there a way to do this?
I think you can, using the following recipe:
First, in your aggregator pom use the dependency:copy-dependencies goal to copy your jars to some intermediate location. You will need two executions, one with <stripVersion>true</stripVersion> for your internal dependencies; and one with <stripVersion>false</stripVersion> for 3rd party libraries. You may include/exclude artifacts based on GroupId, see http://maven.apache.org/plugins/maven-dependency-plugin/copy-dependencies-mojo.html for full details.
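A rough sketch of the two executions (the group ID is a placeholder):

```xml
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-dependency-plugin</artifactId>
    <executions>
        <execution>
            <id>copy-internal</id>
            <phase>prepare-package</phase>
            <goals><goal>copy-dependencies</goal></goals>
            <configuration>
                <includeGroupIds>com.mycompany</includeGroupIds>
                <stripVersion>true</stripVersion>
                <outputDirectory>${project.build.directory}/deps</outputDirectory>
            </configuration>
        </execution>
        <execution>
            <id>copy-third-party</id>
            <phase>prepare-package</phase>
            <goals><goal>copy-dependencies</goal></goals>
            <configuration>
                <excludeGroupIds>com.mycompany</excludeGroupIds>
                <stripVersion>false</stripVersion>
                <outputDirectory>${project.build.directory}/deps</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>
```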
Then it should be a simple task to build a .zip using the maven-assembly-plugin!
Based on the comments, I would re-evaluate your approach here. Generally checking jars into source control is not a good idea, especially unversioned jars. Imagine if I just had a project that referenced someArtifact.jar and I was trying to debug it - how would I know which version it used?
Tools like Artifactory and Nexus were built for storing versions of your jars, both internal and 3rd party (they can also proxy public repositories like Maven Central). In order to keep builds reproducible, I would check your binaries into a tool designed for that, and then you can reference them by version. During development, you can reference SNAPSHOT versions of your jars to get the latest, and then when you do a release, you can reference stable versions of your binaries.
Source control systems were meant for storing source, not binaries.
I hope I can keep this question specific enough: my team at work is currently debating the best way to manage our dependencies for a huge project (150+ dependencies, ~300 MB).
We have two main problems
Keeping all the developers' dependencies the same, so we are compiling against the same files
Ensuring the project (once compiled) was compiled against the same dependencies
The two ideas that have been suggested are using a "BirJar" (all dependencies in one file), adding a version number to it, and pointing everyone's machines at the same shared folder.
Or including all the dependencies in the jar when we compile it (a jar of jars of jars) and just having a project that "has no dependencies".
Someone also mentioned setting up an internal version of Ivy and pointing all the code to pull dependencies from there.
What are the best practices regarding massive dependency management?
Why don't you use Maven and its dependency management?
You can specify each dependency, its particular version and its scope (compile-time, for testing, for deployment etc.). You can provide a master pom.xml (the config file) that specifies these, and developers can override if they need (say, to evaluate new versions).
e.g. I specify a pom.xml that details the particular jars I require and their versions (or range). Dependent jars are determined/downloaded automatically. I can nominate which of these jars are used for compilation vs. deployment etc. If I use a centralised repository such as Nexus I can then build my artefact (e.g. a library) and deploy that into Nexus, and it'll become available for other developers to download in exactly the same manner as 3rd party libs etc.
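For example, a master POM's dependencyManagement section might pin versions and scopes like this (versions are illustrative):

```xml
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>org.springframework</groupId>
            <artifactId>spring-core</artifactId>
            <version>2.5.6</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
```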
In case you don't like/want to follow the Maven project structure...
If you already use Ant, then your best bet is to use Ivy for dependency management.
http://ant.apache.org/ivy/
It provides a rich set of ant tasks for dependency manipulation.
from: Ant dependency management