How to Build Maven Modules in Parallel in Separate JVMs

We have a multi-module Maven project that takes about 2 hours to build and we would like to speed that up by making use of concurrency.
We are aware of the -T option, which (as explained e.g. here) allows using multiple threads within the same JVM for the build.
Sadly, there is a lot of legacy code (which relies heavily on global state) in the project, which makes executing multiple tests in parallel in a single JVM very hard. Removing all of these blockers from the project would be a lot of work, which we would like to avoid.
The surefire and failsafe plugins have several options regarding parallel execution; however, as I understand it, these would only parallelize the test execution. Also, spawning a separate JVM for each test (class) seems like overkill to me and would probably make the build take even longer than it does now.
Ideally, we would like to do the parallelization at the Maven reactor level and have it build each module in its own (single-threaded) JVM, with up to x JVMs running in parallel.
So my question is: is there a way to make Maven create a separate JVM for each module build?
Alternatively, can we parallelize the build while making sure that tests (over all modules) are executed sequentially?
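For reference, the -T flag takes either an absolute thread count or a per-core multiplier, e.g.:
mvn -T 4 clean install
mvn -T 1C clean install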

I am not completely sure this works, but my guess is that if you use Maven Toolchains, each module will start its own forked JVM for the tests instead of reusing already running ones.
I guess it is worth a try.
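If you do experiment in that direction, the fork-related surefire parameters can also be set explicitly. A minimal sketch (the plugin version is omitted and the values are illustrative; whether this isolates your global state well enough is exactly what you would need to verify):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <!-- one forked test JVM per module; all of the module's test classes run sequentially in it -->
    <forkCount>1</forkCount>
    <reuseForks>true</reuseForks>
  </configuration>
</plugin>

Combined with a parallel reactor build (mvn -T ...), each module's tests then run sequentially inside that module's own forked JVM while several modules build concurrently. Note that global state living outside the JVM (a shared database, ports, files on disk) can still clash between modules running at the same time.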

Related

Packaging multiple Apache Beam pipelines in one jar file

I'm working on a project with many Beam pipelines written in Java that needs to be packaged as a jar file for execution from our job scheduler. I've attempted to use build profiles to create a jar for each main, but this seems messy, and I've had issues with dependency conflicts (with beam-sdks-java-io-amazon-web-services: even when it's not used, it still looks for required region options). I'm also just looking for overall sustainable project structure advice for a growing Beam code base.
What are the best practices for packaging pipelines to be executed on a schedule? Should I package multiple pipelines together so that I can execute each pipeline using the pipeline name and pipeline options as parameters? If so, how (potentially using some sort of master runner main that executes pipelines based on input parameters)? Or should each pipeline be its own Maven project (which requires many jars)? Thoughts?
I don't think there's a recommended way of solving this. Each way has benefits and downsides (e.g. consider the effort of updating the pipelines).
I think the common jar solution is fine if it works for you. E.g., there are multiple Beam example pipelines in the same package, and you run them by specifying the main class. It is similar to what you are trying to achieve.
Whether you need a master main also depends on the specifics of your project and environment. It may be sufficient to just run java -cp with the desired main class and get by without extra management code.
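As a sketch of the "one bundled jar, pick the main class at launch" idea (the jar name, class name and all options other than --runner are placeholders, not a confirmed setup):

java -cp beam-pipelines-bundle.jar com.example.pipelines.DailyExportPipeline --runner=DirectRunner --inputPath=/data/in --outputPath=/data/out

The scheduler then only needs to vary the main class and the pipeline options per job.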

One-time setup for surefire across multiple unit test maven modules

I have a maven project with multiple modules and submodules. Each of these modules and submodules runs several unit tests of its own. As we know, maven-surefire-plugin runs the unit tests of each module separately, in a new JVM.
Now, as often happens, I have some global configuration/setup I need to perform before running the tests. This configuration/setup applies to almost all the unit tests across the modules (that is, across multiple JVMs). By default, this setup now runs every time, that is, once per (sub)module. If this configuration is a little expensive, it adds a lot to the execution time.
One such example may be setting up a database (cleaning it and setting it up again). If I have 100 modules, this setup would run 100 times.
Is there any good/elegant way of splitting out such configuration so it only runs once for the whole project? Also, I do not want to pass in any properties through the Maven command with -D; the idea is to make this process completely command-independent.
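To make the situation concrete (the class and helper names below are hypothetical, not from the actual project): even a guarded base test class only deduplicates the setup within one JVM, and since surefire forks a fresh JVM per module, the work is repeated for every module anyway.

import org.junit.BeforeClass;

public abstract class AbstractDatabaseTest {

    private static boolean initialized; // static state exists per JVM only

    @BeforeClass
    public static void setUpDatabase() {
        if (initialized) {
            return; // already done in this JVM
        }
        // Expensive global setup: wipe the schema and load the baseline data.
        TestDatabase.reset(); // hypothetical helper
        initialized = true;
        // Surefire forks a fresh JVM per module, so this still runs once per
        // (sub)module -- with 100 modules the database is reset 100 times.
    }
}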

Maven-surefire-plugin tests fail in Jenkins build but run successfully locally?

I have a maven project with test execution by the maven-surefire-plugin. An odd phenomenon I've observed and been dealing with is that running locally
mvn clean install
which executes my tests, results in a successful build with 0 Failures and 0 Errors.
Now when I deploy this application to our remote repo that Jenkins attempts to build, I get all sorts of random EasyMock errors, typically of the sort:
java.lang.IllegalStateException: 3 matchers expected, 4 recorded. at org.easymock.internal.ExpectedInvocation.createMissingMatchers
This is a legacy application being inherited, and we are aware that many of these tests are flawed if not plainly using EasyMock incorrectly, but I'm in a state where with test execution I get a successful build locally but not in Jenkins.
I know that the order of execution of these tests is not guaranteed, but I am wondering how I can introspect what is different in the Jenkins build pipeline vs. local to help identify the issue?
Is there anything I can do to force the tests to execute the way they do locally? At this point, I have simply excluded many troublesome test classes, but it seems that no matter how many times I see a Jenkins failure, I either fix the problem or exclude the test class, only to find it complaining about some other test class it didn't mention before.
Any ideas how to approach a situation like this?
I have experienced quite a similar situation, and in my case the cause was clearly concurrency problems in the test implementations.
And, after reading your comment:
What I actually did that fixed it (like magic am I right?) is for the maven-surefire plugin, I set the property reuseForks=false, and forkCount=1C, which is just 1*(number of CPU's of machine).
... I am even more convinced that you have concurrency problems in your tests. Concurrency is not easy to diagnose, especially when everything runs OK on one machine, but race conditions can arise when you run the same tests on another system (which is usually faster or slower).
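For reference, the settings mentioned in that comment correspond to roughly the following surefire configuration (a sketch of the comment's description, not the poster's actual pom):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <forkCount>1C</forkCount>       <!-- one forked test JVM per CPU core -->
    <reuseForks>false</reuseForks>  <!-- fresh JVM for every test class -->
  </configuration>
</plugin>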
I strongly recommend that you review your tests one by one and make sure each of them is logically isolated:
They should not rely on an expected previous state (files, database, etc.). Instead, they should prepare the proper setup before each execution.
If they concurrently modify a common resource that might interfere with other tests' execution (files, database, singletons, etc.), every assert must be done with as much synchronization as needed, taking into account that the initial state is unknown:
Wrong test:
MySingleton.getInstance().put(myObject);
assertEquals(1, MySingleton.getInstance().size());
Right test:
synchronized (MySingleton.getInstance()) {
    MySingleton.getInstance().put(myObject);
    assertTrue(MySingleton.getInstance().contains(myObject));
}
A good starting point for the review is to pick one of the failing tests and trace its execution backwards to find the root cause of the failure.
Explicitly setting the tests' execution order is not good practice, and I wouldn't recommend it even if I knew it was possible, because it would only hide the actual cause of the problem. Bear in mind that in a real production environment the execution order is usually not guaranteed.
JUnit test run order is non-deterministic.
Are the versions of Java and Maven the same on the 2 machines? If yes, make sure you're using the most recent maven-surefire-plugin version. Also, make sure to use a Freestyle Jenkins job with a Maven build step instead of the Maven project type. Using the proper Jenkins build type can either fix build problems outright or give you a better error so you can diagnose the actual issue.
You can turn on Maven debug logging to see the order tests are being run in. Each test should set up (and perhaps tear down) its own test data to make sure the tests can run independently. Perhaps seeing the test order will give you some clues as to which classes depend on others inappropriately. And if the app uses caching, ensure the cache is cleaned out between tests (or explicitly populated, depending on what the test needs to do). Also consider running the tests one package at a time to isolate the culprits; multiple surefire plugin executions might be useful.
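One way to do the package-at-a-time narrowing is to temporarily restrict which tests surefire picks up (the package path here is a placeholder):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <includes>
      <!-- only run the tests of one package while hunting for the culprit -->
      <include>**/service/**/*Test.java</include>
    </includes>
  </configuration>
</plugin>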
Also check the app for classpath problems. This answer has some suggestions for cleaning the classpath.
And another possibility: Switching to a later version of JUnit might help - unless the app is using Spring 2.5.6.x. If the app is using Spring 2.5.6.x and cannot upgrade, the highest possible version of JUnit 4.x that may be used is 4.4. Later versions of JUnit are not compatible with Spring Test 2.5.6 and may lead to hard-to-diagnose test errors.

How to separate class loader for different jar version?

I have a test war file that contains many tests. Each test is packaged in a Maven project with a lot of dependencies. We use Maven for dependency management, but it comes with a problem: when a test updates a common library, it can break other tests that depend on the older version of the lib. How can I make all the tests run in a completely separate environment, each with its own set of library versions? I can't execute them in separate JVMs because these tests need to be executed very frequently, like every 30 seconds or so. Can OSGi help solve this problem?
Yes OSGi can solve this problem, but it is not a step to be taken lightly. Use OSGi when you are ready to commit time and effort to isolating and managing dependencies, versioning them properly and, optionally, making your code and architecture more modular/reusable.
Bear in mind that adopting OSGi can be painful at first due to non-modular practices used by some legacy libraries.
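As a rough illustration of how OSGi scopes library versions per bundle (the package names and version ranges below are made up): each test bundle declares in its manifest exactly the version range it needs, and the framework wires each bundle to a matching provider inside the same JVM.

Manifest-Version: 1.0
Bundle-ManifestVersion: 2
Bundle-SymbolicName: com.example.tests.suite-a
Bundle-Version: 1.0.0
Import-Package: com.example.commonlib;version="[1.2,2.0)"

Another test bundle can import version="[2.0,3.0)" of the same package without the two interfering with each other.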

Alternative build manager to Hudson

I work at a software company where our primary development language is Java. Naturally, we use Hudson for continuous builds, which it works brilliantly for. However, Hudson is not so good at some of the other things we ask it to do. We also use Hudson jobs to deploy binaries, refresh databases, run load testing, run regressions, etc. We really run into trouble when there are build dependencies (e.g. load testing requires a DB refresh).
Here's the one thing that Hudson doesn't do that we really need:
Build dependency: It supports build dependencies for Ant builds, but not for Hudson jobs. We're using the URL invocation feature to cause a Hudson job to invoke another Hudson job. The problem is that Hudson always returns a 200 and does not block until the job is done. This means that the calling job doesn't know a) if the build failed and b) if it didn't fail, how long it took.
It would be nice to not have to use shell scripting to specify the behavior of a build, but that's not totally necessary.
Any direction would be nice. Perhaps we're not using Hudson the right way (i.e. should all builds be Ant builds?) or perhaps we need another product for our one-click deployment, load testing, migration, DB refresh, etc.
Edit:
To clarify, we have parameters in our builds that can cause different dependencies depending on the parameters. For example, sometimes we want load testing with a DB refresh, sometimes without one. Unfortunately, creating a Hudson job for each combination of parameters (as the Join plugin requires) won't work, because the different combinations could lead to dozens of jobs.
I don't think I understand your "build dependency" requirements. Any Hudson job can be configured to trigger another (downstream) job, or be triggered by another (upstream) job.
The Downstream-Ext plugin and Join plugin allow for more complex definition of build dependencies.
There is a CLI for Hudson which allows you to issue commands to a Hudson instance. Use "help" to get precise details. I believe there is one which allows you to invoke a build and await its finish.
http://wiki.hudson-ci.org/display/HUDSON/Hudson+CLI
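I believe the command in question looks something like the following; run "help" against your instance to check the exact syntax your Hudson version supports (the server URL and job name are placeholders, and the second -s is the option that makes the CLI wait for the build to finish and report its result):

java -jar hudson-cli.jar -s http://hudson.example.com/ build my-downstream-job -s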
Do you need an extra job for your 'dependencies'?
Your dependencies sound to me like an extra build step. The script that refreshes the DB can be stored in your SCM, and every build that needs this step checks it out. You can invoke that script when your "db refresh" parameter is true, and this can be done for more than just one of your modules. What is the advantage? Your script logic lives in your SCM (it's always good to have a history of the changes), and you can still update the script once for all your test jobs (since they all check out the same script). In addition, you don't need to look at several jobs to find out whether your test run was successful or not; especially if you have one job that is part of several execution lines, it becomes difficult to find out which job triggered which run. Another advantage is that you have fewer jobs on your Hudson, and therefore it is easier to maintain.
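As a sketch of that idea (the parameter name and script paths are hypothetical), the test job's shell build step could look like this:

# shell build step of the test job; DB_REFRESH is a boolean job parameter
if [ "$DB_REFRESH" = "true" ]; then
    ./scripts/refresh_db.sh
fi
./scripts/run_load_tests.sh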
I think what you are looking for is http://wiki.jenkins-ci.org/display/JENKINS/Parameterized+Trigger+Plugin. This plugin lets you execute other jobs based on the status of previous jobs. You can even call a shell script from the downstream project to determine any additional conditions, which can in turn call the API for more info.
For example, we have a post-build step to notify us; it calls back into the JSON API to build a nice topic in our IRC channel that says "All builds ok" or "X, Y failed", etc.
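For instance, the last build of a job can be queried through the remote API (the server and job name are placeholders):

curl -s http://hudson.example.com/job/my-build-job/lastBuild/api/json

which returns the build's result, duration and other details as JSON.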
