Extract a reference graph while compiling Java codebase?

Background:
I'm working with (for me) a reasonably large codebase (e.g. I've only got a few of the related projects checked out at the moment, and it's > 11000 classes).
Build is ant, Tests are JUnit, CI is Jenkins.
Running all tests before checkin is not an option, it takes Jenkins hours. Even for some of the individual apps it can be 45 minutes.
Some tests don't reference the tested methods directly but look them up via reflection, and in some cases they don't even directly reference the class of the tested methods, because they interrogate an aggregator class and are aware of the patterns of pass-through methods in use here. As it's a big codebase with more than 10 developers, and I'm not in charge, this is something I cannot change for now.
What I want is the ability, before check-in, to print out a list of all test classes that are two degrees away (Kevin-Bacon-wise) from any class in the git diff list. This way I can run them all and cut down on angry emails from Jenkins when something I missed eventually gets run and has an error.
The easiest way I can think of to achieve this is to code it myself with a Ruby script or similar, which allows me to account for some of the patterns we're using, but to do it I need to be able to query "which classes reference class X?"
I could parse .java or (easier) .class files to get this info, but I'd rather not :) Is there a way I can make Javac export it in a simple format as it compiles?

Is there a way I can make Javac export it in a simple format as it compiles?
AFAIK, no.
However, there are other ways to get a list of the dependencies:
How do I get a list of Java class dependencies for a main class?.
(Note however that you are unlikely to get a static tool to extract dependencies resulting from Class.forName(), etcetera. Also note that you cannot infer the complete set of dependencies from bytecode files because of the way that "compile time constants" are handled.)
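Once class-to-class references have been extracted by one of those tools, the "two degrees" query from the question is a short breadth-first search over the reversed graph. The following is only a minimal sketch: it assumes the references are already available as from/to pairs, the class names are made up, and the convention that test classes end in `Test` is an assumption rather than something stated in the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class AffectedTests {

    /** Builds a reverse index: for each class, the classes that reference it. */
    static Map<String, Set<String>> reverse(List<String[]> edges) {
        Map<String, Set<String>> referencedBy = new HashMap<>();
        for (String[] edge : edges) {
            // edge[0] references edge[1]
            referencedBy.computeIfAbsent(edge[1], k -> new TreeSet<>()).add(edge[0]);
        }
        return referencedBy;
    }

    /** Returns test classes within maxDegrees reference hops of any changed class. */
    static Set<String> affectedTests(Map<String, Set<String>> referencedBy,
                                     Collection<String> changed, int maxDegrees) {
        Set<String> seen = new HashSet<>(changed);
        Set<String> frontier = new HashSet<>(changed);
        for (int degree = 0; degree < maxDegrees; degree++) {
            Set<String> next = new HashSet<>();
            for (String cls : frontier) {
                for (String user : referencedBy.getOrDefault(cls, Collections.<String>emptySet())) {
                    if (seen.add(user)) {
                        next.add(user);
                    }
                }
            }
            frontier = next;
        }
        Set<String> tests = new TreeSet<>();
        for (String cls : seen) {
            if (cls.endsWith("Test")) { // assumed naming convention
                tests.add(cls);
            }
        }
        return tests;
    }

    public static void main(String[] args) {
        // Hypothetical reference edges: {referencing class, referenced class}.
        List<String[]> edges = new ArrayList<>(Arrays.asList(
                new String[] {"OrderServiceTest", "OrderService"},
                new String[] {"OrderService", "Order"},
                new String[] {"InvoiceTest", "Invoice"},
                new String[] {"Invoice", "OrderService"}));
        Set<String> tests = affectedTests(reverse(edges), Arrays.asList("Order"), 2);
        System.out.println(tests);
    }
}
```

With these sample edges, a change to `Order` reaches `OrderServiceTest` within two hops, while `InvoiceTest` is three hops away and is only picked up if the degree limit is raised. References made purely through reflection will not appear in the edge list, so project-specific patterns would have to be added as extra edges.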
It strikes me that there are a few problems here:
It sounds to me like your build, and indeed your project structure is monolithic. If you could restructure the code base into large-scale modules that build separately (according to their dependencies), and version controlled separately, then you only need to do a full build and run all unit tests when there is a change high up ... in a module that everything else depends on. (Can I suggest the "Maven" word. It really helps for a large codebase, and 11,000 classes is large.)
It sounds like you may be suffering from the "branches are hard" problem of classic VCS systems.
It sounds like you may need a beefier CI system. If you've got more cores and the build framework is right, you should be able to get faster CI builds. (And if you modularize so that you rebuild less ...)
I think it might be easier to address your slow build/test cycle that way rather than via extra (possibly bespoke) tooling to do dependency analysis.
But I recognize that it may not be up to you to make those decisions.

Related

Maven Plugin for Enforcing Java File Size

I'm looking for a plugin that will allow me, at build time, to enforce that my Java files don't exceed a certain size. For instance, if it's decided that 500 lines is too many lines for a class, then the build will fail if any classes exceed 500 lines.
For something similar, I'm thinking of jacoco where you can configure different parameters but, of course, instead of analyzing test code coverage, it analyzes the actual number of lines in each class.
Does such a plugin exist?

Static code analysis (SCA) tools check things like that but I'm not aware of a Maven plugin that fails a build if such a limit is exceeded. The tools I know just create reports to inform you about such a circumstance.
Even if it existed I wouldn't use such a plugin. Too long classes are a matter of refactoring, not a matter of releasing or not working code.
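For illustration only, the check itself is simple enough to sketch by hand and run as a build step. The sketch below is not an existing plugin; the class name `FileLengthCheck`, the default source root and the 500-line default are assumptions taken from the question's example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.stream.Stream;

public class FileLengthCheck {

    /** Returns all .java files under root that have more than maxLines lines. */
    static List<Path> oversized(Path root, int maxLines) throws IOException {
        List<Path> result = new ArrayList<>();
        try (Stream<Path> paths = Files.walk(root)) {
            Iterator<Path> it = paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".java"))
                    .sorted()
                    .iterator();
            while (it.hasNext()) {
                Path file = it.next();
                // Files.lines reads as UTF-8; other encodings would need a charset argument.
                try (Stream<String> lines = Files.lines(file)) {
                    if (lines.count() > maxLines) {
                        result.add(file);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Both defaults are assumptions for the sake of the example.
        Path root = Paths.get(args.length > 0 ? args[0] : "src/main/java");
        int maxLines = args.length > 1 ? Integer.parseInt(args[1]) : 500;
        List<Path> tooLong = oversized(root, maxLines);
        for (Path file : tooLong) {
            System.err.println("Too long: " + file);
        }
        if (!tooLong.isEmpty()) {
            System.exit(1); // a non-zero exit code fails the calling build step
        }
    }
}
```

This counts physical lines, including comments and blank lines, so it is a blunt measure of class size.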

Automatically delete unnecessary files in an Eclipse project

I forked a repository from Github that has a lot of packages and files, implementing all kind of algorithms, simulations and utility classes.
However, in my research I don't need all of these files/packages for my own simulation to work.
I would like to keep my forked project as minimal as possible, so I would like to keep only the necessary packages/files that are needed to compile my simulation.
I'm talking specifically about the IDE Eclipse. If I decide to "backtrack" all imports starting from my simulation file, I would definitely get lost because the original project is big.
On the other side, if I decide to "delete" a package and see if my simulation compiles, I would stay all week trying this out, and if I delete a needed file I would have to attach it again to my project which is troublesome.
Is there an automatic tool I can use to do this on Eclipse?

A simple option is to use a coverage checker to see what methods in what classes are used during execution, and delete the rest. And then revert anything that causes a compilation error.
This only works for code, not resources, though - and only if something like reflection isn’t used.

A tool to detect broken JAR dependencies on class and method signature level

The problem scenario is as follows (Note: this is not a cross-jar dependency issue, so tools like JarAnalyzer, ClassDep or Tattletale would not help. Thanks).
I have a big project which is compiled into 10 or more jar artifacts. All jars depend on each other and form a dependency hierarchy.
Whenever I need to modify one of the jars, I would check out the relevant source code and the source code for projects that depend on it. Modify the code, compile, repackage the jars. So far so good.
The problem is: I may forget to check out one of the dependent projects, because inter-jar dependency chains can be quite long and may change with time. If this happens, some jars may go "out-of-sync" and I will eventually get a NoSuchMethodException or some other class incompatibility issue at run-time, which is what I want to avoid.
The only solution I can think of, the most straightforward one, is to check out all projects and recompile the bunch. But this takes time, especially if I rebuild on every small change. I do have a continuous integration server that could do this for me, but it's shared with other developers, so seeing if the build breaks is not an option for me.
However, I do have all the jars, so hypothetically it should be possible to verify whether the jars that depend on the code I modified have an inconsistency in method signatures, class names, etc. But how could I perform such a check?
Has anyone faced a similar problem before? If so, how did you solve it? Any tools or methodologies would be appreciated.
Let me know if you need clarification. Thanks.
EDIT:
I would like to clarify my question a little bit.
The ultimate goal of this task is to check that the changes I have made will compile against the whole project. I am looking for a tool/technique that would help me perform such a check.
Consider this example:
You have 2 projects: A and B which are deployed as A.jar and B.jar respectively. A depends on B.
You wish to modify B, so you check it out and modify a method signature that A happens to depend on. You can compile B and run all tests by itself without any problems because B itself does not depend on anything. So you happily commit your changes.
In a few hours the complete project integration fails because A could not be compiled!
How do I avoid this?
The kind of tool I am looking for would retrieve A.jar and check that all dependencies in A on the new modified B are still fine. Like a potential compilation error that would happen if I were to recompile A and B sources together.
Another solution, as was suggested by many of you, is to set up a local continuous integration system that would recompile the whole project locally. I don't mind doing this, but I want to avoid doing it inside my workspace. On the other hand, if I check-out all sources to another temporary workspace, then I need to mirror my local changes to the temporary workspace.
This is quite a big issue in my team, as builds break very often because somebody forgot to check out (or open in Eclipse) the right set of projects. I tried persuading people to check out the source and recompile the bunch before commits, but not only does it take time, it also requires running quite a few commands, so most people find it too troublesome to do. If the technique is not easy or automated, then it's unusable.
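As a rough illustration of the kind of check described in the question, the sketch below uses reflection to confirm that a list of expected methods still exists on the classes a given class loader can see. It is only a sketch under stated assumptions: the `Expected` type and the hand-written expectation list are hypothetical (a real tool would have to derive them from the bytecode of A.jar), `getMethod` only finds public methods, and the demo runs against JDK classes instead of a real B.jar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SignatureCheck {

    /** A method that a dependent jar is assumed to call (hand-written here). */
    static final class Expected {
        final String className;
        final String methodName;
        final Class<?>[] paramTypes;

        Expected(String className, String methodName, Class<?>... paramTypes) {
            this.className = className;
            this.methodName = methodName;
            this.paramTypes = paramTypes;
        }

        @Override
        public String toString() {
            return className + "#" + methodName + Arrays.toString(paramTypes);
        }
    }

    /** Returns a description of every expected public method that cannot be resolved. */
    static List<String> missing(List<Expected> expectations, ClassLoader loader) {
        List<String> problems = new ArrayList<>();
        for (Expected e : expectations) {
            try {
                // Load without initializing, so no static initializers run.
                Class<?> cls = Class.forName(e.className, false, loader);
                cls.getMethod(e.methodName, e.paramTypes);
            } catch (ClassNotFoundException | NoSuchMethodException | LinkageError ex) {
                problems.add(e + " (" + ex.getClass().getSimpleName() + ")");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Demo against JDK classes; the second entry is deliberately wrong.
        List<Expected> expectations = Arrays.asList(
                new Expected("java.lang.String", "substring", int.class),
                new Expected("java.lang.String", "reverse"));
        for (String problem : missing(expectations, ClassLoader.getSystemClassLoader())) {
            System.out.println("Missing: " + problem);
        }
    }
}
```

To run this against a freshly built jar, the system class loader could be swapped for a `java.net.URLClassLoader` pointing at that jar. The check ignores return types and non-public members, so it catches only some of the incompatibilities a full recompile would find.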

If you do not want to use your shared continuous integration server you should set up a local one on your developer machine where you perform the rebuild processes on change.
I know Jenkins - it is easy to set up (just start it) on a local machine, and I would advise running it locally if the IT infrastructure does not provide one that fits your needs.

Checking signatures is unfortunately not enough. Having the correct signatures does not mean it'll work. It's all about contracts and not just signatures. What happens, for example, if the new version of a library has the same method signature but now expects its ArrayList parameter in reversed order? You will run into issues sooner or later. You might consider adopting tools like Ivy or Maven:
http://ant.apache.org/ivy/
http://maven.apache.org/
Yes, it can be a pain to implement, but once you have it, it will "guard" your versions forever. You should never run into such an issue. But even those build tools are not 100% accurate. The only proper way of dealing with incompatible libraries (I know you won't like my answer) is extensive regression testing. For this you need a bunch of testing tools. There are plenty of them out there: from very basic unit testing (JUnit) to database testing (JDBC Proxy) and UI testing frameworks like SWTBot (depending on whether your app is a web app or a thick client).
Please note that if your project gets really huge and you have a large number of dependencies, you are never using all of the code in them. Trying to check all interfaces and all signatures is way too much. It's not necessary to test everything when your code uses, let's say, 30% of the library code. What you need is to test what you really use. And this can only be done with extensive regression testing.

I have finally found a whole treasure box of answers at this post. Thanks for help, everyone!
The bounty goes to K. Claszen for the quickest and most input.

I also think that setting up a local Jenkins is the best idea. What tool do you use for the build? Maybe you can improve your situation by switching to Maven as the build tool? It is smarter and doesn't recompile the full project unless you ask it to directly. But switching to it can be a HUGE pain in the neck; it depends heavily on how your project is organized now...
As for VCS, there is a Mercurial/SVN bridge, so you can use a local Mercurial repository for your development.
check this link: https://www.mercurial-scm.org/wiki/WorkingWithSubversion

There is a solution, jarjar, which allows different versions of the same library to be included multiple times in the dependency graph.

I use IntelliJ, not Eclipse, so maybe my answer is too IDE-specific. But in IntelliJ, I would simply include the modules from B in A's project, so that when I make changes to B, anything they break in A shows up immediately when compiling in the IDE. Modules can belong to multiple projects, so this is not anything like duplication; it's just adding references in the IDE to modules in other projects.

Testing .jar library by "reference" rather than by contents

I need to use a third-party .jar lib in my application. Unfortunately, the way this third-party app is set up makes it well nigh on impossible to test on my Windows box (since it's set up for a Unix environment - I'll spare the details). So I have refactored it to be able to test it (altered the structure to use Maven/Spring so the handling of property files is more flexible, without changing the call-out interface). If the new version compiles to the same .jar name/version/etc., I can presumably test it locally and then compile in the "real" .jar for production. Is this a daft idea? (I have a strong hunch that it is, since I am introducing a non-trivial dependency...) If so, how can I better test this library (without e.g. having to merge my own refactoring changes into the original code)?

Conceptually you have subdivided the third-party jar into two: the property bit and the rest. You replace the property bit with your own stuff and then proceed to test the rest. On release you prepare a package including the bit you have tested and the original property stuff, which you have not. In taking this approach, what risks have you introduced?
That the code you release will not be the code you tested - it's a different JAR, albeit with careful process not a very different JAR - so if you trust yourself to get it right this may be an acceptable risk.
That the code you didn't test is broken. This suggests that at least some testing is needed on the real platform.
That the tests on Windows give different results from tests on Unix. Again some degree of Unix testing is indicated.
I think that your approach is pragmatic and that the risks can be mitigated by having at least some tests be executable on Unix. I assume that your Windows-based testing is for ease of development/debugging. I would do this, but I would try to build a regression test suite that I can run on Unix - JUnit tests could run there.

Do you follow any guidelines (java) in packaging?

Do you follow any design guidelines in java packaging?
Is proper packaging part of the design skill? Are there any documents about it?
Edit: How should packages depend on each other? Are cyclic package dependencies unavoidable? This is not about jar or war files.

My approach that I try to follow normally looks like this:
Have packages of reasonable size. Fewer than 3 classes is strange. Fewer than 10 is good. More than 30 is not acceptable. I'm normally not very strict about this.
Don't have dependency cycles between packages. This one is tough since many developers have a hard time figuring out any way to keep the dependencies cycle free. BUT doing so teases out a lot of hidden structure in the code. It becomes easier to think about the structure of the code and easier to evolve it.
Define layer and modules and how they are represented in the code. Often I end up with something like <domain>.<application>.<module>.<layer>.<arbitrary substructure as needed> as the template for package names
No cycles between layers; no cycles between modules.
In order to avoid cycles one has to have checks. Many tools do that (JDepend, Sonar ...). Unfortunately they don't help much with finding ways to fix cycles. That's why I started to work on Degraph, which should help with that by visualizing dependencies between classes, packages, modules and layers.
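The cycle check itself can be sketched as a depth-first search over the package dependency graph. This is a minimal illustration, not how JDepend, Sonar or Degraph work internally; the package names are hypothetical and the graph is assumed to have been extracted already (package name mapped to the packages it depends on).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PackageCycles {

    /** Returns one dependency cycle as a path (first == last), or an empty list if none exists. */
    static List<String> findCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();
        for (String start : deps.keySet()) {
            List<String> cycle = visit(start, deps, new ArrayList<>(), done);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        return Collections.emptyList();
    }

    private static List<String> visit(String pkg, Map<String, Set<String>> deps,
                                      List<String> path, Set<String> done) {
        int index = path.indexOf(pkg);
        if (index >= 0) {
            // pkg is already on the current path: the tail of the path is a cycle.
            List<String> cycle = new ArrayList<>(path.subList(index, path.size()));
            cycle.add(pkg);
            return cycle;
        }
        if (done.contains(pkg)) {
            return Collections.emptyList(); // fully explored earlier, no cycle through it
        }
        path.add(pkg);
        for (String dep : deps.getOrDefault(pkg, Collections.<String>emptySet())) {
            List<String> cycle = visit(dep, deps, path, done);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.remove(path.size() - 1);
        done.add(pkg);
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        // Hypothetical packages: ui -> service -> domain -> ui forms a cycle.
        Map<String, Set<String>> deps = new LinkedHashMap<>();
        deps.put("app.ui", new LinkedHashSet<>(Arrays.asList("app.service")));
        deps.put("app.service", new LinkedHashSet<>(Arrays.asList("app.domain")));
        deps.put("app.domain", new LinkedHashSet<>(Arrays.asList("app.ui")));
        System.out.println(findCycle(deps));
    }
}
```

A build could fail whenever `findCycle` returns a non-empty list. Reporting the path rather than a yes/no answer shows which dependency would have to be cut.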

Packaging is normally about release management, and the general guidelines are:
consistency: when you are releasing into integration, pre-production or production environment several deliveries, you want them organized (or "packaged") exactly the same way
small number of files: when you have to copy a set of files from one environment to another, you want to copy as few as possible; if their number is reasonable (10-20 max per component to deliver), you can just copy them (even if those files are large)
So you want to define a common structure for each delivery like:
aDelivery/
    lib     // all jar, ear, war, ...
    bin     // all scripts used to launch your application: sh, bat, ant files, ...
    config  // all properties files, config files
    src     // all sources zipped into jars
    docs    // javadoc zipped
    ...
Plus, all those common directory structures should be stored in one common repository (a VCS, or a Maven repo, or...), so that they can be queried without having to rebuild them every time you need them (you do not need that if you have only one or two delivery components, but when you have 40 to 60 of them... a full rebuild is out of the question).

You can find a lot of information here:
What strategy do you use for package naming in Java projects and why?

The problem with packaging in Java is that it has very little relation to what you would like to do. For example, I like following the Eclipse convention of having packages marked internal, but then I can't define their classes with a "package" protection level.
