How does one weed out dependencies in a large project? - java

I'm about to inherit a rather large Java enterprise project that has a large number of third-party dependencies. There are at least seventy JARs included, and some of them seem to be unused, e.g. spring.jar, which I know isn't used.
It seems that over the years as various developers have touched upon the code base they have all tried out new project-of-the-month type libraries.
How does one go about getting rid of these? Within reason of course, as clearly some dependencies are helpful to not have to re-invent the wheel.
I'm obviously interested in Java-based projects, but I welcome answers across languages that people think will be helpful.

Personally, I think you have to start by assessing the scale of the problem. It's going to be fairly painful, but I'd make a list of the dependencies and work out exactly which parts of the project use which ones.
Then I'd work out exactly what features of each you're actually making use of (in many cases, you'll end up having a massive third party library which you're using a tiny part of).
Once you have this information, you'll at least know what you're dealing with.
My next step would be to look at all of the dependencies that you only use to a small extent. Checking around might uncover things that you could use from other libraries that would eliminate the lesser used libraries.
I'd also have a look around to see if there's anything small that you could just re-write and include in your own code-base.
Finally, I'd have a look around at the vendors of your dependencies and their competitors to see if the latest versions contain more functionality that will allow you to eliminate a few others.
Then you're just left wondering whether it's better to be highly dependent on a few vendors, or less dependent on a lot of vendors!! ;o)

structure101 http://www.headwaysoftware.com/products/structure101/index.php
It's a great tool for showing dependencies. I've been using it for a couple of years.

If you have a good set of automated tests, and you're looking to remove libraries which are not used at all, you could just use trial and error. One at a time, remove a library, and run your tests to see if everything still works. If not, put it back. Of course, if you can't even build without a library, you probably need it.
Basically, however you go about it, my idea is to remove them one at a time and see what breaks. If nothing breaks, odds are good you can just toss the library. If the problem is very minor (e.g. you need one method of one class in a large library), you might be able to code around it.
If you're dealing with a standalone application, you could give the JVM the -verbose:class option to see which classes are being loaded. This should give you messages like:
[Opened C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]
[Loaded java.util.regex.Pattern$Single from C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]
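To turn that output into a list of JARs that are actually used, the log can be filtered. The following is only a sketch: it assumes the older "[Loaded ... from ...]" line format shown above (newer JVMs format class-loading logs differently), and the class and method names are made up for illustration. A JAR that never shows up after exercising the whole application is a candidate for removal, not proof that it is unused.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoadedJars {
    // Matches lines such as "[Loaded some.Class from C:\path\lib.jar]".
    private static final Pattern LOADED =
            Pattern.compile("^\\[Loaded (\\S+) from (.+)\\]$");

    /** Returns the distinct sources (typically JAR paths) that classes were loaded from. */
    static Set<String> jarsInUse(List<String> logLines) {
        Set<String> jars = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line.trim());
            if (m.matches()) {
                jars.add(m.group(2)); // group 2 is the path after "from"
            }
        }
        return jars;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "[Opened C:\\Program Files\\Java\\jre1.6.0_04\\lib\\rt.jar]",
                "[Loaded java.util.regex.Pattern$Single from C:\\Program Files\\Java\\jre1.6.0_04\\lib\\rt.jar]");
        System.out.println(jarsInUse(sample));
    }
}
```

Lines that do not start with "[Loaded" (such as the "[Opened" line) are ignored.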

I read about an approach using instrumentation here, never tried it, but sounds reasonable.

We went through an exercise like this on a Delphi codebase. We dramatically simplified our external dependencies. Basically, we went about it like this:
Catalogued all external libraries and components
Catalogued (using a file search tool) where they were used, and what for.
Removed everything we didn't use or didn't need (some libraries were used in code that was no longer needed).
Made a ranking of which libraries we favored, basing this on whether the library was actively developed, how much functionality it offered that we used, how difficult it was to port the code that used it to another library that we already used and so on.
Finally, we iteratively removed dependencies on libraries low on the list by porting that functionality to another library.
This was, however, quite a lot of work.
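For a Java code base, the cataloguing step could be partly automated by scanning import statements. This is a rough sketch, not what we used: the mapping from library name to package prefix is an assumption you would have to supply yourself, and the class and method names are invented. It will not see libraries that are only referenced through reflection or configuration files.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ImportCatalogue {
    // Captures the imported name of "import x.y.Z;" and "import static x.y.Z.*;" lines.
    private static final Pattern IMPORT =
            Pattern.compile("^\\s*import\\s+(?:static\\s+)?([\\w.]+)", Pattern.MULTILINE);

    /**
     * sources:         file name -> file content
     * libraryPrefixes: library name -> package prefix it provides (supplied by hand)
     * Returns library name -> files that import from it; an empty set marks a removal candidate.
     */
    static Map<String, Set<String>> usage(Map<String, String> sources,
                                          Map<String, String> libraryPrefixes) {
        Map<String, Set<String>> result = new TreeMap<>();
        libraryPrefixes.keySet().forEach(lib -> result.put(lib, new TreeSet<>()));
        for (Map.Entry<String, String> src : sources.entrySet()) {
            Matcher m = IMPORT.matcher(src.getValue());
            while (m.find()) {
                String imported = m.group(1);
                for (Map.Entry<String, String> lib : libraryPrefixes.entrySet()) {
                    if (imported.startsWith(lib.getValue() + ".")) {
                        result.get(lib.getKey()).add(src.getKey());
                    }
                }
            }
        }
        return result;
    }
}
```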

If you take the approach of "remove things until it won't compile" you need to be very careful about transitive runtime dependencies. If there's a good quality test suite, it can help, but you'll certainly need to run a test coverage tool like Cobertura to make sure that enough of the code is getting tested to exercise your full dependency graph.
How much code are you talking about? The review-based approach suggested by Joeri frankly seems the best to me; it has the added advantage of making you at least superficially familiar with all parts of the system. If you're just inheriting a big project, this is something you should probably take the time to do anyway.

If you have a full regression test suite for this project, all you have to do is run the suite in a loop, with one fewer JAR each time. It is NOT fast, BUT it is easy to do.
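The loop itself is simple to generate. Here is a minimal sketch that only builds the classpaths, one per omitted JAR; how the test suite is then launched with each classpath depends on the project and is left out. The class and method names are hypothetical.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class LeaveOneOut {
    /** For each JAR, builds the classpath that contains every JAR except that one. */
    static List<String> classpathsWithoutEach(List<String> jars, String separator) {
        List<String> result = new ArrayList<>();
        for (int skip = 0; skip < jars.size(); skip++) {
            List<String> remaining = new ArrayList<>(jars);
            remaining.remove(skip); // removes by index, not by value
            result.add(String.join(separator, remaining));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> jars = List.of("lib/a.jar", "lib/b.jar", "lib/c.jar");
        // Each printed line is a classpath to try the regression suite with.
        classpathsWithoutEach(jars, File.pathSeparator).forEach(System.out::println);
    }
}
```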

Related

Is there any need to switch to modules when migrating to Java 9 or later?

We're currently migrating from Java 8 to Java 11. However, upgrading our services was less painful than we anticipated. We basically only had to change the version number in our build.gradle file and the services were happily up and running. We upgraded libraries as well as (micro) services that use those libs. No problems until now.
Is there any need to actually switch to modules? This would generate needless costs IMHO. Any suggestion or further reading material is appreciated.
To clarify, are there any consequences if Java 9+ code is used without introducing modules? E.g. can it become incompatible with other code?
No.
There is no need to switch to modules.
There has never been a need to switch to modules.
Java 9 and later releases support traditional JAR files on the
traditional class path, via the concept of the unnamed module, and will
likely do so until the heat death of the universe.
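To see the unnamed module at work, you can ask a class which module it belongs to. This is a small sketch (Java 9 or later); the class name ModuleCheck is made up, and the second line of output assumes the program is run from the class path rather than the module path.

```java
class ModuleCheck {
    /** Describes the module a class belongs to. */
    static String describe(Class<?> type) {
        Module module = type.getModule();
        return module.isNamed() ? "named module " + module.getName() : "unnamed module";
    }

    public static void main(String[] args) {
        // JDK classes live in named modules such as java.base.
        System.out.println(describe(String.class));
        // Code loaded from the traditional class path lands in the unnamed module.
        System.out.println(describe(ModuleCheck.class));
    }
}
```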
Whether to start using modules is entirely up to you.
If you maintain a large legacy project that isn’t changing very much,
then it’s probably not worth the effort.
If you work on a large project that’s grown difficult to maintain over
the years then the clarity and discipline that modularization brings
could be beneficial, but it could also be a lot of work, so think
carefully before you begin.
If you’re starting a new project then I highly recommend starting with
modules if you can. Many popular libraries have, by now, been upgraded
to be modules, so there’s a good
chance that all of the dependencies that you need are already available
in modular form.
If you maintain a library then I strongly recommend that you
upgrade it to be a module if you haven’t done so already, and if all of
your library’s dependencies have been converted.
All this isn’t to say that you won’t encounter a few stumbling blocks
when moving past Java 8. Those that you do encounter will, however,
likely have nothing to do with modules per se. The most common
migration problems that we’ve heard about since we released Java 9 in
2017 have to do with changes to the syntax of the version
string and to the removal or
encapsulation of internal APIs
(e.g., sun.misc.Base64Decoder) for which public, supported
replacements have been available for years.
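As a rough illustration of those two migration problems, the sketch below decodes Base64 with the supported java.util.Base64 API (available since Java 8) and reads the feature release from both the old and the new version string formats. The helper names are invented for this example; on Java 9 and later, Runtime.version() is the supported way to query the running version, so hand parsing like this is mainly useful for code that must also run on Java 8.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class MigrationHelpers {
    /** Decodes Base64 text with the public java.util.Base64 API instead of a sun.misc class. */
    static String decode(String base64) {
        return new String(Base64.getDecoder().decode(base64), StandardCharsets.UTF_8);
    }

    /** Extracts the feature release from old ("1.8.0_181") and new ("11.0.2") version strings. */
    static int featureVersion(String version) {
        String[] parts = version.split("[._+-]");
        int first = Integer.parseInt(parts[0]);
        // Before Java 9 the string started with "1.", so the second element is the release.
        return (first == 1 && parts.length > 1) ? Integer.parseInt(parts[1]) : first;
    }

    public static void main(String[] args) {
        System.out.println(decode("aGVsbG8="));
        System.out.println(featureVersion(System.getProperty("java.version")));
    }
}
```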
I can only tell you my organization's opinion on the matter. We are in the process of moving to modules for every single project that we are working on. What we are building is basically micro-services plus some client libraries. For micro-services the transition to modules is a somewhat lower priority: the code there is already somewhat isolated in the Docker container, so "adding" modules in there does not seem (to us) very important. This work is being picked up slowly, but it's low priority.
On the other hand, client libraries are an entirely different story. I cannot tell you the mess we have sometimes. I'll explain one point that I hated before Jigsaw. You expose an interface to clients, for everyone to use. Automatically that interface is public - exposed to the world. Usually, what I do is have some package-private classes that use that interface and are not exposed to the clients. I don't want clients to use those; they are internal. Sounds good? Wrong.
The first problem is that when those package-private classes grow, and you want more classes, the only way to keep everything hidden is to create classes in the same package:
package abc:
-- /* non-public */ Usage.java
-- /* non-public */ HelperUsage.java
-- /* non-public */ FactoryUsage.java
....
When it grows (in our case it does), those packages get way too big. Moving to a separate package, you say? Sure, but then HelperUsage and FactoryUsage would have to be public, and that is what we tried to avoid from the beginning.
Problem number two: any user/caller of our clients can create the same package name and extend those hidden classes. It happened a few times to us already, fun times.
Modules solve this problem in a beautiful way: public is not really public anymore; I can have friend access via the qualified exports directive (exports ... to). This makes our code lifecycle and management much easier. And we get away from classpath hell. Of course Maven/Gradle mostly handle that for us, but when there is a problem, the pain is very real. There could be many other examples, too.
That said, the transition is (still) not easy. First of all, everyone on the team needs to be aligned; second, there are hurdles. The two biggest I still see are these. The first: how do you separate each module, based on what, specifically? I don't have a definite answer yet. The second is split packages, oh the beautiful "same package is exported by different modules". If this happens with your own libraries, there are ways to mitigate it; but if these are external libraries... not that easy.
If you depend on jarA and jarB (separate modules), but they both export abc.def.Util, you are in for a surprise. There are ways to solve this, though. Somewhat painful, but solvable.
Overall, since we started migrating to modules (and we still are), our code has become much cleaner. And if your company is a "code-first" company, this matters. On the other hand, I have been involved in companies where this was seen as "too expensive" with "no real benefit" by senior architects.
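As an illustration of that friend access, a module descriptor might look like the sketch below. All module and package names are hypothetical; the only thing taken from the Java language itself is the exports and exports ... to syntax.

```java
// module-info.java (all module and package names here are hypothetical)
module com.example.client {
    // The public API: visible to every module that requires this one.
    exports com.example.client.api;

    // Internal helpers: public classes in this package are visible
    // only to the named friend module, not to ordinary clients.
    exports com.example.client.internal to com.example.client.extras;
}
```

With this in place, the helper classes can move to their own package and be declared public without becoming part of what clients can compile against.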

Two java libraries importing each other?

I am working on a legacy framework and apparently there are two libraries which are inter-dependent. By that I mean libA imports from libB, and libB imports from libA. First, I think it is a terrible design, but why would somebody do something like this? Or rather, which conditions can lead somebody to write this?
edit:
Each library depends on classes in the other, so they do import packages and have the other library jar in their build path.
It's easier to do in this case, because the two parties are independent. If they don't talk to each other, it's not hard to create cycles. You have to be mindful to avoid them.
Cyclic dependencies aren't hard to create. Look at Java itself: java.lang, java.util, and java.io have cycles. Will you stop writing Java, since it's so "terrible"?
It means that you can never use libA without libB and vice versa. They've become one big library. Same with packages in Java and other systems: once you have a cycle, you have to use all those packages together as if they were one.
The guys who write Spring pay a lot of attention to cycles. They design and refactor their framework to eliminate them.
So - what's the harm? Juergen Hoeller says they're bad, and he's right. But from your point of view, what evil is visited upon you? It means you have to use both when you run and test. You can't test class A without class B and vice versa when there's a cycle between them. It makes testing and running harder.
You can choose an alternative that doesn't have the cycle. If you can change the source, you can refactor and maintain it. But that's it.
You should check your own code to see if you've done it to yourself. IntelliJ has nice analysis tools which can be applied to a code base. Check it out.
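If you want to check for cycles yourself, the idea fits in a few lines. This is a minimal sketch of a depth-first search over a hand-built dependency map; the names are invented, and real tools work from the compiled classes rather than from a map you type in.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class CycleFinder {
    /** Returns true if the graph (library -> libraries it depends on) contains a cycle. */
    static boolean hasCycle(Map<String, Set<String>> dependsOn) {
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String lib : dependsOn.keySet()) {
            if (visit(lib, dependsOn, done, inProgress)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String lib, Map<String, Set<String>> dependsOn,
                                 Set<String> done, Set<String> inProgress) {
        if (done.contains(lib)) {
            return false; // already fully explored, no cycle through here
        }
        if (!inProgress.add(lib)) {
            return true; // reached a library that is already on the current path
        }
        for (String dep : dependsOn.getOrDefault(lib, Set.of())) {
            if (visit(dep, dependsOn, done, inProgress)) {
                return true;
            }
        }
        inProgress.remove(lib);
        done.add(lib);
        return false;
    }
}
```

For the situation in the question, a map with libA depending on libB and libB depending on libA makes hasCycle return true.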
While developing lib A, the developer found that the class Foo from lib B was useful. And while developing lib B, the developer found that the class Bar from lib A was useful.
I'm not saying it's a wise thing to do, but your question asks why anybody would do that. This is probably the answer.
Both libraries were written at the same time, possibly by different developers. Or by the same developer at different times, or by a developer who treated both libraries as one big code base and wasn't concerned about avoiding circular dependencies, e.g. they had a hard enough time writing something which worked without worrying about niceties.
Most likely it will be inexperience.
Modern build tools like Maven preclude circular dependencies between artifacts.

Understanding and modifying large projects

I am a novice programmer and as a part of my project I have to modify an open source tool (written in Java) which has hundreds of classes. I have to modify a significant part of it to suit the needs of the project. I have been struggling with it for the last month, trying to read code, trying to find out the functionality of each class and trying to figure out the pipeline from start to end.
80% of the classes have incomplete/missing documentation. The remaining 20% are those that form the general purpose API for the tool.
One month of code reading has just helped me understand the basic architecture. But I have not been able to figure out the exact changes I need to make for my project. One time, I started modifying a part of the code and soon made so many changes that I could no longer remember.
A friend suggested that I try to write down the class hierarchy. Is there a better (standard?) way to do this?
check in the code in some source code repository (Subversion, CVS, Git, Mercurial...)
make sure that you can build the project from the source and run it
if you already have an application that uses this open source tool try removing the binary dependency and introduce project dependency in eclipse or any other IDE. run your code and step through the code that you want to understand
after every small change commit
if you have different ideas branch the code
There's a great book called Working Effectively with Legacy Code, by Michael Feathers. There's a shorter article version here.
One of his points is that the best thing you can do is write unit tests for the existing code. This helps you understand where the entry points are and how the code should work. Then it lets you refactor it without worrying that you're going to break it.
From the article linked, the summary of his strategy:
1. Identify change points
2. Find an inflection point
3. Cover the inflection point
a. Break external dependencies
b. Break internal dependencies
c. Write tests
4. Make changes
5. Refactor the covered code.
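The "write tests" step often means pinning down what the code does today before touching it. Here is a minimal sketch of that idea using plain Java checks; legacyFormat is a hypothetical stand-in for an undocumented method in the tool, and in a real project you would more likely put these checks in a JUnit test.

```java
import java.util.Locale;

class CharacterizationExample {
    // Hypothetical stand-in for an undocumented legacy method.
    static String legacyFormat(String name, int count) {
        return name.trim().toUpperCase(Locale.ROOT) + ":" + (count < 0 ? 0 : count);
    }

    // Records the behaviour observed today, so a later refactoring that changes it fails loudly.
    static void characterize() {
        check(legacyFormat(" widget ", 3).equals("WIDGET:3"));
        check(legacyFormat("x", -5).equals("X:0")); // negative counts are clamped to 0
    }

    static void check(boolean ok) {
        if (!ok) {
            throw new AssertionError("behaviour changed");
        }
    }

    public static void main(String[] args) {
        characterize();
        System.out.println("current behaviour is pinned down");
    }
}
```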
Two things that Eclipse (and other IDEs as well) offer to 'fight' this. I've used them on very large projects:
Call hierarchy - right-click a method and choose "call hierarchy", or use CTRL + ALT + H. This gives you all methods that call the selected method, with option to check further down the tree. This feature is really very useful.
Type hierarchy - see the inheritance hierarchy of classes. In eclipse it's F4 or CTRL + T.
Also:
find a way to make so that changes take effect on-save, and you don't have to redeploy
use a debugger - run in debug mode, within the IDE, so that you see how the flow proceeds
My friend, you are in deep doodoo. Modifying large, badly documented legacy code is one of those projects that makes experienced programmers seriously contemplate the joys of selling insurance, or some other alternative career. However it isn't impossible, and here are some tips that I hope will help.
Your first task is to understand the code as much as possible. You are at least on the right track there. Getting a good idea of the class structure is absolutely important, and a diagram is probably the best way. The other thing I would suggest is that when you find out what a class does, add the missing documentation yourself. That way when you come back to it you won't have forgotten what you found out.
Don't forget the debugger. If you want to find out what is really going on, stepping through the relevant code, or simply finding out what a call stack really looks like at a certain point can be very helpful.
The only way to understand code is to read it. Keep working at it; that is my advice.
Some projects have better documentation than others. Here are a few projects that I know are well organized:
Tomcat
Jetty
Hudson
You should check java-source for more open source projects.
Personally I think it is very difficult to try to understand an entire application all at once. Instead, try to focus only on certain modules. For example, if you can identify a module that you need to change (e.g. based on a screen, or certain input/output point), then start by making one small change and testing it. Go from there, making a small change, testing, and moving on.
Additionally, if your project has unit tests (consider yourself lucky), review the unit tests of the module you are focusing on. That will help you get an idea of what the module is expected to do.
In my opinion there is no standard approach to understand a project. It depends on many factors, from the understandability of the code/architecture you're analyzing to your previous experience on large projects.
I suggest you reverse-engineer the code by using a modeling tool, so that you can generate some UML models from the existing source code. These diagrams can be helpful as a graphic guideline during your analysis of the code.
Don't be afraid to use debugging to grab the logic of the most complex functionalities of the project. Running the most complex code instruction by instruction, seeing the exact values of the variables and the interactions between the objects can be helpful.
Before you refactor to change the project to suit your needs, be sure to write some test cases, so that you can verify that your modifications don't break the code in unexpected ways.
Here are a couple of recommendations:
Get the code into some form of version control (CVS, for example). This way, if you start making changes, you can always look back at previous versions.
Take the time to document what you have already learned/gone through. Javadoc is fine for this.
Create a UML structure for your code. There are lots of plugins out there that will give you a nice representation of your code layout.

Important things to keep in mind before a Code Review in Java

I have just created a mid-sized web application using Java, a custom MVC framework, and JavaScript. My code will be reviewed before it's put on the production servers (internal use).
The primary objective of building this app was to solve a small problem for internal use and understand the custom made MVC framework used by my employer. So, my app has gone through MANY iterations, feature changes and additions.
So, bottom line, the code is very very dirty and this is my first "product level" Java app.
What are your suggestions? What are some basic checks/refactoring I should do before the code review?
I am thinking about:
Java best practices (conventions)
Make the code simple to understand for the developer who will maintain it. (won't be me)
I noticed that I have created some unnecessary objects and used HashMaps/ArrayLists where I could easily have used some other data structure to achieve the same result. So, is that worth changing?
Update
Your Code Sucks and I Hate You: The Social Dynamics of Code Reviews
If you did not already, (assuming you use an IDE like eclipse)
get plugins checkstyle and findbugs
go through their configuration and tune to your style
run them on your code
resolve all issues reported
you can also tune the compiler warning setting of eclipse itself and possibly make them more strict in what is reported.
Look at code structure:
get plugin jdepend
investigate your package structure
Code against interfaces (Map, List, Set) instead of implementation classes (HashMap, ArrayList, TreeSet)
Complete your Javadoc and check that it is up to date after all refactorings.
Add JUnit tests; if you have no time left to test the whole application, at least create a test for every bug you find and solve from now on. This helps "growing" a test set as you go.
Next time design and build your application with the end goal in sight. Always assume that the next guy having to maintain your code will know how to find you :-)
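The advice to code against interfaces can be illustrated with a short sketch. The class and method names are made up; the point is only that the signature mentions List and Map, so the concrete classes can change without touching callers.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InterfaceTypes {
    /** Accepts any List and returns a Map; callers never depend on the concrete classes. */
    static Map<String, Integer> countWords(List<String> words) {
        // TreeMap is an implementation detail; it could become HashMap without breaking callers.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countWords(List.of("a", "b", "a")));
    }
}
```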
Unit tests, and they should be automated as part of your build. You should already have these, but if not, do it now. It will definitely make the refactoring easier, as well as improving your general confidence in the code (and that of the guy who will be maintaining it).
Logging.
One of the more overlooked things is the importance of logging. You need to have a decent logging methodology put in place. Even though this is an internal app, make sure that the basic logs can help regular users find issues and provide more detailed logging so that you (the developer) would know where to go.
Comment your code, explain why it's doing what it's doing and what assumptions have been made.
Try to reduce the amount of mutating state.
Try to remove any singletons you may have.

Can OSGi help reduce complexity?

I saw lots of presentations on OSGi and I think it sounds promising for enforcing better modularization. Apparently "hotdeployment" and "running different versions of x in parallel" are major selling points too.
I wonder whether what OSGi promises to solve is even an issue...? It reminded me of the early days of OO when similar claims were made:
When OO was new, the big argument was reusability. It was widely claimed that when using OO, one would only have to "write once" and could then "use everywhere".
In practice I only saw this working for some pretty low-level examples. I think the reason for this is that writing reusable code is hard. Not technically, but from an interface design point of view. You have to anticipate how future clients will want to use your classes and make the right choices up front. This is difficult by definition, and thus the potential reusability benefit often failed to materialize.
With OSGi, I have the suspicion that here again we could fall for promises, potential solutions for problems that we don't really have. Or if we have them, we don't have them in a big enough quantity and severity that would justify to buy into OSGi for help. "Hotdeployment" for example of a subset of modules is definitely a great idea, but how often does it really work? How often not because it turned out you got the modularization wrong for the particular issue? How about model entities that are shared between multiple modules? Do these modules all have to be changed at the same time? Or do you flatten your objects to primitives and use only those in inter-module communication, in order to be able to keep interface contracts?
The hardest problem when applying OSGi is, I would presume, to get the modularization "right". Similar to getting the interfaces of your classes right in OO, with OSGi, the problem stays the same, on a bigger scale this time, the package or even service level.
As you might have guessed, I'm currently trying to evaluate OSGi for use in a project. The major problem we have, is increasing complexity as the codebase grows and I would like to break the system up in smaller modules that have less and more defined interactions.
Given that no framework can ever help with deciding what to modularize, has OSGi ever paid off for you?
Has it made your life easier when working in teams?
Has it helped to reduce bug count?
Do you ever successfully "hotdeploy" major components?
Does OSGi help to reduce complexity over time?
Did OSGi keep its promises?
Did it fulfill your expectations?
Thanks!
OSGi pays off because it enforces modularization at runtime, something you previously did not have, often causing the design on paper and implementation to drift apart. This can be a big win during development.
It definitely makes it easier to work in teams, if you let teams focus on a single module (possibly a set of bundles), and if you get your modularization right. One could argue that you can do the same thing with a build tool like Ant+Ivy or Maven and dependencies, but the granularity of dependencies that OSGi uses is in my opinion far superior, not causing the typical "dragging in everything plus the kitchen sink" that JAR-level dependencies cause.
Modular code with fewer dependencies tends to lead to cleaner and less code, in turn leading to fewer bugs that are easier to test for and solve. It also promotes designing components as simple and straightforward as possible, whilst at the same time having the option to plug in more complicated implementations, or adding aspects such as caching as separate components.
Hot deployment, even if you do not use it at runtime, is a very good test to validate if you modularized your application correctly. If you cannot start your bundles in a random order at all, you should investigate why. Also, it can make your development cycle a lot quicker if you can update an arbitrary bundle.
As long as you can manage your modules and dependencies, big projects stay manageable and can be easily evolved (saving you from the arguably bad "complete rewrite").
The downside of OSGi? It's a very low-level framework, and whilst it solves the problems it is intended for quite well, there are things that you still need to resolve yourself. Especially if you come from a Java EE environment, where you get free thread-safety and some other concepts that can be quite useful if you need them, you need to come up with solutions for these in OSGi yourself.
A common pitfall is to not use abstractions on top of OSGi to make this easier for the average developer. Never ever let them mess with ServiceListeners or ServiceTrackers manually. Carefully consider what bundles are and are not allowed to do: Are you willing to give developers access to the BundleContext or do you hide all of this from them by using some form of declarative model.
I've worked with OSGi for some years now (although in the context of an eclipse project, not in a web project). It is clear that the framework does not free you from thinking how to modularize. But it enables you to define the rules.
If you use packages and define (In a design document? Verbally?) that certain packages may not access classes in other packages, then without enforcement this constraint will be broken. If you hire new developers, they don't know the rules. They WILL break the rules. With OSGi you can define the rules in code. For us this was a big win, as it has helped us to maintain the architecture of our system.
OSGi does not reduce complexity. But it definitely helps to handle it.
I have been using OSGI for over 8 years now, and every time I dive into a non-OSGI project I get the feeling of speeding without a seatbelt on.
OSGI makes project setup and deployment harder, and forces you to think about modularization upfront, but gives you the peace of mind that the rules are enforced at runtime.
Take Apache Camel with Maven as an example. When you create a new Maven project and add Apache Camel as a dependency, the application seems to have all its dependencies, and you will only notice the ClassNotFoundExceptions at runtime, which is bad. When you run in an OSGI container and load the Apache Camel modules, the modules with unmet dependencies are not started, and you know upfront what the problem is.
I also use the hot-deployment all the time, and update parts of my application on the fly without the need for a restart.
I used OSGI in one project (I admit - not very much). It makes good promises, but as @Arne said, you still need to think on your own about how you modularize.
OSGI did help our project because it made the architecture more stable. Breaking the modularization is more "difficult", so decisions that we made regarding how to modularize stayed valid for a longer time.
To put it differently - without OSGI, and under time pressure to deliver, sometimes you or your team members make compromises, shortcuts and other hacks, and the original intent of the architecture is lost.
So OSGI didn't reduce the complexity per se, but it protected it from growing unnecessarily over time. I guess that is a good thing :)
I haven't used the hot deploy feature, so I can't comment about that.
To answer your last point, it did meet my expectations, but it involved a learning curve and some adaptation, and the payoff only comes in the long term.
(as a side note, your question reminds me a bit of the adage that "maven is the awt of build systems")
OSGi does NOT pay off. The fact is OSGi is not easy to use and at the end of the day or year depending on how long it takes you to get things working, it does not add value:
Your application will not be more modular overall; on the contrary, it ends up being more exposed and not isolated from other applications, since it is a share-everything instead of a share-nothing architecture.
Versioning is pushed further down the stack, you wrestle with maven transitive dependencies only to do that again at runtime in OSGI.
Most libraries are designed to work as libraries in the application classloader not as bundles with their own classloader.
Maybe appropriate for plugin architectures where third party developers need to be sandboxed or maybe it is just EJB2.0 all over again.
I added the following slides and I will follow up with example code to demonstrate how to work successfully with OSGi if it is forced on you.
http://www.slideshare.net/ielian/tdd-on-osgi
No, OSGI will make you grey early.
