How to understand Open Source projects/libraries?

There are a few open source projects/APIs/libraries that we use in our project (Spring, Struts, iBatis etc.) and I want to understand their design and how they work internally.
What is the best way to understand these projects? Note that I am already using these libraries in my project. And I know the input-output interaction/configurations for these libraries. What I don't understand is how these APIs/libraries work internally.
The problems I face are:
Finding the entry class of the library. Is there any way I can know the entry class for the library - something that kicks off the whole API?
Tools/Plugins to use in Eclipse to get an overview of the design of the library. Going through each and every class of the library can be a very daunting task. Is there any tool you would recommend that can generate class diagrams of the API in Eclipse?
Thanks in advance!!
UPDATE: I need some input on Eclipse plugins that can help me get an overview/class diagram of the library.

I always use the same strategy for this: I never try to "understand" the code base as a whole, and I usually try to follow the request flow. I read enough of the documentation to determine what is necessary to use the application, and I read that code (Keep all source code loaded in your IDE).
For example, in Struts you'll be installing a servlet filter in web.xml. Start reading the filter and follow the path a single request takes through your stack.
Likewise for Spring, there are two main entry points, the filter and "getBean", both of which are mentioned real early in the documentation. Read those two.
For both of these cases you'll find one or two classes that represent the "core" of the framework real quickly. Read those really well and let actual use cases & needs drive your further exploration.
Approaching "understanding" of an open source library (or any other code base, for that matter) by trying to find all the pieces is usually not a good approach; it tends to lead nowhere because most of these projects contain too much code. When following the request flow, I find that making diagrams can also be quite distracting: it draws focus away from understanding (and since my understanding increases rapidly, most of them are out of date even before they reach the printer).

Nice question! What I've done, especially in the case of Spring, apart from consulting the documentation and the API docs, is to attach the sources of the project to my project in Eclipse; that way I'm able to navigate through the source code, not just the API. It's been quite helpful, especially in the case of the Spring Security project: there were some concepts that I just couldn't understand until I inspected the source code.
That's one of the advantages of using Open Source libraries.
Regards.

Tools like Structure101 (http://www.headwaysoftware.com/products/structure101/index.php), and Lattix (http://www.lattix.com/) let you analyze code and produce architecture diagrams / dependency matrices.
This is not exactly a class diagram - the main focus is on layering. So the entry point is usually the topmost layer.
But then again, as I specified above, you will notice that some libs are just a mess, and these tools will not be helpful enough.
See the S101 online demo: http://www.structure101.com/java/
This for example is the Sonar project architecture: http://www.structure101.com/java/tracker/sonar/1.11.1/arch.html

Your best bet for those three would be to consult the official documentation (make sure you are looking at the version you are using) or to get a book on the technology.

Most APIs don't have a class with a main method; they're running in the webserver called by the server itself. Unless they're running as their own server, they won't have a main method.
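If you want to check whether a particular JAR declares an entry class at all, you can read the Main-Class attribute from its manifest. A minimal sketch, assuming you have the JAR on disk (the class name MainClassFinder is made up for illustration); for most libraries it will report that there is no such entry, which fits the point above:

```java
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class MainClassFinder {

    /** Returns the Main-Class declared in the JAR's manifest, or null if there is none. */
    static String findMainClass(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest(); // null when the JAR has no manifest
            if (manifest == null) {
                return null;
            }
            return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java MainClassFinder <path-to-jar>");
            return;
        }
        String mainClass = findMainClass(args[0]);
        System.out.println(mainClass == null
                ? "No Main-Class entry: probably a library called by a container or by your own code"
                : "Main-Class: " + mainClass);
    }
}
```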

Related

Using Apache Ant APIs in Java program to programmatically build source files

I am looking for good and practical resources that will help me use the Ant APIs effectively. The project website just gives the documentation of the API which is not useful at all. Very few websites seem to give very brief tutorials on the subject.
Is there some resource I am missing out on? How can I use the Ant APIs for simple tasks, without spending hours browsing through them and looking at source code?
Thanks.
(Answers to previously asked questions not helpful - How can i use Apache ANT Programmatically )

As it turns out, the lack of good resources on using the Ant API is known and intended.
The bottom paragraph of this article from the Ant project says:
The question you are probably asking yourself at this point is: How would I know which classes and methods have to be called in order to set up a dummy Project and Target? The answer is: you don't. Ultimately, you have to be willing to get your feet wet and read the source code. The above example is merely designed to whet your appetite and get you started. Go for it!
So this seems to be the only way to make best use of the API.

..Java program to programmatically build source files
If compiling/Jarring is all you need and you can run it in an SDK (as opposed to a plain JRE), look to the JavaCompiler class for compilation. Then use the Jar related classes to build the Jars.
All J2SE. Ant not included, Ant not required.
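A rough sketch of the compilation half, using javax.tools.JavaCompiler from the standard library. The class name CompileExample and the throwaway Hello.java source are only there for the demo; by default the .class file ends up next to the source file:

```java
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CompileExample {

    /** Compiles one source file with the JDK's own compiler; returns true on success. */
    static boolean compile(Path sourceFile) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            // Happens on a plain JRE, which ships without a compiler
            throw new IllegalStateException("No system compiler: run on a JDK, not a plain JRE");
        }
        // null streams mean System.in/out/err; run() returns 0 on success
        int result = compiler.run(null, null, null, sourceFile.toString());
        return result == 0;
    }

    public static void main(String[] args) throws IOException {
        // Write a tiny source file into a temp directory and compile it
        Path dir = Files.createTempDirectory("compile-demo");
        Path source = dir.resolve("Hello.java");
        Files.write(source, "public class Hello { }".getBytes("UTF-8"));
        System.out.println("Compiled: " + compile(source));
    }
}
```

For the packaging step, java.util.jar.JarOutputStream is the class to look at.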

There is no better manual for understanding Ant than: http://ant.apache.org/manual/index.html
I am not sure if you've gone through this link, which explains in detail how to create a task. A word of caution: if you're new to Ant, there is no easy way to jump into this tutorial. It is better to learn the basics first; the manual linked above is a good starting point.

Understanding and modifying large projects

I am a novice programmer and as a part of my project I have to modify an open source tool (written in Java) which has hundreds of classes. I have to modify a significant part of it to suit the needs of the project. I have been struggling with it for the last month, trying to read code, trying to find out the functionality of each class and trying to figure out the pipeline from start to end.
80% of the classes have incomplete/missing documentation. The remaining 20% are those that form the general purpose API for the tool.
One month of code reading has just helped me understand the basic architecture. But I have not been able to figure out the exact changes I need to make for my project. One time, I started modifying a part of the code and soon made so many changes that I could no longer remember.
A friend suggested that I try to write down the class hierarchy. Is there a better (standard?) way to do this?

check in the code in some source code repository (Subversion, CVS, Git, Mercurial...)
make sure that you can build the project from the source and run it
if you already have an application that uses this open source tool, try replacing the binary dependency with a project dependency in Eclipse or any other IDE. run your code and step through the code that you want to understand
after every small change commit
if you have different ideas branch the code

There's a great book called Working Effectively with Legacy Code, by Michael Feathers. There's a shorter article version here.
One of his points is that the best thing you can do is write unit tests for the existing code. This helps you understand where the entry points are and how the code should work. Then it lets you refactor it without worrying that you're going to break it.
From the article linked, the summary of his strategy:
1. Identify change points
2. Find an inflection point
3. Cover the inflection point
a. Break external dependencies
b. Break internal dependencies
c. Write tests
4. Make changes
5. Refactor the covered code.

Two things that Eclipse (and other IDEs as well) offer to 'fight' this. I've used them on very large projects:
Call hierarchy - right-click a method and choose "call hierarchy", or use CTRL + ALT + H. This gives you all methods that call the selected method, with option to check further down the tree. This feature is really very useful.
Type hierarchy - see the inheritance hierarchy of classes. In Eclipse it's F4 or CTRL + T.
Also:
find a way to make changes take effect on save, so that you don't have to redeploy
use a debugger - run in debug mode, within the IDE, so that you see how the flow proceeds

My friend, you are in deep doodoo. Modifying large, badly documented legacy code is one of those projects that makes experienced programmers seriously contemplate the joys of selling insurance, or some other alternative career. However it isn't impossible, and here are some tips that I hope will help.
Your first task is to understand the code as much as possible. You are at least on the right track there. Getting a good idea of the class structure is absolutely important, and a diagram is probably the best way. The other thing I would suggest is that when you find out what a class does, add the missing documentation yourself. That way when you come back to it you won't have forgotten what you found out.
Don't forget the debugger. If you want to find out what is really going on, stepping through the relevant code, or simply finding out what a call stack really looks like at a certain point can be very helpful.
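When attaching a debugger is awkward, a temporary stack dump at the point you care about gives a similar picture of who called what. A small sketch using only the standard library; StackDump and handleRequest are made-up names standing in for whatever legacy method you are investigating:

```java
public class StackDump {

    /** Returns the call stack of the caller, innermost frame first, one frame per line. */
    static String currentStack() {
        StringBuilder sb = new StringBuilder();
        StackTraceElement[] frames = new Throwable().getStackTrace();
        // Frame 0 is this method itself, so start at 1
        for (int i = 1; i < frames.length; i++) {
            sb.append(frames[i].getClassName())
              .append('.')
              .append(frames[i].getMethodName())
              .append('\n');
        }
        return sb.toString();
    }

    /** Hypothetical stand-in for a method deep inside the code you are trying to understand. */
    static String handleRequest() {
        return currentStack();
    }

    public static void main(String[] args) {
        System.out.print(handleRequest());
    }
}
```

Reading the frames from the bottom up shows the path from the entry point down to the method in question.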

The only way to understand code is to read it. Keep working at it; that is my advice.
Some projects have better documentation than others. Here are a few projects that I know are well organized:
Tomcat
Jetty
Hudson
You should check java-source for more open source projects.

Personally I think it is very difficult to try to understand an entire application all at once. Instead, try to focus only on certain modules. For example, if you can identify a module that you need to change (e.g. based on a screen, or certain input/output point), then start by making one small change and testing it. Go from there, making a small change, testing, and moving on.
Additionally, if your project has unit tests (consider yourself lucky), review the unit tests of the module you are focusing on. That will help you get an idea of what the module is expected to do.

In my opinion there is no standard approach to understand a project. It depends on many factors, from the understandability of the code/architecture you're analyzing to your previous experience on large projects.
I suggest reverse-engineering the code with a modeling tool, so that you can generate some UML models from the existing source code. These diagrams can be helpful as a graphic guideline during your analysis of the code.
Don't be afraid to use debugging to grasp the logic of the most complex functionality in the project. Running the most complex code instruction by instruction, seeing the exact values of the variables and the interactions between the objects, can be helpful.
Before you refactor to change the project to suit your needs, be sure to write some test cases, so that you can verify that your modifications don't break the code in unexpected ways.

Here are a couple of recommendations:
Get the code into some form of version control (CVS, for example). This way if you start making changes you can always look back at previous versions.
Take the time to document what you have already learned/gone through. Javadoc is fine for this.
Create a UML structure for your code. There are lots of plugins out there that will give you a nice representation of your code layout.

Important things to keep in mind before a Code Review in Java

I have just created a mid-sized web application using Java, a custom MVC framework, and JavaScript. My code will be reviewed before it's put on the production servers (internal use).
The primary objective of building this app was to solve a small problem for internal use and understand the custom made MVC framework used by my employer. So, my app has gone through MANY iterations, feature changes and additions.
So, bottom line, the code is very very dirty and this is my first "product level" Java app.
What are your suggestions? What are some basic checks/refactoring I should do before the code review?
I am thinking about:
Java best practices (conventions)
Make the code simple to understand for the developer who will maintain it. (won't be me)
I noticed I have created some unnecessary objects and used HashMaps/ArrayLists where I could easily have used some other data structure to achieve the same result. So, is that worth changing?
Update
Your Code Sucks and I Hate You: The Social Dynamics of Code Reviews

If you have not already (assuming you use an IDE like Eclipse):
get the Checkstyle and FindBugs plugins
go through their configuration and tune it to your style
run them on your code
resolve all issues reported
you can also tune the compiler warning settings of Eclipse itself and possibly make them stricter in what is reported.
Look at code structure:
get the JDepend plugin
investigate your package structure
Code against interfaces (Map, List, Set) instead of implementation classes (HashMap, ArrayList, TreeSet)
Complete your Javadoc and check that it is still up to date after all the refactorings.
Add JUnit tests; if you have no time left to test the whole application, at least create a test for every bug you find and solve from now on. This helps "growing" a test set as you go.
Next time design and build your application with the end goal in sight. Always assume that the next guy having to maintain your code will know how to find you :-)
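To illustrate the "code against interfaces" item from the list above, here is a small sketch (class and method names are invented for the example). Only the constructor calls mention HashMap and ArrayList; everything else is typed as Map and List, so swapping in, say, a TreeMap later touches a single line:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InterfaceTypes {

    /** Groups words by their first letter; parameters and return type use interfaces only. */
    static Map<String, List<String>> groupByFirstLetter(List<String> words) {
        Map<String, List<String>> groups = new HashMap<>(); // implementation named once, here
        for (String word : words) {
            if (word.isEmpty()) {
                continue; // nothing to group an empty string under
            }
            String key = word.substring(0, 1);
            List<String> bucket = groups.get(key);
            if (bucket == null) {
                bucket = new ArrayList<>();
                groups.put(key, bucket);
            }
            bucket.add(word);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>();
        words.add("ant");
        words.add("apple");
        words.add("bean");
        System.out.println(groupByFirstLetter(words));
    }
}
```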

Unit tests, and they should be automated as part of your build. You should already have these, but if not, do it now. It will definitely make the refactoring easier, as well as improving your general confidence in the code (and the guy who will be maintaining it).

Logging.
One of the more overlooked things is the importance of logging. You need to have a decent logging methodology put in place. Even though this is an internal app, make sure that the basic logs can help regular users find issues and provide more detailed logging so that you (the developer) would know where to go.
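One way to get that split between user-level and developer-level logging, sketched with java.util.logging from the standard library (your project may well use another logging framework; QuantityParser and parseQuantity are hypothetical names). With the default JDK logging configuration the WARNING message is printed and the FINE message is suppressed until someone raises the log level:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuantityParser {
    private static final Logger LOG = Logger.getLogger(QuantityParser.class.getName());

    /** Parses a quantity field; falls back to 0 and logs a warning when the input is invalid. */
    static int parseQuantity(String raw) {
        // Developer detail: only visible when the level is lowered to FINE
        LOG.fine("parseQuantity raw input: '" + raw + "'");
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Basic log: states what went wrong and with which input
            LOG.log(Level.WARNING, "Quantity is not a whole number: '" + raw + "'; using 0", e);
            return 0;
        }
    }

    public static void main(String[] args) {
        System.out.println(parseQuantity(" 3 "));
        System.out.println(parseQuantity("three"));
    }
}
```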

Comment your code, explain why it's doing what it's doing and what assumptions have been made.
Try to reduce the amount of mutable state.
Try to remove any singletons you may have.

How does one weed out dependencies in a large project?

I'm about to inherit a rather large Java enterprise project that has a large number of third party dependencies. There are at least seventy JARs included, and some of them seem to be unused, e.g. spring.jar, which I know isn't used.
It seems that over the years as various developers have touched upon the code base they have all tried out new project-of-the-month type libraries.
How does one go about getting rid of these? Within reason of course, as clearly some dependencies are helpful to not have to re-invent the wheel.
I'm obviously interested in Java-based projects, but I welcome answers across languages that people think will be helpful.

Personally, I think you have to start by assessing the scale of the problem. It's going to be fairly painful, but I'd make a list of the dependencies and work out exactly which parts of the project use which ones.
Then I'd work out exactly what features of each you're actually making use of (in many cases, you'll end up having a massive third party library which you're using a tiny part of).
Once you have this information, you'll at least know what you're dealing with.
My next step would be to look at all of the dependencies that you only use to a small extent. Checking around might uncover things that you could use from other libraries that would eliminate the lesser used libraries.
I'd also have a look around to see if there's anything small that you could just re-write and include in your own code-base.
Finally, I'd have a look around at the vendors of your dependencies and their competitors to see if the latest versions contain more functionality that will allow you to eliminate a few others.
Then you're just left wondering whether it's better to be highly dependent on a few vendors, or less dependent on a lot of vendors!! ;o)

structure101 http://www.headwaysoftware.com/products/structure101/index.php
It's a great tool for showing dependencies. I've been using it for a couple of years.

If you have a good set of automated tests, and you're looking to remove libraries which are not used at all, you could just use trial and error. One at a time, remove a library, and run your tests to see if everything still works. If not, put it back. Of course, if you can't even build without a library, you probably need it.
Basically, however you go about it, my idea is to remove them one at a time and see what breaks. If nothing breaks, odds are good you can just toss the library. If the problem is very minor (e.g. you need one method of one class in a large library), you might be able to code around it.
If you're dealing with a standalone application, you could give the JVM the -verbose:class option to see which classes are being loaded. This should give you messages like:
[Opened C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]
[Loaded java.util.regex.Pattern$Single from C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]
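If you capture that output to a file, a few lines of code can boil it down to the set of JARs that actually had classes loaded from them. This is only a sketch and assumes the "[Loaded <class> from <source>]" format shown above; newer JVMs print class-loading information in a different format, so the pattern would need adjusting there. LoadedJars is a made-up name:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoadedJars {
    // Matches lines such as: [Loaded some.Class from C:\path\lib.jar]
    private static final Pattern LOADED = Pattern.compile("\\[Loaded (\\S+) from (.+)\\]");

    /** Collects the distinct sources (JAR paths) that classes were loaded from. */
    static Set<String> sources(Iterable<String> logLines) {
        Set<String> result = new TreeSet<>();
        for (String line : logLines) {
            Matcher m = LOADED.matcher(line.trim());
            if (m.matches()) {
                result.add(m.group(2)); // group 1 is the class name, group 2 the source
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The two sample lines from above; "[Opened ...]" lines are ignored
        Set<String> jars = sources(Arrays.asList(
                "[Opened C:\\Program Files\\Java\\jre1.6.0_04\\lib\\rt.jar]",
                "[Loaded java.util.regex.Pattern$Single from C:\\Program Files\\Java\\jre1.6.0_04\\lib\\rt.jar]"));
        System.out.println(jars);
    }
}
```

Any JAR on your classpath that never shows up after exercising the application is a candidate for removal, with the usual caveat that code paths you did not exercise will not appear either.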

I read about an approach using instrumentation here, never tried it, but sounds reasonable.

We went through an exercise like this on a Delphi codebase. We dramatically simplified our external dependencies. Basically, we went about it like this:
Catalogued all external libraries and components
Catalogued (using a file search tool) where they were used, and what for.
Removed everything we didn't use or didn't need (some libraries were used in code that was no longer needed).
Made a ranking of which libraries we favored, basing this on whether the library was actively developed, how much functionality it offered that we used, how difficult it was to port the code that used it to another library that we already used and so on.
Finally, we iteratively removed dependencies on libraries low on the list by porting that functionality to another library.
This was, however, quite a lot of work.

If you take the approach of "remove things until it won't compile" you need to be very careful about transitive runtime dependencies. If there's a good quality test suite, it can help, but you'll certainly need to run a test coverage tool like Cobertura to make sure that enough of the code is getting tested to exercise your full dependency graph.
How much code are you talking about? The review-based approach suggested by Joeri frankly seems the best to me; it has the added advantage of making you at least superficially familiar with all parts of the system. If you're just inheriting a big project, this is something you should probably take the time to do anyway.

If you have a full regression test suite for this project, all you have to do is run the suite in a loop, removing one JAR each time. It is NOT fast, BUT it is easy to do.

How should I start when developing a system based on modules or plugins?

I intend to develop a system that is entirely based on modules. The system base should have support for finding out about plugins, starting them up and being able to provide ways for those modules to communicate. Ideally, one should be able to put in new modules and yank out unused modules at will, and modules should be able to use each other's functionality if it is available.
This system should be used as a basis for simulation systems where a lot of stuff happens in different modules, and other modules might want to do something based on that.
The system I intend to develop is going to be in Java. The way I see it, I intend to have a folder with a subfolder for each module that includes an XML file that describes the module with information such as name, maybe which events it might raise, stuff like that. I suppose I might need to write a custom ClassLoader to work this stuff out.
The thing is, I don't know if my idea actually holds any water and, of course, I intend on building a working prototype. However, I never worked on a truly modular system before, and I'm not really sure what is the best way to take on this problem.
Where should I start? Are there common problems and pitfalls that are found while developing this kind of system? How do I make the modules talk with each other while maintaining isolation (i.e, you remove a module and another module that was using it stays sane)? Are there any guides, specifications or articles I can read that could give me some ideas on where to start? It would be better if they were based on Java, but this is not a requirement, as what I'm looking for right now are ideas, not code.
Any feedback is appreciated.

You should definitely look at OSGi. It aims at being the component/plugin mechanism for Java. It allows you to modularize your code (in so-called bundles) and update bundles at runtime. You can also completely hide implementation packages from unwanted access by other bundles, eg. only provide the API.
Eclipse was the first major open-source project to implement and use OSGi, but they didn't fully leverage it (no plugin installations/updates without restarts). If you start from scratch though, it will give you a very good framework for a plugin system.
Apache Felix is a complete open-source implementation (and there are others, such as Eclipse Equinox).

Without getting into great detail: you should look at Spring, and familiarizing yourself with OSGi or the Eclipse RCP framework will also give you some fundamental concepts you will need to keep in mind.

Another option is the ServiceLoader added in Java 1.6.
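A minimal sketch of what that looks like. The SimulationModule interface is invented for the example; each module JAR would ship an implementation plus a text file under META-INF/services/ named after the interface's binary name and listing the implementing class:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

public class PluginHost {

    /** Hypothetical plugin contract that every module implements. */
    public interface SimulationModule {
        String name();
    }

    /** Returns the names of all modules currently discoverable on the classpath. */
    static List<String> discoverModuleNames() {
        List<String> names = new ArrayList<>();
        // ServiceLoader instantiates each provider registered under META-INF/services
        for (SimulationModule module : ServiceLoader.load(SimulationModule.class)) {
            names.add(module.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Prints an empty list until a JAR with a registered provider is on the classpath
        System.out.println("Modules found: " + discoverModuleNames());
    }
}
```

Dropping a module JAR onto the classpath (or removing it) changes what the loop finds, without the host code naming any implementation class.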

There are many ways to do it, but a simple one is to use reflection. In your XML file you write the name of the file (which in reality would be a class). You can then check what type it is and instantiate it with reflection. The classes could share a common interface that lets you check whether the external file/class really is one of your modules. Here is some information about reflection.
You can also use a ready-made framework like this one on SourceForge, which will give you a good first step toward creating modules/plugins.
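The reflection approach described above could look roughly like this. Plugin and ClockPlugin are invented names; in a real system the class name would come from the module's XML descriptor and the class itself from the module's JAR (possibly through its own ClassLoader):

```java
public class ReflectiveLoader {

    /** Hypothetical common interface that every module must implement. */
    public interface Plugin {
        String describe();
    }

    /** Example module; in practice this would live in its own JAR. */
    public static class ClockPlugin implements Plugin {
        public String describe() {
            return "clock";
        }
    }

    /** Instantiates className, but only if it implements Plugin. */
    static Plugin load(String className) throws ReflectiveOperationException {
        Class<?> candidate = Class.forName(className);
        if (!Plugin.class.isAssignableFrom(candidate)) {
            throw new IllegalArgumentException(className + " does not implement Plugin");
        }
        // Requires an accessible no-argument constructor
        return candidate.asSubclass(Plugin.class).getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Plugin plugin = load(ClockPlugin.class.getName());
        System.out.println("Loaded module: " + plugin.describe());
    }
}
```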
