Understanding and modifying large projects - java

I am a novice programmer and as a part of my project I have to modify an open source tool (written in Java) which has hundreds of classes. I have to modify a significant part of it to suit the needs of the project. I have been struggling with it for the last month, trying to read the code, trying to find out the functionality of each class and trying to figure out the pipeline from start to end.
80% of the classes have incomplete/missing documentation. The remaining 20% are those that form the general purpose API for the tool.
One month of code reading has just helped me understand the basic architecture. But I have not been able to figure out the exact changes I need to make for my project. One time, I started modifying a part of the code and soon made so many changes that I could no longer keep track of them.
A friend suggested that I try to write down the class hierarchy. Is there a better (standard?) way to do this?

check in the code in some source code repository (Subversion, CVS, Git, Mercurial...)
make sure that you can build the project from the source and run it
if you already have an application that uses this open source tool, try replacing the binary dependency with a project dependency in Eclipse or any other IDE, then run your code and step through the code that you want to understand
after every small change commit
if you have different ideas branch the code

There's a great book called Working Effectively with Legacy Code, by Michael Feathers. There's a shorter article version here.
One of his points is that the best thing you can do is write unit tests for the existing code. This helps you understand where the entry points are and how the code should work. Then it lets you refactor it without worrying that you're going to break it.
From the article linked, the summary of his strategy:
1. Identify change points
2. Find an inflection point
3. Cover the inflection point
a. Break external dependencies
b. Break internal dependencies
c. Write tests
4. Make changes
5. Refactor the covered code.
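The "write tests" step is often done with characterization tests, which pin down what the code does today rather than what it ought to do. Here is a minimal sketch in plain Java. `LegacyPriceCalculator` is a made-up stand-in for an undocumented class, not part of any real tool, and in a real project you would probably use JUnit instead of the hand-rolled `check` helper.

```java
// Hypothetical stand-in for an undocumented legacy class.
class LegacyPriceCalculator {
    // Returns the total in cents; the discount rule is what we want to pin down.
    static int total(int unitPriceCents, int quantity) {
        int total = unitPriceCents * quantity;
        if (quantity >= 10) {
            total -= total / 10; // 10% bulk discount, discovered by reading the code
        }
        return total;
    }
}

class CharacterizationTest {
    // Fails loudly if the observed behaviour ever changes.
    static void check(String label, int expected, int actual) {
        if (expected != actual) {
            throw new AssertionError(label + ": expected " + expected + " but got " + actual);
        }
    }

    static void run() {
        // The expected values were obtained by running the existing code, not from a spec.
        check("below bulk threshold", 900, LegacyPriceCalculator.total(100, 9));
        check("at bulk threshold", 900, LegacyPriceCalculator.total(100, 10));
        check("zero quantity", 0, LegacyPriceCalculator.total(100, 0));
    }

    public static void main(String[] args) {
        run();
        System.out.println("current behaviour is pinned down");
    }
}
```

Once checks like these pass, you can change the covered code and rerun them after every small edit to see whether you altered existing behaviour by accident.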

Two things that Eclipse (and other IDEs as well) offer to 'fight' this. I've used them on very large projects:
Call hierarchy - right-click a method and choose "call hierarchy", or use CTRL + ALT + H. This gives you all methods that call the selected method, with option to check further down the tree. This feature is really very useful.
Type hierarchy - see the inheritance hierarchy of classes. In Eclipse it's F4 or CTRL + T.
Also:
find a way to make changes take effect on save, so that you don't have to redeploy
use a debugger - run in debug mode, within the IDE, so that you see how the flow proceeds
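If you want the type hierarchy outside the IDE (for example to paste it into your notes), reflection can give you a rough version of it. This is only a sketch: `TypeHierarchy` and `superclassChain` are names I made up, and it follows superclasses only, not interfaces.

```java
import java.util.ArrayList;
import java.util.List;

class TypeHierarchy {
    // Walks from the given class up to java.lang.Object.
    static List<String> superclassChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            chain.add(c.getName());
        }
        return chain;
    }

    public static void main(String[] args) {
        // Prints ArrayList, AbstractList, AbstractCollection and Object, one per line.
        for (String name : superclassChain(ArrayList.class)) {
            System.out.println(name);
        }
    }
}
```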

My friend, you are in deep doodoo. Modifying large, badly documented legacy code is one of those projects that makes experienced programmers seriously contemplate the joys of selling insurance, or some other alternative career. However it isn't impossible, and here are some tips that I hope will help.
Your first task is to understand the code as much as possible. You are at least on the right track there. Getting a good idea of the class structure is absolutely important, and a diagram is probably the best way. The other thing I would suggest is that when you find out what a class does, add the missing documentation yourself. That way when you come back to it you won't have forgotten what you found out.
Don't forget the debugger. If you want to find out what is really going on, stepping through the relevant code, or simply finding out what a call stack really looks like at a certain point can be very helpful.
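When attaching a debugger is awkward, you can also capture the call stack from inside the code. A minimal sketch, assuming a hypothetical helper class `CallStackProbe` that you would drop into the code temporarily:

```java
class CallStackProbe {
    // Returns "Class.method" for each frame currently on the stack, innermost first.
    static String[] currentMethods() {
        StackTraceElement[] frames = new Throwable().getStackTrace();
        String[] names = new String[frames.length];
        for (int i = 0; i < frames.length; i++) {
            names[i] = frames[i].getClassName() + "." + frames[i].getMethodName();
        }
        return names;
    }

    // Two nested calls so that the example has something to show.
    static String[] inner() {
        return currentMethods();
    }

    static String[] outer() {
        return inner();
    }

    public static void main(String[] args) {
        for (String name : outer()) {
            System.out.println(name);
        }
    }
}
```

Calling `currentMethods()` at the point you are curious about tells you which path led there, which is the same information the debugger's stack view gives you.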

The only way to understand code is to read it. Keep working at it; that is my advice.
There are projects with better documentation than others. Here are a few projects that I know are well organized:
Tomcat
Jetty
Hudson
You should check java-source for more open source projects.

Personally I think it is very difficult to try to understand an entire application all at once. Instead, try to focus only on certain modules. For example, if you can identify a module that you need to change (e.g. based on a screen, or certain input/output point), then start by making one small change and testing it. Go from there, making a small change, testing, and moving on.
Additionally, if your project has unit tests (consider yourself lucky), review the unit tests of the module you are focusing on. That will help you get an idea of what the module is expected to do.

In my opinion there is no standard approach to understanding a project. It depends on many factors, from the understandability of the code/architecture you're analyzing to your previous experience on large projects.
I suggest reverse-engineering the code with a modeling tool, so that you can generate some UML models from the existing source code. These diagrams can be helpful as a graphic guideline during your analysis of the code.
Don't be afraid to use debugging to grasp the logic of the most complex functionality of the project. Running the most complex code instruction by instruction, seeing the exact values of the variables and the interactions between the objects can be helpful.
Before you refactor to change the project to suit your needs, be sure to write some test cases, so that you can verify that your modifications don't break the code in unexpected ways.

Here are a couple of recommendations:
Get the code into some form of version control (CVS, for example). This way, if you start making changes, you can always look back at previous versions.
Take the time to document what you have already learned/gone through. Javadoc is fine for this.
Create a UML structure for your code. There are lots of plugins out there that will give you a nice representation of your code layout.
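Recording what you have learned as Javadoc could look roughly like this. `ConfigLine` is a made-up class for illustration. The point is that findings from reading the code go straight into the comment, marked as your own observations.

```java
/**
 * Hypothetical example: one "key = value" line of a configuration file.
 *
 * <p>Findings recorded while reading the code (not from official documentation):
 * <ul>
 *   <li>Whitespace around the key and the value is trimmed.</li>
 *   <li>Only the first '=' separates key from value.</li>
 * </ul>
 */
class ConfigLine {
    final String key;
    final String value;

    private ConfigLine(String key, String value) {
        this.key = key;
        this.value = value;
    }

    /**
     * Parses a single line.
     *
     * @param line a line such as {@code "timeout = 30"}
     * @return the parsed key/value pair
     * @throws IllegalArgumentException if the line contains no '='
     */
    static ConfigLine parse(String line) {
        int eq = line.indexOf('=');
        if (eq < 0) {
            throw new IllegalArgumentException("missing '=': " + line);
        }
        return new ConfigLine(line.substring(0, eq).trim(), line.substring(eq + 1).trim());
    }
}
```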

Related

Separation of code to different classes [Java]

I have a bloated JDialog class (~2000 lines) that displays two unrelated JTables. I want to split it into three classes (JDialog, JTable1 and JTable2). I can study which variables and which methods are used by each table and move them to relevant classes, but this manual refactoring is going to be tedious.
Is there any way to automate such refactoring?
To achieve this a script should have an accumulator of tokens. First token is, for instance jTable2 from panel.add(jTable2). Now check all lines that have jTable2 in them and add tokens to accumulator. Repeat search for relevant tokens until new tokens are not discovered. Now for each token find lines that contain it. Expand selection to include brackets.
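For illustration, the accumulator idea can be sketched as a small worklist over the source lines. This is a rough approximation, not a real refactoring tool: `TokenClosure` is a made-up name, the tokenizer just picks out identifier-like words with a regular expression, and the "expand selection to include brackets" step is left out.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenClosure {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z0-9_]*");

    // Crude tokenizer: every identifier-like word on the line, no real Java parsing.
    static Set<String> identifiers(String line) {
        Set<String> found = new LinkedHashSet<>();
        Matcher m = IDENTIFIER.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Grows the token set until the worklist is empty, i.e. no new tokens are discovered.
    static Set<String> relatedTokens(List<String> lines, String seed) {
        Set<String> tokens = new LinkedHashSet<>();
        Deque<String> pending = new ArrayDeque<>();
        tokens.add(seed);
        pending.add(seed);
        while (!pending.isEmpty()) {
            String token = pending.remove();
            for (String line : lines) {
                Set<String> onLine = identifiers(line);
                if (!onLine.contains(token)) {
                    continue;
                }
                for (String candidate : onLine) {
                    if (tokens.add(candidate)) {
                        pending.add(candidate);
                    }
                }
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "panel.add(jTable2);",
                "jTable2.setModel(model2);",
                "model2.addRow(row);",
                "jTable1.setModel(model1);");
        System.out.println(relatedTokens(lines, "jTable2"));
    }
}
```

Note that in this tiny example the shared method name `setModel` already drags `jTable1` into the set for `jTable2`, so a purely textual closure tends to select far too much without some filtering.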
It is hard to believe that programmers of the arguably largest language haven't created such a tool yet. This should be pretty similar to find usages tool in IDE.
Automatic? No, thank goodness. Refactoring requires thought. Deep learning isn't there yet.
Most IDEs (e.g. IntelliJ from JetBrains, the best IDE on the market) have excellent refactoring support.
But it won't think for you.
One piece of advice: You'll have better luck if you have unit tests, do it in small incremental bites, and use a version control system. Write a test, make a change, show that the test still passes, commit the change, repeat.
You can always go back to the last working version that way. You won't make a bigger mess than a single incremental step.
I think you can do even better: look at moving listeners and processing code out of the UI, too. Swing apps end up with big classes because people learn to cram everything into the UI classes. If you decompose it you'll find that the code is easier to read, more modular, and easier to unit test.
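As a rough sketch of what moving processing code out of the UI can look like: the logic lives in a class with no Swing dependency, and the listener only forwards the event. `TableSummary` and `AddValueListener` are invented names, and the assumption that the action command carries the typed number is made up for this example.

```java
import java.awt.event.ActionEvent;
import java.awt.event.ActionListener;
import java.util.ArrayList;
import java.util.List;

// Plain logic with no Swing types: can be unit tested without opening a dialog.
class TableSummary {
    private final List<Integer> values = new ArrayList<>();

    void addValue(int value) {
        values.add(value);
    }

    int sum() {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }
}

// Thin listener: translates a UI event into one call on the logic class.
class AddValueListener implements ActionListener {
    private final TableSummary summary;

    AddValueListener(TableSummary summary) {
        this.summary = summary;
    }

    @Override
    public void actionPerformed(ActionEvent event) {
        // Assumption for this sketch: the action command carries the number typed by the user.
        summary.addValue(Integer.parseInt(event.getActionCommand()));
    }
}
```

The dialog class then only wires things up, for example by registering `new AddValueListener(summary)` on a button, and stays short.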
In NetBeans you can use Refactor->Move. It starts a wizard that conveniently displays relevant methods. You need to select the ones that you want to move, but you don't have to hunt for them in the code. Other IDEs have similar functionality.
This way you still have to think, but the boring part of finding them is done for you by IDE.
Take a look at this post (How to refactor thousands of lines of Java code? Is there any tool available?) that asks a similar question.
Basically, there are some production quality tools that help you extract classes once you know what you want to put in the classes. Notably, IntelliJ IDEA has a good "extract class" refactoring.
The harder part is determining what should go into those classes. AFAIK, there are only research tools available for that.

Finding how a large Java code works

I have done some Java programs on my own but now I found an interesting Java project to work on. I chose one item from the todo list and now I would like to implement it and find a suitable place in the original code for it. What are some good strategies to find the correct place? I'm using Eclipse Helios and its debugger.
This is where coding conventions and technical documentation would help you. If the Java program you are talking about is written correctly, with the correct conventions and everything, you should be able to figure out where your code should reside.
Best way would be to run through the part where the TODO is needed. If the todo is specific to current class, it would be ideal to just put it in the same file. Of course, TODO usually (but not all the time) means that it might be an enhancement to the current code. If that's the case, then creating a new method for it would be good.
If, on the other hand, you think your code would be useful for the entire project, a utility method would be the perfect place to store your code.
If it's something you can make a local copy of, try getting it all to work (i.e. do a couple of test runs), and then try deleting some files that look unimportant. It may sound silly but it'll show you straight away whether or not something is a core part of the code.
Once you get down to class level, read the whole thing. Eventually you'll get to know a few core classes really well and gain a basic understanding of what all the others do.
If you are completely new to a project, but that project has other developers I suggest you ask someone more familiar with the code base. If you are on your own you would have to see if there are any functions that are similar to what you want to do. You could then try and put your own code in the same place (package/class/whatever is appropriate) and job done!
Good luck!
Start working on a private branch (you do use some version control software, right?) and make sure you understand how to validate the project (you do use automated tests, right?). After that, just start experimenting!
If this is an open source project then it most likely has some means of contacting the developers of the project. A mailing list is usually available, where you can ask these questions to those who know the code well.
Remember to choose an active project...

How should I visualize the structure of my code? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
I have an application written in Java. It is stored in several files. It uses different classes with different methods. The code is big and complicated. I think it would be easier to understand the code if I had a graphical model of it (some kind of directed graph). Are there some standard methods for visualizing code? I am thinking about using UML (not sure it is the correct choice). Can anybody recommend something?
ADDED:
I consider two possibilities:
Creating the graph by hand (explicitly).
Creating the graph automatically, for example by using some tool that reads the available code and generates a graph describing the structure of the code.
ADDED 2:
It would be nice to have something for free.
I tried using a number of UML tools and found that the reverse-engineering capabilities in most UML tools were not helpful for understanding code. They focus on design needs, and their reverse-engineering capabilities often just end up showing huge pictures with lots of useless information. When I was working on the Microsoft Office codebase, I found pen and paper more helpful than the typical design/modelling tools.
You typically want to think about doing this in a number of ways:
Use your brain: Someone else mentioned it - there is no substitute for actually trying to understand a code base. You might need to take notes and refer back to them later. Can tools help? Definitely. But don't expect them to do most of the work for you.
Find documentation and talk to co-workers: There is no better way than having some source describe the main concepts in a codebase. If you can find someone to help you, take a pen and paper, go to them and take lots of notes. How much to bug the other person? In the beginning - as much as is practical for your work, but even a little helps.
Think about tools: If you are new to a part of a project - you are going to be spending a significant amount of time understanding the code, so see how much help you can get automatically. There are good tools and bad tools. Try to figure out which tools have capabilities that might be helpful for you first. As I mentioned above, the average UML tool focuses more on modeling and does not seem to be the right fit for you.
Time vs Cost: Sure, free is great. But if a free tool is not being used by many people - it might be that the tool does not work. There are many tools that were created just as an exploration of what could be done, but are not really helpful and were therefore just made available for free in the hope that someone else would adopt them. Another way to think about it: decide how much your time is worth - it might make sense to spend a day or two to get a tool to work for you.
Once there, keep these in mind when trying to understand the project:
The Mile High View: A layered architectural diagram can be really helpful to know how the main concepts in a project are related to one another. Tools like Lattix and Architexa can be really helpful here.
The Core: Try to figure out how the code works with regards to the main concepts. Class diagrams are exceptionally useful here. Pen-and-paper works often enough here, but tools can not only speed up the process but also help you save and share such diagrams. I think AgileJ and Architexa are your best bets here, but your average UML tool can often be good enough.
Key Use Cases: I would suggest tracing at least one key use case for your app. You can likely get the most important use cases from anyone on your team, and stepping through one will be really helpful. Most IDEs are really helpful here. If you try drawing them, then sequence diagrams are the most appropriate. For tools, I think MaintainJ, JDeveloper and Architexa are your best bets.
Note: I am the founder of Architexa - we build tools to help you understand and document Java code, but I have tried to be unbiased above. My intention is to suggest tools and options given that this is what I focused on as part of my PhD.
The most important tool you should use is your brain, and it's free.
There's no reason why you have to use any sort of standard method of visualization, and you can use whatever media you like. Paper, whiteboard, photoshop, visio, powerpoint, notepad: all of these can be effective. Draw a diagram of classes, objects, methods, properties, variables - whatever you think is useful to see in order to understand the application. The audience is not only other members of your team, but also yourself. Create diagrams that are useful for you to look at and quickly understand. Post them around your workspace and look at them regularly to remind yourself of the overall system architecture as you build it.
UML and other code documentation standards are good guidelines for the types of diagrams you can do and the information you should consider including. However, it is overkill for most applications and basically exists for people who can't take personal responsibility for documenting without standards. If you follow UML to the letter, you'll end up spending way too much time on your documentation instead of creating your application.
It is stored in several files. It uses different classes with different methods. The code is big and complicated.
All Java code written outside the school is like that, particularly for a new developer starting on a project.
This is an old question, but as this is coming up in Google searches, I am adding my response here so that it could be useful to the future visitors. Let me also disclose that I am the author of MaintainJ.
Don't try to understand the whole application
Let me ask you this - why do you want to understand the code? Most probably you are fixing a bug or enhancing a feature of the application. The first thing you should not try to do is to understand the whole application. Trying to understand the entire architecture while starting afresh on a project will just overwhelm you.
Believe me when I say this - developers with 10+ years of solid coding experience may not understand how certain parts of the application work even after working on the same project for more than a year (assuming they are not the original developers). They may not understand how the authentication works or how the transaction management works in the application. I am talking about typical enterprise applications with 1000 to 2000 classes and using different frameworks.
Two important skills required to maintain large applications
Then how do they survive and are paid big bucks? Experienced developers usually understand what they are doing; meaning, if they are to fix a bug, they will find the location of the bug, then fix it and make sure that it does not break the rest of the app. If they need to enhance a feature or add a new feature, most of the time, they just have to imitate an existing feature that does a similar thing.
There are two important skills that help them to do this.
They are able to analyze the impact of the change(s) they do while fixing a bug. First they locate the problem, change the code and test it to make sure that it works. Then, because they know the Java language well and the frameworks 'well enough', they can tell if it will break any other parts of the app. If not, they are done.
I said that they simply need to imitate to enhance the application. To imitate effectively, one needs to know Java well and understand the frameworks 'well enough'. For example, when they are adding a new Struts Action class and adding it to the configuration XML, they will first find a similar feature, try to follow the flow of that feature and understand how it works. They may have to tweak a bit of the configuration (like the 'form' data being in 'request' rather than in 'session' scope). But if they know the frameworks 'well enough', they can easily do this.
The bottom line is, you don't need to understand what all the 2000 classes are doing to fix a bug or enhance the app. Just understand what's needed.
Focus on delivering immediate value
So am I discouraging you from understanding the architecture? No, not at all. All I am asking you is to deliver. Once you start on a project and once you have set up the development environment on your PC, you should not take more than a week to deliver something, however small it may be. If you are an experienced programmer and don't deliver anything after 2 weeks, how can a manager know if you are really working or reading sports news?
So, to make life easier for everyone, deliver something. Don't go with the attitude that you need to understand the whole application to deliver something valuable. It's completely false. Adding a small and localized Javascript validation may be very valuable to the business and when you deliver it, the manager feels relieved that he has got some value for his money. Moreover, it gives you the time to read the sports news.
As time passes and after you deliver 5 small fixes, you will slowly start to understand the architecture. Do not underestimate the time needed to understand each aspect of the app. Give 3-4 days to understand the authentication. Maybe 2-3 days to understand the transaction management. It really depends on the application and your prior experience with similar applications, but I am just giving ballpark estimates. Steal the time in between fixing the defects. Do not ask for that time.
When you understand something, write notes or draw the class/sequence/data model diagram.
Diagrams
Haaa...it took me so long to mention diagrams :). I started with the disclosure that I am the author of MaintainJ, the tool that generates runtime sequence diagrams. Let me tell you how it can help you.
The big part of maintenance is to locate the source of a problem or to understand how a feature works.
MaintainJ generated sequence diagrams show the call flow and data flow for a single use case. So, in a simple sequence diagram, you can see which methods are called for a use case. So, if you are fixing a bug, the bug is most probably in one of those methods. Just fix it, ensure that it does not break anything else and get out.
If you need to enhance a feature, understand the call flow of that feature using the sequence diagram and then enhance it. The enhancement may be like adding an extra field or adding a new validation, etc. Usually, adding new code is less risky.
If you need to add a new feature, find some other feature similar to what you need to develop, understand the call flow of that feature using MaintainJ and then imitate it.
Sounds simple? It is actually simple, but there will be cases where you will be doing larger enhancements like building an entirely new feature or something that affects the fundamental design of the application. By the time you are attempting something like that, you should be familiar with the application and understand the architecture of the app reasonably well.
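To give a feel for what "seeing which methods are called for a use case" means, here is a very rough stand-in using a JDK dynamic proxy. This is not how MaintainJ works, it only records calls made through an interface, and `PriceSource` and `CallTracer` are made-up names for illustration.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for a component of the application.
interface PriceSource {
    int priceOf(String item);
}

class CallTracer {
    // Wraps target so that every call made through iface is appended to log.
    static <T> T trace(Class<T> iface, T target, List<String> log) {
        InvocationHandler handler = (proxy, method, args) -> {
            log.add(iface.getSimpleName() + "." + method.getName());
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow what the real method threw
            }
        };
        Object instance = Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
        return iface.cast(instance);
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        PriceSource real = item -> item.length() * 10;
        PriceSource traced = trace(PriceSource.class, real, log);
        traced.priceOf("tea");
        System.out.println(log);
    }
}
```

Run one use case against the wrapped object and the log is a poor man's sequence diagram: the list of methods to look at first when you are hunting a bug in that use case.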
Two caveats to my argument above
I mentioned that adding code is less risky than changing existing code. Because you want to avoid changing, you may be tempted to simply copy an existing method and add to it rather than changing the existing code. Resist this temptation. All applications have certain structure or 'uniformity'. Do not ruin it by bad practices like code duplication. You should know when you are deviating from the 'uniformity'. Ask a senior developer on the project to review the changes. If you must do something that does not follow the conventions, at least make sure that it's local to a small class (a private method in a 200 line class would not ruin the application's esthetics).
If you follow the approach outlined above, though you can survive for years in the industry, you run the risk of not understanding the application architectures, which is not good in the long run. This can be avoided by working on bigger changes or by just less Facebook time. Spend time to understand the architecture when you are a little free and document it for other developers.
Conclusion
Focus on immediate value and use the tools that deliver that, but don't be lazy. Tools and diagrams help, but you can do without them too. You can follow my advice just by asking a senior developer on the project for some of their time.
Some plugins I know for Eclipse:
Architexa
http://www.architexa.com/
nWire
http://www.nwiresoftware.com/
If you want to reverse engineer the code, you should try Enterprise Architect
Have you tried Google CodePro Analytix?
It can, for example, display dependencies, and it is free.
Here is a non-UML tool which has very nice visualization features.
You can map the lines of code per class / method to colors / side lengths of rectangles.
You can also show the dependencies between the classes.
http://www.moosetechnology.org/
The nice thing is, you can use Smalltalk scripting for displaying what you need:
http://www.moosetechnology.org/docs/faq/JavaModelManipulation
Here you can see what such a visualization looks like:
http://www.moosetechnology.org/tools/moosejee/casestudy
JUDE Community UML used to be able to import Java, but it is no longer the case. It is a good, free tool.
If your app is really complex I think that diagrams won't carry you very far. When diagrams become very complex they become hard to read and lose their power. Some well chosen diagrams, even if generated by hand, might be enough.
You don't need every method, parameter, and return value spelled out. Usually it's just the relationships and interactions between objects or packages that you need.
Here is another tool that could do the trick:
http://xplrarc.massey.ac.nz/
You can use the JArchitect tool, a pretty complete tool that lets you visualize your code structure using the dependency graph and browse your source code like a database using CQLinq.
JArchitect is free for open source contributors
Some great tools I use -
StarUML (allows code to diagram conversion)
MS Visio
XMind (very very useful for overview of the system)
Pen and Paper!

Important things to keep in mind before a Code Review in Java

I have just created a mid-sized web application using Java, a custom MVC framework and JavaScript. My code will be reviewed before it's put on the production servers (internal use).
The primary objective of building this app was to solve a small problem for internal use and understand the custom made MVC framework used by my employer. So, my app has gone through MANY iterations, feature changes and additions.
So, bottom line, the code is very very dirty and this is my first "product level" Java app.
What are your suggestions? What are some basic checks/refactorings I should do before the code review?
I am thinking about:
Java best practices (conventions)
Make the code simple to understand for the developer who will maintain it. (won't be me)
I noticed that I have created some unnecessary objects and used hashmaps/arraylists where I could easily have used some other data structure to achieve the same result. So, is that worth changing?
Update
Your Code Sucks and I Hate You: The Social Dynamics of Code Reviews
If you have not already done so (assuming you use an IDE like Eclipse):
get the Checkstyle and FindBugs plugins
go through their configuration and tune it to your style
run them on your code
resolve all issues reported
You can also tune the compiler warning settings of Eclipse itself and possibly make them stricter in what is reported.
Look at code structure:
get the JDepend plugin
investigate your package structure
Code against interfaces (Map, List, Set) instead of implementation classes (HashMap, ArrayList, TreeSet)
Complete your Javadoc and make sure it is up to date after all refactorings.
Add JUnit tests; if you have no time left to test the whole application, at least create a test for every bug you find and solve from now on. This helps "growing" a test set as you go.
Next time design and build your application with the end goal in sight. Always assume that the next guy having to maintain your code will know how to find you :-)
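Coding against interfaces, as suggested above, looks roughly like this. `WordCounter` is a made-up example. The point is that only the `new TreeMap<>()` expression names an implementation class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCounter {
    // Parameter and return types are interfaces, so callers are not tied to a concrete class.
    static Map<String, Integer> count(List<String> words) {
        // The implementation can be swapped (e.g. for HashMap) without touching any caller.
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count(Arrays.asList("b", "a", "b")));
    }
}
```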
Unit tests, and they should be automated as part of your build. You should already have these, but if not, do it now. It will definitely make the refactoring easier, as well as improving your general confidence in the code (and that of the guy who will be maintaining it).
Logging.
One of the more overlooked things is the importance of logging. You need to have a decent logging methodology put in place. Even though this is an internal app, make sure that the basic logs can help regular users find issues and provide more detailed logging so that you (the developer) would know where to go.
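A minimal sketch of the two levels of logging described above, using `java.util.logging` from the JDK. `ImportJob` and its messages are invented for this example. With the default JDK configuration the WARNING and INFO lines are shown and the FINE lines are hidden until you raise the log level.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class ImportJob {
    private static final Logger LOG = Logger.getLogger(ImportJob.class.getName());

    // Counts the non-empty rows, logging at different levels for users and developers.
    static int importRows(String[] rows) {
        int imported = 0;
        for (String row : rows) {
            if (row == null || row.trim().isEmpty()) {
                // Basic log: visible by default, helps regular users spot bad input.
                LOG.warning("Skipping empty row");
                continue;
            }
            // Detailed log: developer-level detail, only shown when FINE is enabled.
            LOG.log(Level.FINE, "Importing row: {0}", row);
            imported++;
        }
        LOG.log(Level.INFO, "Imported {0} of {1} rows", new Object[] {imported, rows.length});
        return imported;
    }

    public static void main(String[] args) {
        importRows(new String[] {"first", " ", "second"});
    }
}
```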
Comment your code, explain why it's doing what it's doing and what assumptions have been made.
Try to reduce the amount of mutating state.
Try to remove any singletons you may have.
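Both points can be sketched in a few lines: a value class with only final fields, and a dependency handed in through the constructor instead of fetched from a global singleton. `Money`, `Settings` and `Invoice` are illustrative names, not from the original application.

```java
// Immutable value: no setters, "changing" it produces a new object.
final class Money {
    private final long cents;

    Money(long cents) {
        this.cents = cents;
    }

    long cents() {
        return cents;
    }

    Money plus(Money other) {
        return new Money(cents + other.cents);
    }
}

// Plain object instead of a Settings.getInstance() style singleton.
final class Settings {
    final int taxPercent;

    Settings(int taxPercent) {
        this.taxPercent = taxPercent;
    }
}

final class Invoice {
    private final Settings settings;

    // The dependency is passed in, so a test can supply its own Settings.
    Invoice(Settings settings) {
        this.settings = settings;
    }

    Money withTax(Money net) {
        return net.plus(new Money(net.cents() * settings.taxPercent / 100));
    }
}
```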

How to understand Open Source projects/libraries?

There are a few open source projects/APIs/libraries that we use in our project (Spring, Struts, iBatis etc.) and I want to understand their design and how they work internally.
What is the best way to understand these projects? Note that I am already using these libraries in my project. And I know the input-output interaction/configurations for these libraries. What I don't understand is how these APIs/libraries work internally.
The problems I face are:
Finding the entry class of the library. Is there any way I can know the entry class for the library - something which is kicking off the whole API?
Tools/Plugins to use in Eclipse to get an overview of the design of the library. Going through each and every class of the library can be a very daunting task. Is there any tool you would recommend which can generate the class diagrams of the API in Eclipse?
Thanks in advance!!
UPDATE: I need some inputs on eclipse plugins which can help me in getting an overview/class diagram of the library
I always use the same strategy for this: I never try to "understand" the code base as a whole, and I usually try to follow the request flow. I read enough of the documentation to determine what is necessary to use the application, and I read that code (Keep all source code loaded in your IDE).
For example, in struts you'll be installing a servlet filter in web.xml. Start reading the filter and follow the path a single request takes through your stack.
Likewise for spring, there are two main entry points, the filter and "getBean", both of which are mentioned real early in the documentation. Read those two.
For both of these cases you'll find one or two classes that represent the "core" of the framework real quickly. Read those really well and let actual use cases & needs drive your further exploration.
Approaching "understanding" of an open source library (or any other code base for that matter) by trying to find all the pieces is usually not a very good way of approaching these things, it will usually just lead nowhere because a lot of these things contain too much code. When following the request flow I find making diagrams can also be quite distracting, it tends to draw attention/focus away from understanding (and since my understanding increases rapidly most of them are out-of-date even before they reach the printer).
Nice question! What I've done, especially in the case of Spring, apart from consulting the documentation and the API docs, is to attach the sources of the project to my project in Eclipse; that way I'm able to navigate through the source code, not just the API. It's been quite helpful, especially in the case of the Spring-Security project: there were some concepts that I just couldn't understand until I inspected the source code.
That's one of the advantages of using Open Source libraries.
Regards.
Tools like Structure101 (http://www.headwaysoftware.com/products/structure101/index.php), and Lattix (http://www.lattix.com/) let you analyze code and produce architecture diagrams / dependency matrices.
This is not exactly a class diagram - the main focus is on layering. So the entry point is usually the topmost layer.
But then again, as I specified above, you will notice that some libs are just a mess, and these tools will not be helpful enough.
See the S101 online demo: http://www.structure101.com/java/
This for example is the Sonar project architecture: http://www.structure101.com/java/tracker/sonar/1.11.1/arch.html
Your best bet for those three would be to consult the official documentation (make sure you are looking at the version you are using) or to get a book on the technology.
Most APIs don't have a class with a main method; they're running in the webserver called by the server itself. Unless they're running as their own server, they won't have a main method.
