How Do You Keep UML Diagrams Up To Date?

I am from a Physics background and not a Computer Science background and never did any course at University on class/component diagrams etc and I have never found the need to use them at work.
The main thing that I don't understand is how do you keep them up to date if the code is still being developed or maintained?
e.g. What's to stop me from refactoring several methods or classes and making the class diagram obsolete?
Do you have to constantly update the diagram manually?
I have seen tools that generate UML from the code and these could keep it up to date I suppose but from what I have seen, the auto-generated diagrams don't seem to be useful enough.
Is the UML for a project likely to be created at the start then be left in a documentation folder and gradually get more and more out of date?

I work for a moderately large government agency, so most of our major projects fall into the "Enterprise Java" category. This is what works for us:
Architects model any changes and extensions to our corporate data model using UML diagrams. Generally there will be a conceptual model class diagram, plus a few sequence diagrams that illustrate how the various parts of the system will interact, and maybe a couple of component diagrams.
We have a walkthrough with the business analysts, DBAs and lead developers. The idea of this is to challenge the new model and agree on changes (there is a lot of "robust" discussion at these sessions). With a good architect, the changes are minimal.
A senior developer creates a technical specification that will typically include a physical database ER diagram based on the architectural model. From the physical model, we automatically generate a database creation script.
The DBAs upgrade the creation script (e.g. Add tablespace and indexspace info) and create/extend the database.
The code gets written. Developers may create their own mini class hierarchies (e.g. POJOs to carry around data). We don't bother to model these in UML as the code should be self-documenting, and changes are inevitable as the code evolves.
Quite often changes will occur during the development phase, especially if using agile methodologies. If these impact on the corporate data model, then the UML and ER diagrams will be updated.
At the end of the project, the documentation is updated to reflect "as built" state.
Getting back to the gist of your question, I'm not a great believer in automated UML <-> code generation. Generally there is data that is specific to the UML diagram (notes, relationship cardinality, sequence diagrams etc) that does not appear in the code or is very difficult to extract. Conversely, the code contains stuff (e.g. behavioural method logic, data structures and caches) that does not necessarily show in the logical UML model. Then there is the whole question of how you map the logical model class hierarchy to database tables...
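To make the limits of reverse engineering concrete, here is a minimal sketch (not any particular tool's method) that uses plain Java reflection to emit a Graphviz DOT graph of inheritance edges. The class name ClassGraphSketch and the dashed-edge convention for interfaces are assumptions made for illustration. Note what the output cannot contain: notes, cardinalities, or any hint of which classes matter.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;

class ClassGraphSketch {

    /** Returns a Graphviz DOT graph of the inheritance edges reachable from the given types. */
    static String toDot(Class<?>... roots) {
        Set<String> edges = new LinkedHashSet<>();
        for (Class<?> root : roots) {
            collect(root, edges);
        }
        StringBuilder dot = new StringBuilder("digraph classes {\n");
        for (String edge : edges) {
            dot.append("  ").append(edge).append(";\n");
        }
        return dot.append("}\n").toString();
    }

    // Walks superclasses and interfaces; the set drops edges that are reached twice.
    private static void collect(Class<?> type, Set<String> edges) {
        Class<?> parent = type.getSuperclass();
        if (parent != null && parent != Object.class) {
            edges.add(quote(type) + " -> " + quote(parent));
            collect(parent, edges);
        }
        for (Class<?> iface : type.getInterfaces()) {
            // Dashed edges mark "implements" (or, for interfaces, "extends interface").
            edges.add(quote(type) + " -> " + quote(iface) + " [style=dashed]");
            collect(iface, edges);
        }
    }

    private static String quote(Class<?> type) {
        return "\"" + type.getSimpleName() + "\"";
    }

    public static void main(String[] args) {
        // Pipe the output into Graphviz's dot command to render a picture.
        System.out.print(toDot(ArrayList.class));
    }
}
```

Even for a single JDK class the graph fills up with implementation-level edges, which is roughly why generated diagrams tend to be less useful than a hand-drawn "big picture".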
To summarise, I recommend:
Get the design correct up front. Changes to the logical model are expensive and awkward to implement.
Use a modelling tool that will support all of the artifacts you need from the same data source. That is, the initial UML logical model, the database ER diagram and the database creation SQL DDL. We use Enterprise Architect, but there are lots of other tools that will do this.
Use UML to model the "big picture" and forget it for describing detailed coding. A good rule of thumb is you need UML if any change to the model affects more than just your team. (e.g. A new database field may require a change to the database, a change to a web service, a change to the GUI and a change to a mainframe batch process. UML has a place in defining the data change in a way multiple teams can understand)

Although this is not a question that is normally answered on SO, here's what I've seen and heard:
A project's SW development plan must define how design is being done, and if UML is used, how an update to the SW must be made. That plan can define that the UML is "one shot" - so it is indeed forgotten after the first design progresses into code. OTOH, a strict follow-up rule and ensuing checking may require and guarantee that the UML design is updated during bug fixes (if required) or more extensive changes. (More often than not, you may even have to go back to requirements and update there, too.)
A completely different approach is to generate code from UML - that way you never change the code. Whether this works or not, given the potential differences between UML's expressiveness and what and how a language like Java or C++ provide to implement the semantics of the various diagrams, is a question I'd dearly love to have answered on more reliable data than a salesman's pitch.

In my experience, class UML diagrams are mostly useless. Ordinary code changes too often, so maintaining UML diagrams for it adds too much burden.
Possible exceptions are:
Architecture (component-level) diagrams. Created once, rarely changed, useful for others.
Business model. If your application operates on a complex model, it may be worth generating classes from a UML representation. This UML can become quite valuable if you have many applications operating on the same model.
University projects - no comments :-)

It depends on what you choose, what you agree on with the team and stakeholders, what are your priorities, what are your processes and their deliverable artifacts and what are the costs and who will pay them.
As of today there are no production-ready tools that keep UML documentation up to date completely automatically, although many come close (e.g. Graphviz + Doxygen to generate UML class diagrams) and many make the task easier (e.g. Sparx Systems' Enterprise Architect or Rapid Quality Systems' Code Rocket).
Like any other process, creating and maintaining UML documentation needs to be defined, implemented, managed and optimized (in the same way as you need to manage experiments in Physics, which you already know).
There is a whole website devoted to this topic at Agile Modeling - Effective Practices for Modeling and Documentation

Related

How to generate System Architecture from java code?

By System Architecture I mean the computational components of the software system and interactions/relationship among those components. The components may be tasks, processes, objects or modules etc. Different components are connected by connectors(procedure call, implicit invocation, message passing, instantiation, shared database etc).
I have generated UML diagrams via reverse engineering using Visual Paradigm, but can I also generate Architecture?

Since components and interactions are not necessarily explicit in code, in the general case you cannot generate such a diagram automatically. You should study different aspects of your application (source code, existing documentation, user interface, configuration, Jira tasks, etc.) and try to reconstruct the original architecture.

An "architecture" is just a view of some code properties that don't change very fast. (If a property changes fast, it isn't the basis of an architecture.)
From this perspective, UML diagrams are a kind of architecture. So is a call graph, and so is any modularization scheme you might have chosen.
The lesson is that "extracting architecture" first requires deciding what properties of the code you want to see abstractly, and then building (if you can) machinery to extract that information.
Since people mostly don't agree on what properties are useful, you get lots of arguments about "what's an (my favorite) architecture", and you don't get a lot of tools, since they are hard to build and there are lots of candidate properties.

Designing APIs in Java with top-down approach - Is writing up the Javadoc the best starting point?

Whenever I have the need to design an API in Java, I normally start off by opening up my IDE, and creating the packages, classes and interfaces. The method implementations are all dummy, but the javadocs are detailed.
Is this the best way to go about things? I am beginning to feel that the API documentation should be the first thing to be churned out, even before the first .java file is written. This has a few advantages:
The API designer can complete the design & specification and then split up the implementation among several implementors.
More flexible - change in design does not require one to bounce around among java files looking for the place to edit the javadoc comment.
Are there others who share this opinion? And if so, how do you go about starting off with the API design?
Further, are there any tools out there which might help? Probably even some sort of annotation-based tool which generates documentation and then the skeleton source (kind of like model-to-code generators)? I came across Eclipse PDE API tooling - but this is specific to Eclipse plugin projects. I did not find anything more generic.

For an API (and for many types of problems IMO), a top-down approach for problem partitioning and analysis is the way to go.
However (and this is just my 2c based on my own personal experience, so take it with a grain of salt), focusing on the Javadoc part of it is a good thing to do, but that is still not sufficient, and cannot reliably be the starting point. In fact, that is very implementation oriented. So what happened to the design, the modeling and reasoning that should take place before that (however brief that might be)?
You have to do some sort of modeling to identify the entities (the nouns, roles and verbs) that make up your API. And no matter how "agile" one would like to be, such things cannot be prototyped without having a clear picture of the problem statement (even if it is just a 10K foot view of it.)
The best starting point is to specify what you are trying to implement, or more precisely, what type of problems your API is trying to address. BDD might be of help (more of that below). That is, what is it that your API will provide (datum elements), and to whom, performing what actions (the verbs) and under what conditions (the context). That leads then to identify what entities provide these things and under what roles (interfaces, specifically interfaces with a single, clear role or function, not as catch-all bags of methods). That leads to an analysis on how they are orchestrated together (inheritance, composition, delegation, etc.)
Once you have that, then you might be in a good position to start doing some preliminary Javadoc. Then you can start working on the implementation of those interfaces, of those roles. More Javadoc follows (in addition to other documentation that might not fall within Javadoc, i.e. tutorials, how-tos, etc.)
You start your implementation with use cases and verifiable requirements and behavioral descriptions of what each thing should do alone or in collaboration. BDD would be extremely helpful here.
As you go on, you continuously refactor, ideally guided by some metrics (cyclomatic complexity and some variant of LCOM). These two tell you where you should refactor.
The development of an API should not be inherently different from the development of an application. After all, an API is a utilitarian application for a user (who happens to have a development role.)
As a result, you should not treat API engineering any differently from general software-intensive application engineering. Use the same practices, tune them according to your needs (which everyone who works with software should), and you'll do fine.
Google has been uploading its "Google Tech Talk" video lecture series on youtube for quite some time. One of them is an hour long lecture titled "How To Design A Good API and Why it Matters". You might want to check it out also.
Some links for you that might help:
Google Tech Talk's "Beyond Test Driven Development: Behaviour Driven Development" : http://www.youtube.com/watch?v=XOkHh8zF33o
Behavior Driven Development : http://behaviour-driven.org/
Website Companion to the book "Practical API Design" : http://wiki.apidesign.org/wiki/Main_Page
Going back to the Basics - Structured Design#Cohesion and Coupling : http://en.wikipedia.org/wiki/Structured_Design#Structured_Design

Defining the interface first is the programming-by-contract style of declaring preconditions, postconditions and invariants. I find it combines well with Test-Driven-Development (TDD), because the invariants and postconditions you write first are the behaviours that your tests can check for.
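As a hedged illustration of that combination, the sketch below writes the contract first as Javadoc on an interface, then expresses the postconditions as a check that any implementation must pass. BoundedStack, ArrayBoundedStack and ContractCheck are hypothetical names invented for this example, not part of any library.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical API, specified before any implementation exists.
 * Invariant: 0 <= size() <= capacity().
 */
interface BoundedStack {
    /**
     * Precondition: size() < capacity().
     * Postcondition: size() has grown by one and peek() returns item.
     * @throws IllegalStateException if the stack is full
     */
    void push(String item);

    /**
     * Precondition: size() > 0.
     * @throws IllegalStateException if the stack is empty
     */
    String peek();

    int size();

    int capacity();
}

/** One possible implementation, written only after the contract was fixed. */
class ArrayBoundedStack implements BoundedStack {
    private final Deque<String> items = new ArrayDeque<>();
    private final int capacity;

    ArrayBoundedStack(int capacity) { this.capacity = capacity; }

    @Override public void push(String item) {
        if (items.size() >= capacity) throw new IllegalStateException("stack is full");
        items.push(item);
    }

    @Override public String peek() {
        if (items.isEmpty()) throw new IllegalStateException("stack is empty");
        return items.peek();
    }

    @Override public int size() { return items.size(); }

    @Override public int capacity() { return capacity; }
}

class ContractCheck {
    /** Checks the postconditions and the invariant of push against any implementation. */
    static void checkPush(BoundedStack stack, String item) {
        int before = stack.size();
        stack.push(item);
        if (stack.size() != before + 1) throw new AssertionError("size must grow by one");
        if (!item.equals(stack.peek())) throw new AssertionError("peek must return the pushed item");
        if (stack.size() > stack.capacity()) throw new AssertionError("invariant violated");
    }
}
```

Because the check is written against the interface, it can be reused unchanged for every implementor, which is what makes the Javadoc contract testable rather than just descriptive.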
As an aside, the Behaviour-Driven-Development elaboration of TDD seems to have come about because of programmers who did not habitually think of the interface first.

As for myself, I always prefer to start by writing the interfaces along with their documentation, and only then start on the implementation.
In the past I took another approach which was starting with the UML and then using the automatic code generation.
The best tool I encountered for this matter was Rational Rose which is not free but I'm sure there are plenty of free plugins and utils.
The advantage of Rational Rose over other designers I bumped into was that you can "attach" the design to your code and then modify on either code or design and the other will update.

I jump right in with the coding with a prototype. Any required interfaces soon pop out at you and you can mould your proto into a final product. Get feedback along the way from whomever is going to be using your API if you can.
There is no 'best way' of approaching API design; do whatever works for you. Domain knowledge also has a large part to play.

I'm a great fan of programming to the interface. It forms a contract between the implementors and the users of your code.
Rather than diving straight into code, I usually start with a basic model of my system (UML diagrams etc, depending on the complexity). Not only does this serve as good documentation, it provides a visual clarification of the system structure. Having this makes the coding part much easier to do. This kind of design documentation also makes it easier to understand the system when you come back to it in 6 months, or try to fix bugs :)
Prototyping also has its merits, but be prepared to throw it away and start again.

Java to Clojure rewrite

I have just been asked by my company to rewrite a largish (50,000 single lines of code) Java application (a web app using JSP and servlets) in Clojure. Has anyone else got tips as to what I should watch out for?
Please bear in mind that I know both Java AND Clojure quite well.
Update
I did the rewrite and it went into production. Strangely, the rewrite went so fast that it was done in about 6 weeks. Because a lot of the functionality was no longer needed, it ended up at more like 3000 lines of Clojure.
I hear they are happy with the system and it's doing exactly what they wanted. The only downside is that the guy maintaining the system had to learn Clojure from scratch, and he was dragged into it kicking and screaming. I did get a call from him the other day saying he loved Lisp now though... funny :)
Also, I should give a good mention to Vaadin. Using Vaadin probably accounted for as much of the time saved and shortness of the code as Clojure did. Vaadin is still the top web framework I have ever used, although now I'm learning ClojureScript in anger! (Note that both Vaadin and ClojureScript use Google's GUI frameworks underneath the hood.)

The biggest "translational issue" will probably be going from a Java / OOP methodology to a Clojure / functional programming paradigm.
In particular, instead of having mutable state within objects, the "Clojure way" is to clearly separate out mutable state and develop pure (side-effect free) functions. You probably know all this already :-)
Anyway, this philosophy tends to lead towards something of a "bottom up" development style where you focus the initial efforts on building the right set of tools to solve your problem, then finally plug them together at the end. This might look something like this:
Identify key data structures and transform them to immutable Clojure map or record definitions. Don't be afraid to nest lots of immutable maps - they are very efficient thanks to Clojure's persistent data structures. Worth watching this video to learn more.
Develop small libraries of pure, business logic oriented functions that operate on these immutable structures (e.g. "add an item to shopping cart"). You don't need to do all of these at once since it is easy to add more later, but it helps to do a few early on to facilitate testing and prove that your data structures are working. Either way, at this point you can actually start writing useful stuff interactively at the REPL.
Separately develop data access routines that can persist these structures to/from the database or network or legacy Java code as needed. The reason to keep this very separate is that you don't want persistence logic tied up with your "business logic" functions. You might want to look at ClojureQL for this, though it's also pretty easy to wrap any Java persistence code that you like.
Write unit tests (e.g. with clojure.test) that cover all the above. This is especially important in a dynamic language like Clojure since a) you don't have as much of a safety net from static type checking and b) it helps to be sure that your lower level constructs are working well before you build too much on top of them
Decide how you want to use Clojure's reference types (vars, refs, agents and atoms) to manage each part of the mutable application-level state. They all work in a similar way but have different transactional/concurrency semantics depending on what you are trying to do. Refs are probably going to be your default choice - they allow you to implement "normal" STM transactional behaviour by wrapping any code in a (dosync ...) block.
Select the right overall web framework - Clojure has quite a few already but I'd strongly recommend Ring - see this excellent video "One Ring To Bind Them" plus either Fleet or Enlive or Hiccup depending on your templating philosophy. Then use this to write your presentation layer (with functions like "translate this shopping cart into an appropriate HTML fragment")
Finally, write your application using the above tools. If you've done the above steps properly, then this will actually be the easy bit because you will be able to build the entire application by appropriate composition of the various components with very little boilerplate.
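For readers coming from the Java side, the core idea of the steps above (immutable data, pure functions, and one clearly separated piece of mutable state) can be sketched in plain Java. This is only an analogy, not Clojure code: Cart and CartSession are hypothetical names, the list is copied rather than structurally shared as Clojure's persistent collections are, and the AtomicReference behaves more like a Clojure atom than like a ref inside a (dosync ...) block.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

/** Immutable value: every "change" returns a new Cart. */
final class Cart {
    private final List<String> items;

    Cart(List<String> items) {
        // Defensive copy plus read-only view, so no caller can mutate a Cart.
        this.items = Collections.unmodifiableList(new ArrayList<>(items));
    }

    static Cart empty() {
        return new Cart(Collections.<String>emptyList());
    }

    List<String> items() {
        return items;
    }

    /** Pure function: no side effects, the receiver is left untouched. */
    Cart withItem(String item) {
        List<String> copy = new ArrayList<>(items);
        copy.add(item);
        return new Cart(copy);
    }
}

/** The only mutable piece: a single reference that is swapped atomically. */
final class CartSession {
    private final AtomicReference<Cart> current = new AtomicReference<>(Cart.empty());

    /** Applies the pure function to whatever the current value is and stores the result. */
    Cart add(String item) {
        return current.updateAndGet(cart -> cart.withItem(item));
    }

    Cart snapshot() {
        return current.get();
    }
}
```

The business logic (withItem) can be unit tested without any session, database or web layer, which is the property the bottom-up ordering relies on.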
This is roughly the sequence in which I would attack the problem, since it broadly represents the order of dependencies in your code and hence is suitable for a "bottom up" development effort. Though of course in good agile / iterative style you'd probably find yourself pushing forward early to a demonstrable end product and then jumping back to earlier steps quite frequently to extend functionality or refactor as needed.
p.s. If you do follow the above approach, I'd be fascinated to hear how many lines of Clojure it takes to match the functionality of 50,000 lines of Java.
Update: Since this post was originally written a couple of extra tools/libraries have emerged that are in the "must check out" category:
Noir - web framework that builds on top of Ring.
Korma - a very nice DSL for accessing SQL databases.

What aspects of Java does your current project include? Logging, database transactions, declarative transactions/EJB, web layer (you mentioned JSP, servlets) etc. I have noticed the Clojure eco-system has various micro-frameworks and libraries, each with the goal of doing one task and doing it well. I'd suggest evaluating libraries based on your needs (and whether they would scale in large projects) and making an informed decision. (Disclaimer: I am the author of bitumenframework.) Another thing to note is the build process - if you need a complex setup (dev, testing, staging, prod) you may have to split the project into modules and have the build process scripted for ease.

I found the most difficult part was thinking about the database. Do some tests to find the right tools you want to use there.

Code understanding, reverse engineering, best concepts and tools. Java

One of the most demanding tasks for any programmer or architect is understanding other people's code.
For example, I am a contractor, hired to rescue some project very quickly: fix bugs and plan a global refactoring. I therefore need the most efficient way to understand the code. What is the list of concepts, their priority and the best tools for this?
What I know of so far: reverse engineering the code to create object models (creating a diagram per package is not so convenient), and creating sequence diagrams (the tool connects to the system in debug mode and generates diagrams from the runtime). Some visualization techniques, using tools that work not just with .java files but also with e.g. JPA implementors like Hibernate. Generating a diagram not for the whole codebase, but starting from one class and then adding the classes used by it.
Is Sparx Enterprise Architect state of the art in reverse engineering, or far from that? Are there any better tools? Ideally the tool would make me understand the code as if I had written it myself :)

The book Object-Oriented Reengineering Patterns deals with this in detail. Unfortunately there is no silver bullet attached :-)
However, it lists a lot of useful techniques for taking over legacy code. In brief:
- interview at least some of the original developers (if they are still around) about
  - development history: phases, releases
  - current state of affairs
  - team social structure, politics, dynamics: when and why did people join and leave
  - bugs: typical, easiest, hardest
  - code quality: cleanest / ugliest parts
  - configuration data: form, content and usage
  - unit / integration / manual / ... test cases and data
  - SCM branch structure and usage
  - documentation: what is documented where, is it up to date
  - contact persons for external interfaces
- Watch developers / users during demo to find
  - main features
  - typical use cases
  - usage anecdotes
  - good / bad, missing / superfluous functionality
- "read all the code in one hour"
  - get high level view of class hierarchies, interfaces
  - take multiple sessions if needed
  - identify large structures (these often contain important functionality)
  - look for design patterns
  - check comments (they can reveal a lot, but may be also misleading)
- skim documentation (if there is any)
  - just record the availability of specific types of docs e.g. specification, UML diagram, Wiki, Javadoc etc.
  - is it useful and why (not)
  - is it up to date

By far the most important tools are your ears, your tongue and your larynx. Ask the people who are familiar with the code - they'll be able to help you understand its general architecture much better than any software tools.
Automatically reverse-engineered complete UML models are generally nearly useless because they cannot distinguish between important abstractions and implementation details - which is the whole point of such models.
Software tools are more useful to answer very specific questions when you are investigating details, such as "where is this method called from?" or "what classes implement this interface" - any good IDE will be able to do that. Debuggers can help too - placing breakpoints at keypoints of the code and looking at the call stack when they're hit is often very enlightening.
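When attaching a debugger is not practical (for instance on a remote test system), a similar "how did we get here?" answer can be obtained by capturing the call stack in code. The following is a minimal sketch under that assumption; CallPathSketch, handleRequest and loadOrder are hypothetical stand-ins for methods in the code base being investigated.

```java
import java.util.ArrayList;
import java.util.List;

class CallPathSketch {

    /** Returns "Class.method" for every frame on the current thread's stack, innermost first. */
    static List<String> callPath() {
        List<String> path = new ArrayList<>();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            path.add(frame.getClassName() + "." + frame.getMethodName());
        }
        return path;
    }

    // Hypothetical legacy methods: the outer one calls the inner one.
    static List<String> handleRequest() {
        return loadOrder();
    }

    static List<String> loadOrder() {
        // Temporary instrumentation at the spot where a breakpoint would otherwise go.
        return callPath();
    }

    public static void main(String[] args) {
        for (String frame : handleRequest()) {
            System.out.println(frame);
        }
    }
}
```

The first entries belong to the stack-capturing machinery itself; the interesting part is the order in which the application's own methods appear after them.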

Just to elaborate on Michael's mention of good IDEs that can help you:
I use the following Eclipse facilities a lot:
Shift-F2 when the cursor is placed in an identifier brings up the Javadoc for that identifier, if any. Good for navigating.
Hovering the mouse over an identifier brings up a box with the Javadoc in it, if any. Good for reminding when writing e.g. a method call.
The Declaration view shows the source where the identifier under the cursor is defined. It is updated whenever the cursor moves.
F3 goes to the definition of the current identifier.
Ctrl-T on an identifier shows all subclasses and implementations in a popup. Very useful when working on interfaces.
F4 on an identifier brings up the implementation hierarchy of that identifier in a panel, which can be navigated. Very useful to learn how things are connected. This includes both classes and interfaces.

EclipseUML Omondo is the best Java reverse engineering tool. It reverse engineers all the Java code, all packages, and even class interactions with interfaces that are not in the same package. Just amazing.
You can also reverse:
- .class
- hibernate annotations
- JPA annotations
What I like about this tool is that my code stays clean, because all the model information is saved in XMI format and not as tags in my code. You can also create small pieces of documentation inside each existing package, using diagrams as a view of the model. Just marvelous, and it respects the official UML 2.2 specification.
The only problem is that it is really too expensive, so the price is a showstopper for me!

Doesn't extract high level architectures, but does make it much easier to climb around your Java code: our Java Source Code Browser. This reads source code (and supporting class files) and produces Javadoc style documentation plus source text bi-directionally hyperlinked to the Javadoc information.
(I'm one of the principals behind it).

I use Enterprise Architect for all my UML work (including reverse engineering from Java) and it works perfectly.

Python Programming - Rules/Advice for developing enterprise-level software in Python?

I'm a somewhat advanced C++/Java Developer who recently became interested in Python and I enjoy its dynamic typing and efficient coding style very much. I currently use it on my small programming needs like solving programming riddles and scripting, but I'm curious if anyone out there has successfully used Python in an enterprise-quality project? (Preferably using modern programming concepts such as OOP and some type of Design Pattern)
If so, would you please explain why you chose Python (specifically) and give us some of the lessons you learned from this project? (Feel free to compare the use of Python in the project vs Java or etc)

I'm using Python for developing a complex insurance underwriting application.
Our application software essentially repackages our actuarial model in a form that companies can subscribe to. This business is based on our actuaries and their deep thinking. We're not packaging a clever algorithm that's relatively fixed. We're renting our actuarial brains to customers via a web service.
The actuaries must be free to make changes as they gain deeper insight into the various factors that lead to claims.
Static languages (Java, C++, C#) lead to early lock-in to a data model.
Python allows us to have a very flexible data model. The actuaries are free to add, change or delete factors or information sources without a lot of development cost and complexity. Duck typing allows us to introduce new pieces without a lot of rework.
Our software is a service (not a package) so we have an endless integration problem.
Static languages need complex mapping components. Often some kind of configurable, XML-driven mapping from customer messages to our ever-changing internal structures.
Python allows us to have the mappings as a simple Python class definition that we simply tweak, test and put into production. There are no limitations on this module -- it's first-class Python code.
We have to do extensive, long-running proofs of concept. These involve numerous "what-if" scenarios with different data feeds and customized features.
Static languages require a lot of careful planning and thinking to create yet another demo, yet another mapping from yet another customer-supplied file to the current version of our actuarial models.
Python requires much less planning. Duck typing (and Django) let us knock out a demo without very much pain. The data mappings are simple python class definitions; our actuarial models are in a fairly constant state of flux.
Our business model is subject to a certain amount of negotiation. We have rather complex contracts with information providers; these don't change as often as the actuarial model, but changes here require customization.
Static languages bind in assumptions about the contracts, and require fairly complex designs (or workarounds) to handle the brain-farts of the business folks negotiating the deals.
In Python, we use an extensive test suite and do a lot of refactoring as the various contract terms and conditions trickle down to us.
Every week we get a question like "Can we handle a provision like X?" Our standard answer is "Absolutely." Followed by an hour of refactoring to be sure we could handle it if the deal was struck in that form.
We're mostly a RESTful web service. Django does a lot of this out of the box. We had to write some extensions because our security model is a bit more strict than the one provided by Django.
Static languages don't have to ship source. Don't like the security model? Pay the vendor $$$.
Dynamic languages must ship as source. In our case, we spend time reading the source of Django carefully to make sure that our security model fits cleanly with the rest of Django. We don't need HIPAA compliance, but we're building it in anyway.
We use web services from information providers. urllib2 does this for us nicely. We can prototype an interface rapidly.
With a static language, you have API's, you write, you run, and you hope it worked. The development cycle is Edit, Compile, Build, Run, Crash, Look at Logs; and this is just to spike the interface and be sure we have the protocol, credentials and configuration right.
We exercise the interface in interactive Python. Since we're executing it interactively, we can examine the responses immediately. The development cycle is reduced to Run, Edit. We can spike a web services API in an afternoon.

I've been using Python as a distributed computing framework in one of the world's largest banks.
It was chosen because:
It had to be extremely fast for developing and deploying new functionalities;
It had to be easily integrable with C and C++;
Some parts of the code were to be written by people whose area of expertise was mathematical modeling, not software development.
