How to find unused/dead code in java projects [closed]

How to find unused/dead code in java projects [closed] - java

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 3 years ago.
Improve this question
What tools do you use to find unused/dead code in large java projects? Our product has been in development for some years, and it is getting very hard to manually detect code that is no longer in use. We do however try to delete as much unused code as possible.
Suggestions for general strategies/techniques (other than specific tools) are also appreciated.
Edit: Note that we already use code coverage tools (Clover, IntelliJ), but these are of little help. Dead code still has unit tests, and shows up as covered. I guess an ideal tool would identify clusters of code which have very little other code depending on it, allowing for docues manual inspection.

An Eclipse plugin that works reasonably well is Unused Code Detector.
It processes an entire project, or a specific file and shows various unused/dead code methods, as well as suggesting visibility changes (i.e. a public method that could be protected or private).

CodePro was recently released by Google with the Eclipse project. It is free and highly effective. The plugin has a 'Find Dead Code' feature with one/many entry point(s). Works pretty well.

I would instrument the running system to keep logs of code usage, and then start inspecting code that is not used for months or years.
For example if you are interested in unused classes, all classes could be instrumented to log when instances are created. And then a small script could compare these logs against the complete list of classes to find unused classes.
Of course, if you go at the method level you should keep performance in mind. For example, the methods could only log their first use. I dont know how this is best done in Java. We have done this in Smalltalk, which is a dynamic language and thus allows for code modification at runtime. We instrument all methods with a logging call and uninstall the logging code after a method has been logged for the first time, thus after some time no more performance penalties occur. Maybe a similar thing can be done in Java with static boolean flags...

I'm suprised ProGuard hasn't been mentioned here. It's one of the most mature products around.
ProGuard is a free Java class file shrinker, optimizer, obfuscator,
and preverifier. It detects and removes unused classes, fields,
methods, and attributes. It optimizes bytecode and removes unused
instructions. It renames the remaining classes, fields, and methods
using short meaningless names. Finally, it preverifies the processed
code for Java 6 or for Java Micro Edition.
Some uses of ProGuard are:
Creating more compact code, for smaller code archives, faster transfer across networks, faster loading, and smaller memory
footprints.
Making programs and libraries harder to reverse-engineer.
Listing dead code, so it can be removed from the source code.
Retargeting and preverifying existing class files for Java 6 or higher, to take full advantage of their faster class loading.
Here example for list dead code: https://www.guardsquare.com/en/products/proguard/manual/examples#deadcode

One thing I've been known to do in Eclipse, on a single class, is change all of its methods to private and then see what complaints I get. For methods that are used, this will provoke errors, and I return them to the lowest access level I can. For methods that are unused, this will provoke warnings about unused methods, and those can then be deleted. And as a bonus, you often find some public methods that can and should be made private.
But it's very manual.

Use a test coverage tool to instrument your codebase, then run the application itself, not the tests.
Emma and Eclemma will give you nice reports of what percentage of what classes are run for any given run of the code.

We've started to use Find Bugs to help identify some of the funk in our codebase's target-rich environment for refactorings. I would also consider Structure 101 to identify spots in your codebase's architecture that are too complicated, so you know where the real swamps are.

In theory, you can't deterministically find unused code. Theres a mathematical proof of this (well, this is a special case of a more general theorem). If you're curious, look up the Halting Problem.
This can manifest itself in Java code in many ways:
Loading classes based on user input, config files, database entries, etc;
Loading external code;
Passing object trees to third party libraries;
etc.
That being said, I use IDEA IntelliJ as my IDE of choice and it has extensive analysis tools for findign dependencies between modules, unused methods, unused members, unused classes, etc. Its quite intelligent too like a private method that isn't called is tagged unused but a public method requires more extensive analysis.

In Eclipse Goto Windows > Preferences > Java > Compiler > Errors/Warnings
and change all of them to errors. Fix all the errors. This is the simplest way. The beauty is that this will allow you to clean up the code as you write.
Screenshot Eclipse Code :

IntelliJ has code analysis tools for detecting code which is unused. You should try making as many fields/methods/classes as non-public as possible and that will show up more unused methods/fields/classes
I would also try to locate duplicate code as a way of reducing code volume.
My last suggestion is try to find open source code which if used would make your code simpler.

The Structure101 slice perspective will give a list (and dependency graph) of any "orphans" or "orphan groups" of classes or packages that have no dependencies to or from the "main" cluster.

DCD is not a plugin for some IDE but can be run from ant or standalone. It looks like a static tool and it can do what PMD and FindBugs can't. I will try it.
P.S. As mentioned in a comment below, the Project lives now in GitHub.

There are tools which profile code and provide code coverage data. This lets you see (as code is run) how much of it is being called. You can get any of these tools to find out how much orphan code you have.

FindBugs is excellent for this sort of thing.
PMD (Project Mess Detector) is another tool that can be used.
However, neither can find public static methods that are unused in a workspace. If anyone knows of such a tool then please let me know.

User coverage tools, such as EMMA. But it's not static tool (i.e. it requires to actually run the application through regression testing, and through all possible error cases, which is, well, impossible :) )
Still, EMMA is very useful.

Code coverage tools, such as Emma, Cobertura, and Clover, will instrument your code and record which parts of it gets invoked by running a suite of tests. This is very useful, and should be an integral part of your development process. It will help you identify how well your test suite covers your code.
However, this is not the same as identifying real dead code. It only identifies code that is covered (or not covered) by tests. This can give you false positives (if your tests do not cover all scenarios) as well as false negatives (if your tests access code that is actually never used in a real world scenario).
I imagine the best way to really identify dead code would be to instrument your code with a coverage tool in a live running environment and to analyse code coverage over an extended period of time.
If you are runnning in a load balanced redundant environment (and if not, why not?) then I suppose it would make sense to only instrument one instance of your application and to configure your load balancer such that a random, but small, portion of your users run on your instrumented instance. If you do this over an extended period of time (to make sure that you have covered all real world usage scenarios - such seasonal variations), you should be able to see exactly which areas of your code are accessed under real world usage and which parts are really never accessed and hence dead code.
I have never personally seen this done, and do not know how the aforementioned tools can be used to instrument and analyse code that is not being invoked through a test suite - but I am sure they can be.

There is a Java project - Dead Code Detector (DCD). For source code it doesn't seem to work well, but for .jar file - it's really good. Plus you can filter by class and by method.

Netbeans here is a plugin for Netbeans dead code detector.
It would be better if it could link to and highlight the unused code. You can vote and comment here: Bug 181458 - Find unused public classes, methods, fields

Eclipse can show/highlight code that can't be reached. JUnit can show you code coverage, but you'd need some tests and have to decide if the relevant test is missing or the code is really unused.

I found Clover coverage tool which instruments code and highlights the code that is used and that is unused. Unlike Google CodePro Analytics, it also works for WebApplications (as per my experience and I may be incorrect about Google CodePro).
The only drawback that I noticed is that it does not takes Java interfaces into account.

I use Doxygen to develop a method call map to locate methods that are never called. On the graph you will find islands of method clusters without callers. This doesn't work for libraries since you need always start from some main entry point.

Related

Detect Dead code in Java on runtime

In development we use EclEmma to verify having a good test coverage. Since we start now a big refactoring task involving a lot of sourcefiles will have to be replaced, I wonder if it would be possible to run something similar on our servers to verify we haven't forgotten to remove some classes or methods.
Since the frameworks we use do a lot DI and Reflections, static code analysis won't help.
Is there something analysing on rumtime if all my classes and mothods have been called while server runtime?
Regards,
Michael
EDIT: This analysis won't run in production but in integration/acceptance test phase.
It's not planned as a finite answer to delete all recognized classes/methods, but as hint/reminder what to delete.
EDIT 2: An explanation why this question is to be closed would be nice

Try Cobertura tool, it helps to analyse better but not sure whether it will satisfy your need or not.

Decompiling a jar file and modifying the source to hack an application. How to prevent this? [duplicate]

How can I package my Java application into an executable jar that cannot be decompiled (for example , by Jadclipse)?

You can't. If the JRE can run it, an application can de-compile it.
The best you can hope for is to make it very hard to read (replace all symbols with combinations of 'l' and '1' and 'O' and '0', put in lots of useless code and so on). You'd be surprised how unreadable you can make code, even with a relatively dumb translation tool.
This is called obfuscation and, while not perfect, it's sometimes adequate.
Remember, you can't stop the determined hacker any more than the determined burglar. What you're trying to do is make things very hard for the casual attacker. When presented with the symbols O001l1ll10O, O001llll10O, OO01l1ll10O, O0Ol11ll10O and O001l1ll1OO, and code that doesn't seem to do anything useful, most people will just give up.

First you can't avoid people reverse engineering your code. The JVM bytecode has to be plain to be executed and there are several programs to reverse engineer it (same applies to .NET CLR). You can only make it more and more difficult to raise the barrier (i.e. cost) to see and understand your code.
Usual way is to obfuscate the source with some tool. Classes, methods and fields are renamed throughout the codebase, even with invalid identifiers if you choose to, making the code next to impossible to comprehend. I had good results with JODE in the past. After obfuscating use a decompiler to see what your code looks like...
Next to obfuscation you can encrypt your class files (all but a small starter class) with some method and use a custom class loader to decrypt them. Unfortunately the class loader class can't be encrypted itself, so people might figure out the decryption algorithm by reading the decompiled code of your class loader. But the window to attack your code got smaller. Again this does not prevent people from seeing your code, just makes it harder for the casual attacker.
You could also try to convert the Java application to some windows EXE which would hide the clue that it's Java at all (to some degree) or really compile into machine code, depending on your need of JVM features. (I did not try this.)

GCJ is a free tool that can compile to either bytecode or native code. Keeping in mind, that does sort of defeat the purpose of Java.

A little late I know, but the answer is no.
Even if you write in C and compile to native code, there are dissasemblers / debuggers which will allow people to step through your code. Granted - debugging optimized code without symbolic information is a pain - but it can be done, I've had to do it on occasion.
There are steps that you can take to make this harder - e.g. on windows you can call the IsDebuggerPresent API in a loop to see if somebody is debugging your process, and if yes and it is a release build - terminate the process. Of course a sufficiently determined attacker could intercept your call to IsDebuggerPresent and always return false.
There are a whole variety of techniques that have cropped up - people who want to protect something and people who are out to crack it wide open, it is a veritable arms race! Once you go down this path - you will have to constantly keep updating/upgrading your defenses, there is no stopping.

This not my practical solution but , here i think good collection or resource and tutorials for making it happen to highest level of satisfaction.
A suggestion from this website (oracle community)
(clean way), Obfuscate your code, there are many open source and free
obfuscator tools, here is a simple list of them : [Open source
obfuscators list] .
These tools make your code unreadable( though still you can decompile
it) by changing names. this is the most common way to protect your
code.
2.(Not so clean way) If you have a specific target platform (like windows) or you can have different versions for different platforms,
you can write a sophisticated part of your algorithms in a low level
language like C (which is very hard to decompile and understand) and
use it as a native library in you java application. it is not clean,
because many of us use java for it's cross-platform abilities, and
this method fades that ability.
and this one below a step by step follow :
ProtectYourJavaCode
Enjoy!
Keep your solutions added we need this more.

Understanding and modifying large projects

I am a novice programmer and as a part of my project I have to modify a open source tool (written in java) which has hundreds of classes. I have to modify a significant part of it to suit the needs of the project. I have been struggling with it for the last one month trying to read code, trying to find out the functionalities of each class and trying to figure out the pipeline from start to end.
80% of the classes have incomplete/missing documentation. The remaining 20% are those that form the general purpose API for the tool.
One month of code reading has just helped me understand the basic architecture. But I have not been able to figure out the exact changes I need to make for my project. One time, I started modifying a part of the code and soon made so many changes that I could no longer remember.
A friend suggested that I try to write down the class hierarchy. Is there a better(standard?) way to do this?

check in the code in some source code repository (Subversion, CVS, Git, Mercurial...)
make sure that you can build the project from the source and run it
if you already have an application that uses this open source tool try removing the binary dependency and introduce project dependency in eclipse or any other IDE. run your code and step through the code that you want to understand
after every small change commit
if you have different ideas branch the code

There's a great book called Working Effectively with Legacy Code, by Michael Feathers. There's a shorter article version here.
One of his points is that the best thing you can do is write unit tests for the existing code. This helps you understand where the entry points are and how the code should work. Then it lets you refactor it without worrying that you're going to break it.
From the article linked, the summary of his strategy:
1. Identify change points
2. Find an inflection point
3. Cover the inflection point
a. Break external dependencies
b. Break internal dependencies
c. Write tests
4. Make changes
5. Refactor the covered code.

Two things that Eclipse (and other IDEs as well) offer to 'fight' this. I've used them on very large projects:
Call hierarchy - right-click a method and choose "call hierarchy", or use CTRL + ALT + H. This gives you all methods that call the selected method, with option to check further down the tree. This feature is really very useful.
Type hierarchy - see the inheritance hierarchy of classes. In eclipse it's F4 or CTRL + T.
Also:
find a way to make so that changes take effect on-save, and you don't have to redeploy
use a debugger - run in debug mode, within the IDE, so that you see how the flow proceeds

My friend, you are in deep doodoo. Modifying large, badly documented legacy code is one of those projects that makes experienced programmers seriously contemplate the joys of selling insurance, or some other alternative career. However it isn't impossible, and here are some tips that I hope will help.
Your first task is to understand the code as much as possible. You are at least on the right track there. Getting a good idea of the class structure is absolutely important, and a diagram is probably the best way. The other thing I would suggest is that when you find out what a class does, add the missing documentation yourself. That way when you come back to it you wont' have forgotten what you found out.
Don't forget the debugger. If you want to find out what is really going on, stepping through the relevant code, or simply finding out what a call stack really looks like at a certain point can be very helpful.

The only way to understand code is to read it. Keep working that is my advice.
There are projects with better documentation than others. Here is a couple of projects that I know are well organized:
Tomcat ,
Jetty,
Hudson,
You should check java-source for more open source projects.

Personally I think it is very difficult to try to understand an entire application all at once. Instead, try to focus only on certain modules. For example, if you can identify a module that you need to change (e.g. based on a screen, or certain input/output point), then start by making one small change and testing it. Go from there, making a small change, testing, and moving on.
Additionally, if your project has unit tests (consider yourself lucky) and review the unit tests of the module you are focusing on. That will help you get an idea of what the module is expected to do.

In my opinion there is no standard approach to understand a project. It depends on many factors, from the understandability of the code/architecture you're analyzing to your previous experience on large projects.
I suggest you to reverse-engineer the code by using a modeling tool, so that you can generate some UML models from the existing source code. These diagrams can be helpful as a graphic guideline during your anaysis of the code.
Don't be afraid to use debugging to grab the logic of the most complex functionalities of the project. Running the most complex code instruction by instruction, seeing the exact values of the variables and the interactions between the objects can be helpful.
Before you refactor to change the project to suit your needs, be sure to write some test cases, so that you can verify that your modifications don't break the code in unexpected ways.

Here are a couple recommendations
Get the code into some form of CVS.
This way if you start making changes
you can always look back at previous
versions.
Take the time to document what you
have already learned/gone through. Javadoc is fine
for this.
Create a UML structure for you code.
There are lots of plugins out there and wil give you a nice representation of your code layout.

Important things to keep it mind before a Code Review in Java

I have just created a mid-sized web-application using Java, a custom MVC framework, javascript. My code will be reviewed before it's put in the productions servers (internal use).
The primary objective of building this app was to solve a small problem for internal use and understand the custom made MVC framework used by my employer. So, my app has gone through MANY iterations, feature changes and additions.
So, bottom line, the code is very very dirty and this is my first "product level" Java app.
What are your suggestions, what are some basic checks/refractoring I should do before the code review?
I am thinking about:
Java best practices (conventions)
Make the code simple to understand for the developer who will maintain it. (won't be me)
I noticed, I have created some unnecessary objects and used hashmaps/arraylists where could have easily used some other Data structure and achieved the solution. So, is that worth changing?
Update
Your Code Sucks and I Hate You: The Social Dynamics of Code Reviews

If you did not already, (assuming you use an IDE like eclipse)
get plugins checkstyle and findbugs
go through their configuration and tune to your style
run them on your code
resolve all issues reported
you can also tune the compiler warning setting of eclipse itself and possibly make them more strict in what is reported.
Look at code structure:
get plugin jdepend
investigate your package structure
Code against interfaces (Map, List, Set) instead of implementation classes (HashMap, ArrayList, TreeSet)
Complete your Javadoc and make check it is up to date after all refactorings.
Add JUnit tests; if you have no time left to test the whole application, at least create a test for every bug you find and solve from now on. This helps "growing" a test set as you go.
Next time design and build your application with the end goal in sight. Always assume that the next guy having to maintain your code will know how to find you :-)

Unit tests, and they should be automated as part of your build. You should already have these, but if not, do it now. It will definitely make the refactoring easier, as well improving your general confidence in the code (and the guy who will be maintaining it).

Logging.
One of the more overlooked things is the importance of logging. You need to have a decent logging methodology put in place. Even though this is an internal app, make sure that the basic logs can help regular users find issues and provide more detailed logging so that you (the developer) would know where to go.

Comment your code, explain why it's doing what it's doing and what assumptions have been made.
Try to reduce the amount of mutating state.
Try to remove any singletons you may have.

How does one weed out dependencies in a large project?

I'm about to inherit a rather large Java enterprise project that has a large amount of third party dependencies. There is at least seventy JARs included and some of them would seem to be unused e.g. spring.jar which I know isn't used.
It seems that over the years as various developers have touched upon the code base they have all tried out new project-of-the-month type libraries.
How does one go about getting rid of these? Within reason of course, as clearly some dependencies are helpful to not have to re-invent the wheel.
I'm obviously interested in java based projects but I'm welcome to answers across languages that people think will be helpful.

Personally, I think you have to start by assessing the scale of the problem. It's going to be fairly painful, but I'd make a list of the dependencies and work out exactly which parts of the project use which ones.
Then I'd work out exactly what features of each you're actually making use of (in many cases, you'll end up having a massive third party library which you're using a tiny part of).
Once you have this information, you'll at least know what you're dealing with.
My next step would be to look at all of the dependencies that you only use to a small extent. Checking around might uncover things that you could use from other libraries that would eliminate the lesser used libraries.
I'd also have a look around to see if there's anything small that you could just re-write and include in your own code-base.
Finally, I'd have a look around at the vendors of your dependencies and their competitors to see if the latest versions contain more functionality that will allow you to eliminate a few others.
Then you're just left wondering whether it's better to be highly dependent on a few vendors, or less dependent on a lot of vendors!! ;o)

structure101 http://www.headwaysoftware.com/products/structure101/index.php
It's a great tool for showing dependencies. I've been using it for a couple of years.

If you have a good set of automated tests, and you're looking to remove libraries which are not used at all, you could just use trial and error. One at a time, remove a library, and run your tests to see if everything still works. If not, put it back. Of course, if you can't even build without a library, you probably need it.
Basically, however you go about it, my idea is to remove them one at a time and see what breaks. If nothing breaks, odds are good you can just toss the library. If the problem is very minor (e.g. you need one method of one class in a large library), you might be able to code around it.
If you're dealing with a standalone application, you could give the JVM the -verbose:class option to see which classes are being loaded. This should give you messages like:
[Opened C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]
[Loaded java.util.regex.Pattern$Single from C:\Program Files\Java\jre1.6.0_04\lib\rt.jar]

I read about an approach using instrumentation here, never tried it, but sounds reasonable.

We went through an exercise like this, on a delphi codebase. We dramatically simplified our external dependancies. Basically, we went about it like this:
Catalogued all external libraries and components
Catalogued (using a file search tool) where they were used, and what for.
Removed everything we didn't use or didn't need (some libraries were used in code that was no longer needed).
Made a ranking of which libraries we favored, basing this on whether the library was actively developed, how much functionality it offered that we used, how difficult it was to port the code that used it to another library that we already used and so on.
Finally, we iteratively removed dependancies on libraries low on the list by porting that functionality to another library.
This was, however, quite a lot of work.

If you take the approach of "remove things until it won't compile" you need to be very careful about transitive runtime dependencies. If there's a good quality test suite, it can help, but you'll certainly need to run a test coverage tool like Cobertura to make sure that enough of the code is getting tested to exercise your full dependency graph.
How much code are you talking about? The review-based approach suggested by Joeri frankly seems the best to me; it has the added advantage of making you at least superficially familiar with all parts of the system. If you're just inheriting a big project, this is something you should probably take the time to do anyway.

if you have a full regression test suite for this project, all you have to do is run the regression suite while running with 1 less JAR each time in a loop. it is NOT fast BUT it is easy to do.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.