Tool for finding package namespace conflicts in java code

We have a number of projects that use the same or similar package names. Many of these projects build jar files that are used by other projects. We have found a number of foo.util, foo.db and foo.exceptions packages where the same class names are being used, leading to namespace conflicts.
Does anyone know of a tool that will search a set of Java code bases and automatically find namespace conflicts and ambiguous imports?

It's simpler to fix your names in each individual project.
Really.
You don't need to know all the conflicts. Your package names should be unique in the first place. If they aren't unique, you need to rethink how you're assigning your package names. If they're "flat" (foo.this and foo.that) you need to make them taller and much more specific.
That's why the examples are always org.apache.project.component.lower.level.names.
You should have com.projectX.foo.this and com.projectZ.foo.that to prevent the possibility of duplication.
"But all that recompiling," you say. You'll have to do that anyway. Don't waste a lot of time trying to discover the exact, complete extent. Go with what you know, start fixing things now, and work your way through your code base fixing one thing at a time.

If you can load the projects into Eclipse, the Problems view will give you the conflicts and ambiguous imports. There is also an Organize Imports wizard that will help with any unnecessary imports.

I agree with S.Lott - you need to find out how this happened in the first place, and make it very painful for the people who did it wrong (i.e. make them go back and fix things). Modern IDEs (Eclipse being one of them) make this kind of refactoring fairly easy.
The only thing I disagree with is about fixing things as you go. Instead, I strongly recommend that you pay your technical debt on this one now with a solid package name review. A couple of hours (and that's really all it should take) here will save you nightmare debugging down the road.
FYI - it is perfectly legal to have classes in the same package in different source folders (e.g.):
src/com/my/project
test/com/my/project
Trying to differentiate between the above valid folder structure, and the situation you are winding up with is going to be very tough in any automated manner.
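For anyone who still wants the automated check the question asks for, here is a minimal sketch, not an existing tool. It lists the class entries of each jar with the JDK's java.util.jar.JarFile and reports every fully qualified class name that more than one jar defines. The class name DuplicateClassFinder and its method names are made up for illustration, and the sketch assumes the projects are already built into jars. It compares fully qualified class names rather than package names, so two jars that share a package are only reported when they also share a class name.

```java
import java.io.IOException;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

// Hypothetical helper, named for illustration only.
final class DuplicateClassFinder {

    /** Lists the fully qualified names of the classes packaged in one jar. */
    static List<String> classNamesIn(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(name -> name.endsWith(".class"))
                    // module descriptors exist in many jars and are not real conflicts
                    .filter(name -> !name.endsWith("module-info.class"))
                    // "foo/util/Helper.class" -> "foo.util.Helper"
                    .map(name -> name.substring(0, name.length() - ".class".length())
                                     .replace('/', '.'))
                    .collect(Collectors.toList());
        }
    }

    /** Returns class name -> sources, for every class defined by more than one source. */
    static Map<String, Set<String>> findConflicts(
            Map<String, ? extends Collection<String>> classesBySource) {
        Map<String, Set<String>> sourcesByClass = new TreeMap<>();
        classesBySource.forEach((source, classes) ->
                classes.forEach(cls ->
                        sourcesByClass.computeIfAbsent(cls, k -> new TreeSet<>()).add(source)));
        // keep only the classes that appear in two or more sources
        sourcesByClass.values().removeIf(sources -> sources.size() < 2);
        return sourcesByClass;
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> classesBySource = new LinkedHashMap<>();
        for (String jarPath : args) {
            classesBySource.put(jarPath, classNamesIn(jarPath));
        }
        findConflicts(classesBySource).forEach((cls, jars) ->
                System.out.println(cls + " is defined in " + jars));
    }
}
```

Run it with the jar paths as arguments. It only finds exact duplicates of a fully qualified name; ambiguous imports between different packages still need the compiler or an IDE.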

Related

Java package structure convention

I was working with Typescript and Javascript and I stopped for a bit thinking about namespaces and how we organize code in Java.
Now, pretty often I see multiple classes with the same purpose, just for use with different data, being placed in different packages with different names.
While placing those classes in different packages is a good practice to maintain the project/module in a good state, why would we have to give them different names? I mean, their purpose is the same.
I can answer this question myself: because often we use those classes inside the same unit and they would clash, thus requiring a long full package specification for one or the other.
But aren't we violating the DRY principle? We should use our directory (package) structure to understand in which domain space those classes work.
As I wrote above, I suppose many developers aren't respecting this DRY principle just to avoid long package names. Then, why are we creating those monstrous package hierarchies?
Googling "Java packages best practices" results in suggestions such as:
com.mycompany.myproduct.[...]
Where do those suggestions come from? Aren't we just wasting space?
We obviously do not want to write that every time.
MyClass myInstance;
com.mycompany.myproduct.mypackage.MyClass myOtherInstance;
But it could have been
myfeature.MyClass myInstance
myotherfeature.MyClass myInstance;
We could even specify the full package for both.
So, where do those best practices come from?
As it has been said, this convention dates back to the first releases of Java.
Name clashes could be easily solved by qualifying the classes of an imported dependency (such as a classpath library) with their short package hierarchy, which emphasizes our own code and keeps it cleaner. It is also important to remember that we have access to package-level visibility, which seems overlooked nowadays.

As a commenter points out, this convention was established at the very beginning of Java, to allow a global namespace to be used for all classes.
One thing about Java that influences this -- and is different from TypeScript and JavaScript -- the name of a class and its package (as well as all the names of classes and packages it uses) is fixed at compile time. You can't recombine classes into a different hierarchy without modifying their source code. So it's not as flexible as an interpreted language where you can just move the hierarchy around and change the locations from which you load code.
You can, of course, put everything in your application in the top-level package, for example, and be pretty confident you'll get away with it. You'll be able to do that as long as everyone else follows the existing convention. Once they stop doing that, and libraries start putting classes wherever they want, you'll have collisions, and you'll have to change your application. Or libraries will start colliding with each other, and if you don't have the source, you won't really be able to fix it.
So there are good reasons.
My opinion-based part of this -- sticking with conventions and a language's (or platform's) culture makes sense. Even if the conventions are silly, in our view. There's a place for breaking them but the advantages have to be pretty big. Other developers will be used to the convention, tools will make assumptions about the conventions ... usually it makes sense to go with the flow.
Does it really make sense to use getters and setters for every property in Java? If you understand the domain you're using well, it very well might not. But stop doing it and your code isn't going to make sense to other members of your team, you'll have to continuously revisit the decision as people join, and so forth.
If it's a one-person project and always will be, of course, you can do what you want.

Naming convention for homonymous classes in different libraries

In any project with a relatively large number of dependencies there are always a lot of commonly named classes in different libraries. For example, Configuration is very widely used.
It slows down the programmer, who has to carefully pick the right class from the list. It is also very irritating if you have to use different configurations in one class, so they have to be prefixed with the full package name.
I'm writing a library which also needs a Configuration class. Should I use this name? Or is it better to name it {Libname}Configuration? Is there any common way to avoid such problems?

I think you should name your class in the way that makes its usage clearest, and you shouldn't care too much if other classes with the same name exist. As you know, the usage of packages reduces the risk of naming collisions.
For any project I think it's good that the whole team should have a convention related to basic naming and formatting used. It's important to use a consistent naming convention so that when new people work on it, they pick it up faster. I think that conventions also help increase productivity since it's easier to remember names.
I think it's good to spend some time thinking about classes, not only in terms of algorithms but also in terms of what business role they fill. Thinking about why a class is necessary and what it brings to the project can make you more aware of the way your method/class/variable works within the application workflow.
That being said, I think that maybe your IDE has some option to hide some of the classes it shows. I'm using IntelliJ and it has a feature for this situation, even though it's a bit hidden.

I think it is usually not a very good idea to start with library name for a class, simply because in the long run that will make it more difficult to remember, and because it diminishes readability.
There are ways to set up your IDE (depending on which IDE you use) so that autocomplete shows the most used classes first. You can also get to a class quickly by first typing the name, then the library, when using autocomplete. These are all dependent on your IDE. But generally it seems like a bad practice to start a class name with the name of the library.

Tool to determine transitive Java imports?

Is there some sort of tool that you can point at a set of Java classes, and it produces output showing the transitive imports of each class?
I understand that imports are not "transitive" from the point of view of the language itself - i.e. if com.acme.X imports com.acme.Y, and com.acme.Y imports com.acme.Z, that does not mean that you can refer to com.acme.Z within com.acme.X. But that's not what I mean:
Rather, I mean that com.acme.X nonetheless depends upon com.acme.Z (at least under the current implementations of X and Y), and I want to know that fact. In fact I want to know it for a large number of classes, and so I'm hoping that there's a tool to determine it automatically.
Either a standalone tool or an Eclipse plugin or feature would be great.
Thanks in advance.
EDIT to hopefully show what I want this for:
I have a huge monolithic jar that contains many features that are (essentially) completely unrelated. I'd like to break it apart into several smaller, more manageable, and more self-coherent jars.
Unfortunately, I can't do it simply by breaking it up based on packages, because many of the packages themselves are not self-coherent either. That is, for example, there's a "com.acme.utils" package. Two things in that package probably have nothing in common except for the fact that they're both, in some sense, "utilities". One may be a utility for some particular business function, another may be TCP/IP utilities, another may be a set of string utilities, another may be some completely unrelated business function.
And there are a bunch of packages like this. So when you look at the transitivity of imports from the point of view of packages, they snowball without limit, and so more or less everything in the monolithic jar depends on everything else in the monolithic jar.
So I'd like to start by considering transitivity of imports from the class point of view, rather than the package point of view. That way I should be able to more easily determine what classes need to be reorganized from what existing packages into new, more coherent packages, and then after that I can break the monolithic jar apart by packages / sets of packages.

We're using Sonar for software metrics: http://www.sonarsource.org/
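Whichever tool produces the direct class-to-class dependencies (the JDK's jdeps is one possible source), the transitive part the question asks about is a plain graph traversal. Below is a minimal sketch under that assumption: the direct dependencies are already available as a map, and the class TransitiveDependencies and its method are hypothetical names. The com.acme names come from the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, named for illustration only.
final class TransitiveDependencies {

    /**
     * Returns every class reachable from start by following direct dependencies.
     * A class that takes part in a dependency cycle will also list itself.
     */
    static Set<String> of(String start, Map<String, Set<String>> direct) {
        Set<String> seen = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(direct.getOrDefault(start, Set.of()));
        while (!pending.isEmpty()) {
            String cls = pending.pop();
            if (seen.add(cls)) {
                // first visit: queue whatever this class depends on directly
                pending.addAll(direct.getOrDefault(cls, Set.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> direct = Map.of(
                "com.acme.X", Set.of("com.acme.Y"),
                "com.acme.Y", Set.of("com.acme.Z"));
        // X never imports Z, but it depends on Z through Y
        System.out.println(of("com.acme.X", direct)); // prints [com.acme.Y, com.acme.Z]
    }
}
```

Running this for every class in the monolithic jar gives the class-level view described in the question, which can then be grouped into candidate packages.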

Too many imports are spamming my Java code

In my project I have a shapes package which has shapes I designed for my graphics program, e.g., Rectangle and Circle. I also have one or two more packages that have the same names as java.awt classes.
Now, since I do not want to rename every class in my codebase, to show my source files which class I mean when I, say, declare a new Rectangle, I need to either:
1- import the rectangle class explicitly, i.e., import shapes.Rectangle
or
2- import only the java.awt classes I need and not import java.awt.* which automatically includes the awt.Rectangle
Now the problem is that both ways result in a lot of importing. I currently have an average of 15-25 imports in each source file, which is seriously making my code mixed-up and confusing.
Is too many imports in your code a bad thing? Is there a way around this?

Yes, too many imports is a bad thing because it clutters your code and makes your imports less readable.
Avoid long import lists by using wildcards.
Kevlin Henney talks about this exact Stack Overflow question 27:54 into his presentation Clean Coders Hate What Happens to Your Code When You Use These Enterprise Programming Tricks from NDC London 16-20 Jan 2017

If you use glob imports, it's possible to break your code with a namespace clash just by updating a dependency that introduces new types (typically not expected to be a breaking change). It could be a pain to fix in a large codebase that was liberal with their use of glob imports. That is the strongest reason I can think of for why it's a good idea to specify dependencies explicitly.
It's easier to read code which has each of the imports specified, because you can see where types are coming from without requiring IDE-specific features and mouse hovering, or going through large pages of library documentation. Many people read a lot of code outside the IDE in code review, diffs, git history, etc.

Another alternative is to type the fully qualified class name as you need it. In my example, there are two Element objects, one created by my org.opensearch.Element and the other org.w3c.dom.Element.
To resolve the name conflict, as well as to minimize import "clutters", I've done this (in my org.opensearch.Element class):
public org.w3c.dom.Element toElement(org.w3c.dom.Document doc) { /* .... */ }
As you can see, the return Element type is fully-typed (i.e., I've specified the fully-qualified class name of Element).
Problem solved! :-)
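A self-contained version of that idea might look like the sketch below. The nested Element class stands in for the answerer's org.opensearch.Element, whose real contents are not shown here, so the name field and the tagNameOf helper are assumptions made for illustration. The DOM types come from the JDK.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

final class FullyQualifiedNameDemo {

    /** Our own Element type; its simple name collides with org.w3c.dom.Element. */
    static final class Element {
        private final String name;

        Element(String name) {
            this.name = name;
        }

        // No import of org.w3c.dom.Element: in this file the simple name Element
        // means our class, so the DOM types are written out in full.
        org.w3c.dom.Element toElement(org.w3c.dom.Document doc) {
            return doc.createElement(name);
        }
    }

    /** Builds an empty DOM document, converts our Element, and returns its tag name. */
    static String tagNameOf(String name) {
        try {
            org.w3c.dom.Document doc =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            return new Element(name).toElement(doc).getTagName();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no XML parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(tagNameOf("entry")); // prints entry
    }
}
```

This keeps the import list short, at the cost of longer signatures wherever the second Element type appears.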

I use explicit imports, and have done so for many years. In all my projects in the last decade this has been agreed with team members, and all are happy to use explicit imports and avoid wildcards. It has not been controversial.
In favour of explicit imports:
precise definition of what classes are used
less fragile as other parts of the codebase changes
easier to code review
no guessing about which class is in which package
In favour of wildcards:
less code
easier to add and maintain the imports when using a text editor
Early in my career, I did use wildcard imports, because back then IDEs were not so sophisticated, or some of us just used text editors. Managing explicit imports manually is quite a bit of effort, so wildcard imports really help.
However, at least once I was bitten by the use of wildcard imports, and this led me to my current policy of explicit imports only.
Imagine you have a class that has 10 wildcard imports, and it compiles successfully. Then you upgrade 5 jar files to newer versions (you have to upgrade them all, because they are related). Then the code no longer compiles: there is a class-not-found error. Now which package was that class in? What is the full name of the class? I don't know, because I'm only using the short name, so now I have to diff the old and new versions of the jars to see what has changed.
If I had used explicit imports, it would be clear which class had been dropped and what its package was, and thus which jar (by looking at other classes in that package) is responsible.
There are also problems reading code history, or looking at historic merges. When you have wildcard imports there is uncertainty for the reader about which class is which, and thus what the semantics of the code changes are. With explicit imports you have a precise description of the code, and it acts as a better historical record.
So overall, the small amount of extra effort to maintain the imports, and the extra lines of code, are easily outweighed by the extra precision and determinism given by explicit imports.
The only case when I still use wildcards is when there are more than 50 imports from the same package. This is rare, and usually just for constants.
Update1: To address the comment of Peter Mortensen, below...
The only defence Kevlin Henney makes in his talk for using wildcards is that the name collision problem doesn't happen very often. But it's happened to me, and I've learnt from it. And I discuss that above.
He doesn't cover all the points I've made in my answer above. But most importantly, I think the choice you make, explicit or wildcard, doesn't matter that much; what matters is that everyone on the project/codebase agrees on and uses a consistent style. Kevlin Henney goes on to talk about cargo-cult programming. My decisions as stated above are based on personal lessons over decades, not cargo-cult reasoning.
If I was to join a project where the existing style was to use wildcards, I'd be fine with it. But if I was starting a new project I'd use precise imports.
Interestingly in nodejs there is no wildcard option. (Although you do have 'default' imports, but it's not quite the same).

It's subjective and depends greatly on the circumstances. I sometimes bounce between the two.
It is generally good practice to be specific by default, but it can also be higher maintenance. It's not a perfect rule. However, being more specific (higher initial cost) will tend to reveal itself earlier through measurable or perceptible drag, whereas being lazy by default tends to manifest as a more adverse problem later.
Over including of entire namespaces can create bloat and clashes as well as hide changes in structure but in certain cases it may outweigh the benefit.
To give a simple case:
I use a package with a hundred classes.
What if I use one of its classes?
What if I use all but one of them?
It's similar to the whitelist versus blacklist problem.
In an ideal situation the package hierarchy will subdivide package types enough to establish a good balance.
In certain systems I've seen people do the equivalent of import *. Perhaps the most horrifying case is when I saw someone do something similar to apt install *. Though it seemed clever so as to never have to worry about missing dependencies it imposed enormous burdens that far outweighed any benefit.
All things can be taken to the extreme. For example I could argue for the utility of imports as close to as needed but if we're going to do that why not just always use the fully qualified names all the time?
Problems like this are annoying as the low cognitive load of consistency is always preferable but when it comes down to it in various given circumstances you may need to play it by ear.
It's important to be proportionate. Doing something a thousand times to avoid something that happens one time, self presents and takes about as much effort to fix tends to result in a waste.
Different objectives may make "too many" imports a good thing. I have a custom JavaScript framework which would likely horrify many at a glance for its stacks of imports.
What's not immediately obvious is that this is part of a deliberate strategy.
This makes it easier to cleanly package the code with all its specific dependencies and then transmit that over a network.
Imports also act as plug points at build time, so that dependencies can be swapped for each given platform target.
This tends not to be as much of a problem for languages that are less dynamic and that do not suffer greatly (or at all) from the overhead of excessively importing namespaces. The moral of this story is that import strategies can vary enormously but be valid for their given circumstances. You can only go so far in taking general approaches.
In each situation you will need to have your bearings and a sense of the lay of the land. If the import conventions and structure are causing a nuisance then it's necessary to narrow down the how and why. Too many imports may not be the result of a specific strategy but of things such as packing too much into a single file. At the same time some files are naturally large and naturally require many imports.
Badly applied separation of concerns, or organisation that fails to keep related things together, can create a graph with grossly excessive edges, which better organisation can reduce. To some degree it's not abnormal for code to be clustered by its dependencies, by the specific ones more so than the general ones.
If a code base is well organised into a graph that is fairly close to optimal, with neither excessive splitting nor excessive merging and with minimal distance between related things, you will tend to find that even with specific imports the majority of files stay within a reasonable size.

I don't get all the non-answers posted here.
Yes, individual imports are a bad idea, and add little, if anything, of value.
Instead just explicitly import the conflicts with the class you want to use (the compiler will tell you about conflicts between the awt and shapes package) like this:
import java.awt.*;
import shapes.*;
import shapes.Rectangle; // New Rectangle, and Rectangle.xxx will use shapes.Rectangle.
This has been done for years, since Java 1.2, with awt and util List classes. If you occasionally want to use java.awt.Rectangle, well, use the full class name, e.g., new java.awt.Rectangle(...);.
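The same rule can be tried with JDK classes alone. The sketch below uses the List clash between java.awt and java.util that this answer mentions, since the question's shapes package is not available here. The class name ImportShadowingDemo is made up for illustration.

```java
import java.awt.*;      // on-demand import: also offers java.awt.List
import java.util.*;     // on-demand import: also offers java.util.List
import java.util.List;  // single-type import: decides what the simple name List means

final class ImportShadowingDemo {

    static String describe() {
        // Without the explicit import above, this line would not compile,
        // because List would be ambiguous between java.awt and java.util.
        List<String> names = new ArrayList<>();
        names.add("circle");

        // Rectangle exists only in java.awt among the imported packages,
        // so the wildcard import is enough for it.
        Rectangle box = new Rectangle(0, 0, 4, 3);
        return names.get(0) + " in " + box.width + "x" + box.height;
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints circle in 4x3
    }
}
```

Only the names that actually clash need an explicit import; everything else can stay behind the wildcards.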

It's normal in Java world to have a lot of imports - you really need to import everything. But if you use an IDE, such as Eclipse, it does the imports for you.

It's a good practice to import class by class instead of importing whole packages
Any good IDE, such as Eclipse, will collapse the imports in one line, and you can expand them when needed, so they won't clutter your view
In case of conflicts, you can always refer to fully qualified classes, but if one of the two classes is under your control, you can consider renaming it (with Eclipse, right click on the class, choose menu Refactor → Rename, it will take care to update all its references).
If your class is importing from AWT and from your package of shapes, that is OK. It's OK to import from several packages; however, if you find yourself importing from a large number of disparate sources, it could be a sign that your class is doing too much and needs to be split up.

What do you name a Java package when it isn't a part of a top level domain?

I've read the syntax conventions for naming Java packages, and I know the general rule of thumb, but what if you've just started building your application, you haven't chosen a license, and it is a personal project? It doesn't make sense to throw in "com.mycompany" or "org.myorganization" if that is not the case. Does anyone have suggestions for this?

Many Java books and online examples just use the name of the book or project, i.e., ejb3inaction.* or tutorial.*.

I usually just go with something like lastname.firstname.<other packages>. The package name should just be unique, and the combination of your last name and first name is probably unique enough (if you have a common name, throw in a middle initial or a middle name or something like that).

Perhaps org.{myname} ?
I don't think it particularly matters for personal projects. In fact I've seen commercial (in-house) projects flout this rule and simply call their packages {servicename} or similar (which I don't particularly like). The packaging rules are designed to prevent name clashes when sharing code cross-enterprise (or organisation) and consequently for personal projects you can use almost anything.

Java packages are just for namespacing. Call it whatever you want! Think of something that will be useful to you later on when you want to remember what this code was supposed to do.

How about org.{projectname}? Think about some open-source projects (e.g. org.hibernate, org.springframework and org.junit) ... did they start this way because it was the name of the website, or the name of the project itself?
Besides, re-factoring is so trivial these days, just name it whatever you want.

What about name.yourname.myproject or net.sf.myproject or com.googlecode.myproject or simply myproject.
As long as you don't make your code public, it's not that important actually (and you can easily refactor it later before releasing your code if you need to). Once people start using your code, it's another story...

If you might also share the project (make it open source) and open a project entry on a hosting site, you can use something like 'net.sf.{project-name}.*'. Be careful that the project-name is the Unix name of the project (at least then you follow the rules correctly :)
SourceForge
Java Net
Google Code
Launchpad
JavaForge
Tigris.org

I generally use nl.myname.myapp for all personal projects. There is no rule against open sourcing something that uses your personal name. If you decide to make the project bigger and create a web site for it you can always rename the packages if you really want to.
