Tool to determine transitive Java imports? - java

Is there some sort of tool that you can point at a set of Java classes, and it produces output showing the transitive imports of each class?
I understand that imports are not "transitive" from the point of view of the language itself - i.e. if com.acme.X imports com.acme.Y, and com.acme.Y imports com.acme.Z, that does not mean that you can refer to com.acme.Z within com.acme.X. But that's not what I mean:
Rather, I mean that com.acme.X nonetheless depends upon com.acme.Z (at least under the current implementations of X and Y), and I want to know that fact. In fact I want to know it for a large number of classes, and so I'm hoping that there's a tool to determine it automatically.
Either a standalone tool or an Eclipse plugin or feature would be great.
Thanks in advance.
EDIT to hopefully show what I want this for:
I have a huge monolithic jar that contains many features that are (essentially) completely unrelated. I'd like to break it apart into several smaller, more manageable, and more self-coherent jars.
Unfortunately, I can't do it simply by breaking it up based on packages, because many of the packages themselves are not self-coherent either. That is, for example, there's a "com.acme.utils" package. Two things in that package probably have nothing in common except for the fact that they're both, in some sense, "utilities". One may be a utility for some particular business function, another may be TCP/IP utilities, another may be a set of string utilities, another may be some completely unrelated business function.
And there are a bunch of packages like this. So when you look at the transitivity of imports from the point of view of packages, they snowball without limit, and so more or less everything in the monolithic jar depends on everything else in the monolithic jar.
So I'd like to start by considering transitivity of imports from the class point of view, rather than the package point of view. That way I should be able to more easily determine what classes need to be reorganized from what existing packages into new, more coherent packages, and then after that I can break the monolithic jar apart by packages / sets of packages.

We're using Sonar for software metrics: http://www.sonarsource.org/
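Whatever tool supplies the direct class-to-class dependencies (the JDK's jdeps with its -verbose:class option is one possible source), the transitive part is a plain graph traversal. The following is a minimal sketch, not a definitive implementation: the TransitiveDeps class, its closure method, and the hard-coded dependency map are all hypothetical, and the map simply mirrors the X, Y, Z example from the question.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper: computes transitive dependencies from a map of direct ones.
final class TransitiveDeps {

    /** Returns every class reachable from start by following direct dependencies. */
    static Set<String> closure(Map<String, List<String>> direct, String start) {
        Set<String> seen = new TreeSet<>(); // sorted, so the output is stable
        Deque<String> pending = new ArrayDeque<>(direct.getOrDefault(start, List.of()));
        while (!pending.isEmpty()) {
            String next = pending.pop();
            // Only expand a class the first time it is seen, so cycles terminate.
            if (seen.add(next)) {
                pending.addAll(direct.getOrDefault(next, List.of()));
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Assumed input: in practice this map would come from a dependency tool.
        Map<String, List<String>> direct = Map.of(
                "com.acme.X", List.of("com.acme.Y"),
                "com.acme.Y", List.of("com.acme.Z"),
                "com.acme.Z", List.of());
        System.out.println(closure(direct, "com.acme.X")); // prints [com.acme.Y, com.acme.Z]
    }
}
```

If the dependency graph contains a cycle back to the starting class, that class appears in its own closure, which can be a useful signal when deciding how to split a jar.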

Related

Java package structure convention

I was working with Typescript and Javascript and I stopped for a bit thinking about namespaces and how we organize code in Java.
Now, pretty often I see multiple classes with the same purpose, just for use with different data, being placed in different packages with different names.
While placing those classes in different packages is a good practice to maintain the project/module in a good state, why would we have to give them different names? I mean, their purpose is the same.
I can answer this question myself: because often we use those classes inside the same unit and they would clash, thus requiring a long full package specification for one or the other.
But aren't we violating the DRY principle? We should use our directory (package) structure to understand in which domain space those classes work.
As I wrote above, I suppose many developers aren't respecting this DRY principle just to avoid long package names. Then, why are we creating those monstrous package hierarchies?
Googling "Java packages best practices" results in suggestions such as:
com.mycompany.myproduct.[...]
Where do those suggestions come from? Aren't we just wasting space?
We obviously do not want to write that every time.
MyClass myInstance;
com.mycompany.myproduct.mypackage.MyClass myOtherInstance;
But it could have been
myfeature.MyClass myInstance
myotherfeature.MyClass myInstance;
We could even specify the full package for both.
So, where do those best practices come from?
As it has been said, this convention dates back to the first releases of Java.
Name clashes could be easily solved by qualifying the imported dependency (such as a classpath library) classes with their short packages' hierarchy, emphasizing and keeping cleaner our own code. It is also important to remember we have access to the package-level visibility, which seems overlooked nowadays.
As a commenter points out, this convention was established at the very beginning of Java, to allow a global namespace to be used for all classes.
One thing about Java that influences this -- and is different from TypeScript and JavaScript -- the name of a class and its package (as well as all the names of classes and packages it uses) is fixed at compile time. You can't recombine classes into a different hierarchy without modifying their source code. So it's not as flexible as an interpreted language where you can just move the hierarchy around and change the locations from which you load code.
You can, of course, put everything in your application in the top-level package, for example, and be pretty confident you'll get away with it. You'll be able to do that as long as everyone else follows the existing convention. Once they stop doing that, and libraries start putting classes wherever they want, you'll have collisions, and you'll have to change your application. Or libraries will start colliding with each other, and if you don't have the source, you won't really be able to fix it.
So there are good reasons.
My opinion-based part of this -- sticking with conventions and a language's (or platform's) culture makes sense. Even if the conventions are silly, in our view. There's a place for breaking them but the advantages have to be pretty big. Other developers will be used to the convention, tools will make assumptions about the conventions ... usually it makes sense to go with the flow.
Does it really make sense to use getters and setters for every property in Java? If you understand the domain you're using well, it very well might not. But stop doing it and your code isn't going to make sense to other members of your team, you'll have to continuously revisit the decision as people join, and so forth.
If it's a one-person project and always will be, of course, you can do what you want.

Use of modules within Java programming

Hopefully this is a question that only needs a fairly quick answer, but I haven't had much luck finding something online that is in terms I understand!
Quite simply, I'm working on my first real project in Java, a text adventure, (using IntelliJ IDEA) and I was just wondering if I need to be splitting my code into modules? So, for my monsters, should I keep all of my monster classes within a module called Monsters, or can I just keep it in the same module?
I only ask because; a) I wasn't sure whether it was a done thing in order to keep the project tidy and b) When I tried to create a Monster module, I received a warning telling me that the files in this module wouldn't be accessible from the rest of the program, which seems to defeat the object to me...
Many thanks in advance for any advice!
I believe you are referring to IntelliJ's concept of a module. As stated on their page:
A module is a discrete unit of functionality which you can compile, run, test and debug
independently.
Modules contain everything that is required for their specific tasks:
source code, build scripts, unit tests, deployment descriptors, and
documentation. However, modules exist and are functional only in the
context of a project.
So, modules should not be referencing the source code from other modules. They should essentially be completely different units.
As in thecbuilder's answer, you should look into using Java's packaging system instead.
If by modules you mean packages, then it's a good habit to keep related classes in one package and distribute unrelated classes across different packages.
As for the classes not being accessible: you'll have to make them public to access them from different packages.
More on package structuring :
http://www.javapractices.com/topic/TopicAction.do?Id=205
http://docs.oracle.com/javase/tutorial/java/package/namingpkgs.html
https://stackoverflow.com/a/3226371/3603806
For access specifiers :
Taken from : http://www.go4expert.com/articles/java-access-specifiers-t28019/

Too many imports are spamming my Java code

In my project I have a shapes package which has shapes I designed for my graphics program, e.g., Rectangle and Circle. I also have one or two more packages that have the same names as java.awt classes.
Now, since I do not want to rename every class in my codebase, to show my source files which class I mean when I, say, declare a new Rectangle, I need to either:
1- import the rectangle class explicitly, i.e., import shapes.Rectangle
or
2- import only the java.awt classes I need and not import java.awt.* which automatically includes the awt.Rectangle
Now the problem is that both ways result in a lot of importing. I currently have an average of 15-25 imports in each source file, which is seriously making my code mixed-up and confusing.
Is too many imports in your code a bad thing? Is there a way around this?
Yes, too many imports is a bad thing because it clutters your code and makes your imports less readable.
Avoid long import lists by using wildcards.
Kevlin Henney talks about this exact Stack Overflow question 27:54 into his presentation Clean Coders Hate What Happens to Your Code When You Use These Enterprise Programming Tricks from NDC London 16-20 Jan 2017
If you use glob imports, it's possible to break your code with a namespace clash just by updating a dependency that introduces new types (typically not expected to be a breaking change). It could be a pain to fix in a large codebase that was liberal with their use of glob imports. That is the strongest reason I can think of for why it's a good idea to specify dependencies explicitly.
It's easier to read code which has each of the imports specified, because you can see where types are coming from without requiring IDE specific features and mouse hovering, or going through large pages of library documentation. Many people read a lot of code outside the IDE in code review, diffs, git history, etc.
Another alternative is to type the fully qualified class name as you need it. In my example, there are two Element objects, one created by my org.opensearch.Element and the other org.w3c.dom.Element.
To resolve the name conflict, as well as to minimize import "clutters", I've done this (in my org.opensearch.Element class):
public org.w3c.dom.Element toElement(org.w3c.dom.Document doc) { /* .... */ }
As you can see, the return Element type is fully-typed (i.e., I've specified the fully-qualified class name of Element).
Problem solved! :-)
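For a fuller picture, here is a minimal runnable sketch of the same idea. The local Element class below is a hypothetical stand-in for the answer's org.opensearch.Element (it lives in the default package, and its fields and the newDocument helper are assumptions made for illustration); only the org.w3c.dom and javax.xml.parsers types are the real JDK ones.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical stand-in for the answer's own Element class.
final class Element {
    private final String name;
    private final String text;

    Element(String name, String text) {
        this.name = name;
        this.text = text;
    }

    // The DOM types are fully qualified, so no import of org.w3c.dom.Element
    // is needed and the simple name Element keeps meaning this class.
    org.w3c.dom.Element toElement(org.w3c.dom.Document doc) {
        org.w3c.dom.Element dom = doc.createElement(name);
        dom.setTextContent(text);
        return dom;
    }

    // Creates an empty DOM document, wrapping the checked exception.
    static org.w3c.dom.Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        org.w3c.dom.Element dom = new Element("title", "hello").toElement(newDocument());
        System.out.println(dom.getTagName() + "=" + dom.getTextContent()); // prints title=hello
    }
}
```

The trade-off is verbosity at each use site, so this tends to suit cases where the clashing type appears only a handful of times in the file.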
I use explicit imports, and have done so for many years. In all my projects in the last decade this has been agreed with team members, and all are happy to agree to use explicit imports and avoid wildcards. It has not been controversial.
In favour of explicit imports:
precise definition of what classes are used
less fragile as other parts of the codebase changes
easier to code review
no guessing about which class is in which package
In favour of wildcards:
less code
easier to add and maintain the imports when using a text editor
Early in my career, I did use wildcard imports, because back then IDEs were not so sophisticated, or some of us just used text editors. Managing explicit imports manually is quite a bit of effort, so wildcard imports really help.
However, at least once I was bitten by the use of wildcard imports, and this led me to my current policy of explicit imports only.
Imagine you have a class that has 10 wildcard imports, and it compiles successfully. Then you upgrade 5 jar files to newer versions (you have to upgrade them all, because they are related). Then the code no longer compiles, there is a class not found error. Now which package was that class in? What is the full name of the class? I don't know because I'm only using the short name, now I have to diff the old and new versions of the jars, to see what has changed.
If I had used explicit imports, it would be clear which class had been dropped and what its package was, and thus which jar (by looking at other classes in that package) is responsible.
There are also problems reading code history, or looking at historic merges. When you have wildcard imports there is uncertainty for the reader about which class is which, and thus what the semantics of the code changes are. With explicit imports you have a precise description of the code, and it acts as a better historical record.
So overall the benefit of the small amount of extra effort to maintain the import, and extra lines of code are easily outweighed by extra precision and determinism given by explicit imports.
The only case when I still use wildcards is when there are more than 50 imports from the same package. This is rare, and usually just for constants.
Update1: To address the comment of Peter Mortensen, below...
The only defence Kevlin Henney makes in his talk for using wildcards is that the name collision problem doesn't happen very often. But it's happened to me, and I've learnt from it. And I discuss that above.
He doesn't cover all the points I've made in my answer above. But most importantly, I think the choice you make, explicit or wildcard, doesn't matter that much; what matters is that everyone on the project/codebase agrees and uses a consistent style. Kevlin Henney goes on to talk about cargo-cult programming. My decisions as stated above are based on personal lessons over decades, not cargo-cult reasoning.
If I was to join a project where the existing style was to use wildcards, I'd be fine with it. But if I was starting a new project I'd use precise imports.
Interestingly in nodejs there is no wildcard option. (Although you do have 'default' imports, but it's not quite the same).
It's subjective and depends greatly on the circumstances. I sometimes bounce between the two.
It is generally good practice to be specific by default, but it can also mean higher maintenance. It's not a perfect rule. However, being more specific (a higher initial cost) tends to reveal its cost early, through measurable or perceptible drag, whereas being lazy by default tends to surface later as a more serious problem.
Over including of entire namespaces can create bloat and clashes as well as hide changes in structure but in certain cases it may outweigh the benefit.
To give a simple case:
I use a package with a hundred classes.
What if I use one of its classes?
What if I use all but one of them?
It's similar to the whitelist versus blacklist problem.
In an ideal situation the package hierarchy will subdivide package types enough to establish a good balance.
In certain systems I've seen people do the equivalent of import *. Perhaps the most horrifying case is when I saw someone do something similar to apt install *. Though it seemed clever so as to never have to worry about missing dependencies it imposed enormous burdens that far outweighed any benefit.
All things can be taken to the extreme. For example I could argue for the utility of imports as close to as needed but if we're going to do that why not just always use the fully qualified names all the time?
Problems like this are annoying as the low cognitive load of consistency is always preferable but when it comes down to it in various given circumstances you may need to play it by ear.
It's important to be proportionate. Doing something a thousand times to avoid something that happens one time, self presents and takes about as much effort to fix tends to result in a waste.
Different objectives may make "too many" imports a good thing. I have a custom JavaScript framework which would likely horrify many at a glance for its stacks of imports.
What's not immediately obvious is that this is part of a deliberate strategy.
This makes it easier to package the code cleanly with all of its specific dependencies and then transmit it over a network.
Imports also act as plug points at build time, so dependencies can be swapped for each platform target.
This tends to be less of a problem for languages that are less dynamic and that do not suffer greatly (or at all) from the overhead of excessively importing namespaces. The moral of this story is that import strategies can vary enormously but be valid for their given circumstances. You can only go so far in taking general approaches.
In each situation you will need to have your bearings and a sense of the lay of the land. If the import conventions and structure are causing a nuisance, then it's necessary to narrow down the how and why. Too many imports may not be the result of a specific strategy but of things such as packing too much into a single file. At the same time some files are naturally large and naturally require many imports.
Badly applied separation of concerns, or organisation that fails to keep related things together, can create a graph with grossly excessive edges, which better organisation can reduce. To some degree it's not abnormal for code to be clustered around its specific dependencies more than around the more general ones.
If a code base is well organised into a graph that is fairly close to optimal, with neither excessive splitting nor excessive merging and with minimal distance between related things, you will tend to find that even with specific imports the majority of import lists stay within a reasonable size.
I don't get all the non-answers posted here.
Yes, individual imports are a bad idea, and add little, if anything, of value.
Instead just explicitly import the conflicts with the class you want to use (the compiler will tell you about conflicts between the awt and shapes package) like this:
import java.awt.*;
import shapes.*;
import shapes.Rectangle; // New Rectangle, and Rectangle.xxx will use shapes.Rectangle.
This has been done for years, since Java 1.2, with awt and util List classes. If you occasionally want to use java.awt.Rectangle, well, use the full class name, e.g., new java.awt.Rectangle(...);.
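As a runnable illustration of that rule, here is a small sketch. Because the asker's shapes package is not available here, it swaps in the java.awt / java.util clash on the name List that the answer mentions; the ImportShadowing class and its names method are hypothetical.

```java
import java.awt.*;        // on-demand import: brings in Point, but also java.awt.List
import java.util.*;       // on-demand import: brings in ArrayList, but also java.util.List
import java.util.List;    // single-type import: decides what the simple name List means

final class ImportShadowing {

    // Without the single-type import above, List would be ambiguous here
    // and the file would not compile.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        names.add("circle");
        names.add("rectangle");
        return names;
    }

    public static void main(String[] args) {
        Point origin = new Point(0, 0); // resolved through the java.awt.* wildcard
        System.out.println(names().size() + " shapes, origin x=" + origin.x);
    }
}
```

The single-type import takes precedence over both wildcard imports, so only the names that actually clash need an explicit line.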
It's normal in Java world to have a lot of imports - you really need to import everything. But if you use an IDE, such as Eclipse, it does the imports for you.
It's a good practice to import class by class instead of importing whole packages
Any good IDE, such as Eclipse, will collapse the imports in one line, and you can expand them when needed, so they won't clutter your view
In case of conflicts, you can always refer to fully qualified classes, but if one of the two classes is under your control, you can consider renaming it (with Eclipse, right click on the class, choose menu Refactor → Rename, it will take care to update all its references).
If your class is importing from AWT and from your package of shapes, that's OK. It's OK to import from several classes; however, if you find yourself importing from a great many disparate sources, it could be a sign that your class is doing too much and needs to be split up.

Java package politics

I always hesitate when creating packages: I want to take advantage of package-limited access, but at the same time I want to have similar classes divided into packages.
The problem comes when you understand that packages are not hierarchical in Java:
At first, packages appear to be
hierarchical, but they are not.
source
Imagine I have an API defined with its classes at foo.bar, where only the classes the API client needs are public. Then I have another package, foo.bar.pojos, with some internal objects I need in the API. These classes need to be public so they can be accessed by foo.bar, but this means the API client could also access them if the package foo.bar.pojos is imported.
What is the common package policy that should be followed?
I've seen two ways of doing.
The first one consists in separating the public API and internal classes into two different artefacts (jars). The documentation is separated as well, and it's thus easy for the end user to make the distinction between what is internal and what is not. But it sometimes makes things more complex to have two jars, two source trees, etc.
The second one consists in delivering a single jar with good documentation that makes clear what's internal and what's not. The textual documentation can explain how to use the API (and thus avoids talking about the internals). And the javadoc can specify that a class is for internal use and is thus subject to change.
Yes, Java packages don't give you enough control over your dependencies. The classic way to deal with this is to put external APIs in one package and internal implementation classes in another, and rely on people's good sense to avoid creating dependencies on the latter.
With Maven and OSGI, you have an additional mechanism for managing dependencies between modules / bundles of packages. In the case of OSGI, you can explicitly declare some packages as not exported, and an OSGI aware development environment will prevent people creating harmful dependencies. Maven's module support is weaker, but at least it controls dependency cycles.
Finally, you could use custom PMD rules to enforce your project's modularization conventions ... in the same way that there are rules to discourage dependencies on Java's "com.sun.*" package tree.
It is a mess.
Using only what Java itself offers, you have to put everything in the same package. You end up with a single (or a few) packages with lots of classes, and no good way to group them for yourself (but at least that problem does not leak outside). Most people don't do that, though, and as a result, your (as a developer on top of these libraries) public classpath is littered with stuff you should never need to see.
You might like OSGi, which has (and enforces) the concept of bundle-private packages. Those are not exported to the outside world.

What is a good practice to combine your classes

Not to keep all my classes in a single src -> 'package_name' folder I'm creating different sub-packages in order to separate my classes by groups like - utilities, models, activities themselves, etc. I'm not sure if it is a good practice and people do the same in real projects.
Yes, it's definitely standard practice to separate your classes into packages. It's good to establish a convention for how they are separated, to make it easier to find things later. Two common approaches:
Put things into packages based on what they are: model, service, data access (DAO), etc.
Put things into packages based on what function they support (for example, java.io, java.security, etc.).
I've used both and keep coming back to the former because it's less subjective (it's always clear whether a class is a model or a service, but not always clear whether it supports one function or another function).
Doing it by class type the way you describe is one way that I've seen in real projects. I don't care for it as much as I used to because when I need to make a change or add a feature I tend to need to have several packages expanded in my IDE. I prefer (when I have the choice) to group classes by feature instead. That way I know where to look for all classes that support that feature.
The convention I prefer is to group classes first by module, then by functionality. For example, you could have the following structure:
com.example.modulea - modulea specific code that doesn't have any real need of a different package
com.example.modulea.dao - data access for module a
com.example.modulea.print - printing for module a
...
com.example.moduleb - moduleb specific code that doesn't have any real need of a different package
com.example.moduleb.dao - data access for module b
com.example.moduleb.print - printing for module b
In this fashion, code is clearer by package.
In the other style, of grouping by pure functionality, the package size tends to be quite large. If your project contains 15 modules, and each module has one or more elements per package, that's at least 15 classes per package. I much prefer clearly separated packages than packages that simply group things because "oh here are some printing utilities that are used for every module but only one module actually uses one of them from this package" - it just gets confusing.
