Java's package management system always seemed simple and effective to me. It is heavily used by the JDK itself. We have been using it to mimic the concept of namespaces and modules.
What gap is Project Jigsaw (aka the Java Platform Module System) trying to fill?
From the official site:
The goal of this Project is to design and implement a standard module
system for the Java SE Platform, and to apply that system to the
Platform itself and to the JDK.
Jigsaw and OSGi are trying to solve the same problem: how to allow coarser-grained modules to interact while shielding their internals.
In Jigsaw's case, the coarser-grained modules include Java classes, packages, and their dependencies.
Here's an example: Spring and Hibernate. Both depend on a third-party JAR, CGLIB, but they use different, incompatible versions of it. What can you do if you rely on the standard JDK? Including the version that Spring wants breaks Hibernate, and vice versa.
But, if you have a higher-level model like Jigsaw you can easily manage different versions of a JAR in different modules. Think of them as higher-level packages.
If you build Spring from the GitHub source you'll see it, too. They've redone the framework so it consists of several modules: core, persistence, etc. You can pick and choose the minimal set of module dependencies that your application needs and ignore the rest. It used to be a single Spring JAR, with all the .class files in it.
Update: Five years later - Jigsaw might still have some issues to resolve.
AFAIK the plan is to make the JRE more modular, i.e. to have smaller jars which are optional and/or to let you download/upgrade only the functionality you need.
It's to make it less bloated and give you the option of dropping legacy modules which perhaps most people don't use.
Based on Mark Reinhold's keynote speech at Devoxx Belgium, Project Jigsaw is going to address two main pain points:
Classpath
Massive Monolithic JDK
What's wrong with Classpath?
We all know about JAR Hell. This term describes all the various ways in which the classloading process can end up not working. The best-known limitations of the classpath are:
It's hard to tell if there are conflicts. Build tools like Maven can do a pretty good job based on artifact names, but if the artifacts themselves have different names but the same contents, there could be a conflict.
The fundamental problem with jar files is that they are not components. They're just a bunch of file containers that will be searched linearly. The classpath is a way to look up classes regardless of what components they're in, what packages they're in or their intended use.
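To make the "searched linearly" point concrete, here is a toy model rather than the real class loader: each "JAR" is just a map from class name to a label, and all names (ClasspathLookupSketch, com.example.Util, lib-1.0, lib-2.0) are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Toy model of class-path lookup; each "JAR" maps a class name to a label for its contents. */
final class ClasspathLookupSketch {

    /** Walks the path in order and returns the first match, ignoring any later duplicates. */
    static Optional<String> resolve(List<Map<String, String>> classpath, String className) {
        for (Map<String, String> jar : classpath) {
            String contents = jar.get(className);
            if (contents != null) {
                return Optional.of(contents);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Two hypothetical JARs that both contain the same class name.
        Map<String, String> libV1 = Map.of("com.example.Util", "Util from lib-1.0");
        Map<String, String> libV2 = Map.of("com.example.Util", "Util from lib-2.0");

        // The winner depends only on the order of the path.
        System.out.println(resolve(List.of(libV1, libV2), "com.example.Util").orElse("not found"));
        System.out.println(resolve(List.of(libV2, libV1), "com.example.Util").orElse("not found"));
    }
}
```

The first call picks the lib-1.0 copy and the second picks the lib-2.0 copy; nothing in the lookup itself reports that a conflict exists.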
Massive Monolithic JDK
The big monolithic nature of JDK causes several problems:
It doesn't fit on small devices. Even though small IoT-type devices have processors capable of running an SE-class VM, they do not necessarily have the memory to hold all of the JDK, especially when the application uses only a small part of it.
It's even a problem in the cloud. The cloud is all about optimizing the use of hardware; if you have thousands of images containing the whole JDK but the applications use only a small part of it, that is a waste.
Modules: The Common Solution
To address the above problems, we treat modules as a fundamental new kind of Java program component. A module is a named, self-describing collection of code and data. Its code is organized as a set of packages containing types, i.e., Java classes and interfaces; its data includes resources and other kinds of static information.
To control how its code refers to types in other modules, a module declares which other modules it requires in order to be compiled and run. To control how code in other modules refers to types in its packages, a module declares which of those packages it exports.
The module system locates required modules and, unlike the class-path mechanism, ensures that code in a module can only refer to types in the modules upon which it depends. The access-control mechanisms of the Java language and the Java virtual machine prevent code from accessing types in packages that are not exported by their defining modules.
Apart from being more reliable, modularity could improve performance. When code in a module refers to a type in a package then that package is guaranteed to be defined either in that module or in precisely one of the modules read by that module. When looking for the definition of a specific type there is, therefore, no need to search for it in multiple modules or, worse, along the entire class path.
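A small sketch of what "exported" means at run time, using the java.lang.Module API that Java 9 added. It only inspects java.base; the class name ModuleIntrospectionSketch is made up, and jdk.internal.misc is used merely as an example of a JDK-internal package name.

```java
/** Inspects the module that defines a class, using the java.lang.Module API (Java 9+). */
final class ModuleIntrospectionSketch {

    /** Name of the module that defines the given type. */
    static String definingModuleOf(Class<?> type) {
        return type.getModule().getName();
    }

    /** True if the module defining the type exports the package to all other modules. */
    static boolean exportsToEveryone(Class<?> type, String packageName) {
        return type.getModule().isExported(packageName);
    }

    public static void main(String[] args) {
        System.out.println(definingModuleOf(String.class));
        // java.lang is part of the public API of java.base.
        System.out.println(exportsToEveryone(String.class, "java.lang"));
        // A JDK-internal package is not exported to everyone.
        System.out.println(exportsToEveryone(String.class, "jdk.internal.misc"));
    }
}
```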
JEPs to Follow
Jigsaw is an enormous project that has been ongoing for quite a few years. It has an impressive number of JEPs, which are great places to find more information about the project. Some of these JEPs are the following:
JEP 200: The Modular JDK: Use the Java Platform Module System (JPMS) to modularize the JDK
JEP 201: Modular Source Code: Reorganize the JDK source code into modules, enhance the build system to compile modules, and enforce module boundaries at build time
JEP 261: Module System: Implement the Java Platform Module System, as specified by JSR 376, together with related JDK-specific changes and enhancements
JEP 220: Modular Run-Time Images: Restructure the JDK and JRE run-time images to accommodate modules and to improve performance, security, and maintainability
JEP 260: Encapsulate Most Internal APIs: Make most of the JDK's internal APIs inaccessible by default but leave a few critical, widely-used internal APIs accessible, until supported replacements exist for all or most of their functionality
JEP 282: jlink: The Java Linker: Create a tool that can assemble and optimize a set of modules and their dependencies into a custom run-time image as defined in JEP 220
Closing Remarks
In the initial edition of The State of the Module System report, Mark Reinhold describes the specific goals of the module system as follows:
Reliable configuration, to replace the brittle, error-prone class-path mechanism with a means for program components to declare explicit dependences upon one another, along with
Strong encapsulation, to allow a component to declare which of its public types are accessible to other components, and which are not.
These features will benefit application developers, library developers, and implementors of the Java SE Platform itself directly and, also, indirectly, since they will enable a scalable platform, greater platform integrity, and improved performance.
For the sake of argument, let's assert that Java 8 (and earlier) already has a "form" of modules (jars) and module system (the classpath). But there are well-known problems with these.
By examining the problems, we can illustrate the motivation for Jigsaw. (The following assumes we are not using OSGi, JBoss Modules, etc, which certainly offer solutions.)
Problem 1: public is too public
Consider the following classes (assume both are public):
com.acme.foo.db.api.UserDao
com.acme.foo.db.impl.UserDaoImpl
At Foo.com, we might decide that our team should use UserDao and not use UserDaoImpl directly. However, there is no way to enforce that on the classpath.
In Jigsaw, a module contains a module-info.java file which allows us to explicitly state what is public to other modules. That is, public has nuance. For example:
// com.acme.foo.db.api.UserDao is accessible, but
// com.acme.foo.db.impl.UserDaoImpl is not
module com.acme.foo.db {
    exports com.acme.foo.db.api;
}
Problem 2: reflection is unbridled
Given the classes in #1, someone could still do this in Java 8:
Class<?> c = Class.forName("com.acme.foo.db.impl.UserDaoImpl");
Object obj = c.getConstructor().newInstance();
That is to say: reflection is powerful and essential, but if unchecked, it can be used to reach into the internals of a module in undesirable ways. Mark Reinhold has a rather alarming example. (The SO post is here.)
In Jigsaw, strong encapsulation offers the ability to deny access to a class, including reflection. (This may depend on command-line settings, pending the revised tech spec for JDK 9.) Note that because Jigsaw is used for the JDK itself, Oracle claims that this will allow the Java team to innovate the platform internals more quickly.
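Here is a hedged sketch of what that denial looks like from code on the classpath. It tries deep reflection on a private field of java.lang.String; the field name "value" is a JDK implementation detail, and the class name ReflectionEncapsulationSketch is made up. The outcome depends on the JDK version and on flags such as --add-opens.

```java
import java.lang.reflect.Field;
import java.lang.reflect.InaccessibleObjectException;

/** Probes whether deep reflection into java.base is refused for the calling code. */
final class ReflectionEncapsulationSketch {

    /** Returns true if setAccessible(true) on a private JDK field is rejected. */
    static boolean deepReflectionDenied() {
        Field internal;
        try {
            // "value" is a private field of String in current JDKs (implementation detail).
            internal = String.class.getDeclaredField("value");
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException("This JDK lays out String differently", e);
        }
        try {
            internal.setAccessible(true);
            return false; // access was granted (older JDKs, or --add-opens in effect)
        } catch (InaccessibleObjectException e) {
            return true; // strong encapsulation refused the request
        }
    }

    public static void main(String[] args) {
        System.out.println("deep reflection denied: " + deepReflectionDenied());
    }
}
```

On JDK 17 and later, run without any --add-opens options, the probe is expected to report that access is denied; JDK 9 through 15 allowed it by default with a warning.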
Problem 3: the classpath erases architectural relationships
A team typically has a mental model about the relationships between jars. For example, foo-app.jar may use foo-services.jar which uses foo-db.jar. We might assert that classes in foo-app.jar should not bypass "the service layer" and use foo-db.jar directly. However, there is no way to enforce that via the classpath. Mark Reinhold mentions this here.
By comparison, Jigsaw offers an explicit, reliable accessibility model for modules.
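As a sketch of how the layering above could be written down, the snippet below builds the programmatic equivalent of three module declarations with java.lang.module.ModuleDescriptor. The module names foo.app, foo.services and foo.db are just the jar names from the example turned into hypothetical module names, and the class name LayeredModulesSketch is made up.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;
import java.util.TreeSet;

/** Models the app -> services -> db layering as explicit module dependences. */
final class LayeredModulesSketch {

    // Programmatic equivalents of three hypothetical module-info.java files:
    //   module foo.db       { }
    //   module foo.services { requires foo.db; }
    //   module foo.app      { requires foo.services; }
    static final ModuleDescriptor DB =
            ModuleDescriptor.newModule("foo.db").build();
    static final ModuleDescriptor SERVICES =
            ModuleDescriptor.newModule("foo.services").requires("foo.db").build();
    static final ModuleDescriptor APP =
            ModuleDescriptor.newModule("foo.app").requires("foo.services").build();

    /** Names of the modules this module explicitly requires (implicit java.base is skipped). */
    static Set<String> declaredDependencies(ModuleDescriptor module) {
        Set<String> names = new TreeSet<>();
        for (ModuleDescriptor.Requires dependence : module.requires()) {
            if (!dependence.modifiers().contains(ModuleDescriptor.Requires.Modifier.MANDATED)) {
                names.add(dependence.name());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("foo.app requires " + declaredDependencies(APP));
        System.out.println("foo.services requires " + declaredDependencies(SERVICES));
    }
}
```

Because foo.app does not require foo.db (and foo.services does not use requires transitive), code in foo.app would not be able to refer to types in foo.db; the descriptors here only record the declarations, they do not enforce anything by themselves.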
Problem 4: monolithic run-time
In Java 8, the Java runtime lives in the monolithic rt.jar. On my machine, it is 60+ MB with 20k classes! In an age of micro-services, IoT devices, etc., it is undesirable to have CORBA, Swing, XML, and other libraries on disk if they aren't being used.
Jigsaw breaks up the JDK itself into many modules; e.g. java.sql contains the familiar SQL classes. There are several benefits to this, but a new one is the jlink tool. Assuming an app is completely modularized, jlink generates a distributable run-time image that is trimmed to contain only the modules specified (and their dependencies). Looking ahead, Oracle envisions a future where the JDK modules are compiled ahead-of-time into native code. Though jlink is optional, and AOT compilation is experimental, they are major indications of where Oracle is headed.
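To see the split for yourself, the following sketch lists the modules present in the running Java image via java.lang.module.ModuleFinder. The class name SystemModulesSketch is made up; the exact list varies by JDK build, and an image produced by jlink should list only the modules it was linked with.

```java
import java.lang.module.ModuleFinder;
import java.lang.module.ModuleReference;
import java.util.Set;
import java.util.TreeSet;

/** Lists the modules that make up the run-time image this program is running on. */
final class SystemModulesSketch {

    /** Sorted names of all system modules in the current run-time image. */
    static Set<String> systemModuleNames() {
        Set<String> names = new TreeSet<>();
        for (ModuleReference reference : ModuleFinder.ofSystem().findAll()) {
            names.add(reference.descriptor().name());
        }
        return names;
    }

    public static void main(String[] args) {
        Set<String> names = systemModuleNames();
        System.out.println(names.size() + " system modules");
        System.out.println("java.sql present: " + names.contains("java.sql"));
    }
}
```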
Problem 5: versioning
It is well-known that the classpath does not allow us to use multiple versions of the same jar: e.g. bar-lib-1.1.jar and bar-lib-2.2.jar.
Jigsaw does not address this problem; Mark Reinhold states the rationale here. The gist is that Maven, Gradle, and other tools represent a large ecosystem for dependency management, and another solution will be more harmful than beneficial.
It should be noted that other solutions (e.g. OSGi) do indeed address this problem (and others, aside from #4).
Bottom Line
Those are some key points for Jigsaw, motivated by specific problems.
Note that explaining the controversy between Jigsaw, OSGi, JBoss Modules, etc is a separate discussion that belongs on another Stack Exchange site. There are many more differences between the solutions than described here. What's more, there was sufficient consensus to approve the Public Review Reconsideration Ballot for JSR 376.
This article explains in detail the problems which both OSGi and JPMS/Jigsaw try to solve:
"Java 9, OSGi and the Future of Modularity" [22 SEP 2016]
It also goes thoroughly into the approaches of both OSGi and JPMS/Jigsaw.
As of now, it appears that the authors list almost no practical pros for JPMS/Jigsaw compared with the mature (16-year-old) OSGi.
I have zero experience with Java, but when trying to understand a certain "apocalyptic" vulnerability, I ended up with a fundamental question about imports in Java, so please bear with me.
My question is, as given in the title, why a Java package can not be updated with a single central patch.
For comparison, two hypothetical diametric cases that I think I understand reasonably well:
If, say, a python library had some vulnerability, then it should suffice (on well-maintained systems that use centralized libraries located on PYTHONPATH) to update that single library and any code that imports it should, in general, be fixed.
On the other hand, if a C library had a vulnerability, then it would be necessary to replace every single binary whose source includes the vulnerable library with a patched binary.
Now, as far as I could tell, Java is actually closer to the former category of languages, where external imports are not included in compiled sources.
If this is the case, then why can't a single patch be applied to fix an entire system (au contraire, our IT department forwarded a gigantic list of software for us to check individually)? Is it because of multiple decentralized copies of identical libraries being installed, or is there some other reason? Or am I misunderstanding the issue?
Java applications themselves are separate processes. In principle, all these processes can use different VMs. This is often the case for larger applications, which are tested against a specific VM. In principle, Java runtimes (J2SE implementations) should remain as compatible as possible with each other, but it is certainly possible for developers or libraries to muck this up, e.g. by using internal "Sun" classes or by assuming things not specified for the API calls. Personally, I hate these kinds of J2SE inclusions; I'd rather have applications that are created to remain compatible.
Smaller applications usually just run on one of the installed JREs. However, they usually still need additional libraries or components, for instance Log4J from Apache. These are often offered as separate .jar files (or "artifacts" in Maven speak). These libraries may also get updates; however, there is no common way of updating them on most systems, and there is no single "application" set of shared libraries, although it is certainly possible to create one. On Linux, for instance, there may be a set of libraries in /usr/share/java (by version, with generic names pointing to the latest one).
Many web applications, i.e. those running on a specific application server such as Tomcat, Glassfish, etc., do share a common "classpath", where application-specific .jar files are put in a specific folder. In that case an update of a library in the shared folder will affect all applications.
Java has had a framework for specific class-loaders, and in principle any framework can define their own set, so where the libraries are stored can depend on the framework. Java is very flexible and doesn't really have one single way of handling applications.
All this has precious little to do with import statements. These are basically just used as a shorthand notation. You might as well use java.util.List everywhere instead of import java.util.List followed by List further on in the code. Class files contain references to other classes (etc.), and those are resolved (found and loaded) at runtime; see the description from Oracle here.
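A minimal sketch of both points: an imported name and a fully qualified name denote the same class, and a class is looked up by name when the program runs. The class name ImportShorthandSketch and the non-existent class com.example.DoesNotExist are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Shows that imports are shorthand and that classes are located by name at run time. */
final class ImportShorthandSketch {

    /** The imported and the fully qualified spelling refer to the very same classes. */
    static boolean sameClass() {
        List<String> viaImport = new ArrayList<>();
        java.util.List<String> viaFullName = new java.util.ArrayList<>();
        return viaImport.getClass() == viaFullName.getClass();
    }

    /** Asks the class loader for a class by name; what it finds depends on the runtime setup. */
    static boolean isLoadable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(sameClass());
        System.out.println(isLoadable("java.util.List"));
        System.out.println(isLoadable("com.example.DoesNotExist"));
    }
}
```

Whether a library class is loadable, and which copy of it is found, is decided by whatever jars the running process can see, which is why patching one copy on disk does not necessarily fix every application.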
We're currently migrating from Java 8 to Java 11. However, upgrading our services was less painful than we anticipated. We basically only had to change the version number in our build.gradle file and the services were happily up and running. We upgraded libraries as well as the (micro) services that use those libs. No problems so far.
Is there any need to actually switch to modules? This would generate needless costs IMHO. Any suggestion or further reading material is appreciated.
To clarify, are there any consequences if Java 9+ code is used without introducing modules? E.g. can it become incompatible with other code?
No.
There is no need to switch to modules.
There has never been a need to switch to modules.
Java 9 and later releases support traditional JAR files on the
traditional class path, via the concept of the unnamed module, and will
likely do so until the heat death of the universe.
Whether to start using modules is entirely up to you.
If you maintain a large legacy project that isn’t changing very much,
then it’s probably not worth the effort.
If you work on a large project that’s grown difficult to maintain over
the years then the clarity and discipline that modularization brings
could be beneficial, but it could also be a lot of work, so think
carefully before you begin.
If you’re starting a new project then I highly recommend starting with
modules if you can. Many popular libraries have, by now, been upgraded
to be modules, so there’s a good
chance that all of the dependencies that you need are already available
in modular form.
If you maintain a library then I strongly recommend that you
upgrade it to be a module if you haven’t done so already, and if all of
your library’s dependencies have been converted.
All this isn’t to say that you won’t encounter a few stumbling blocks
when moving past Java 8. Those that you do encounter will, however,
likely have nothing to do with modules per se. The most common
migration problems that we’ve heard about since we released Java 9 in
2017 have to do with changes to the syntax of the version
string and to the removal or
encapsulation of internal APIs
(e.g., sun.misc.BASE64Decoder) for which public, supported
replacements have been available for years.
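For those two stumbling blocks, a hedged sketch of what the fixes can look like: java.util.Base64 (public since Java 8) in place of the internal decoder, and version parsing that copes with both the old "1.8.0_292" style and the newer "11.0.2" style. The class and method names are made up, and the parser is a simplification, not a full implementation of the version-string scheme.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Two small migration helpers; names are illustrative only. */
final class MigrationSketch {

    /** Decodes Base64 text with the supported java.util.Base64 API instead of sun.misc classes. */
    static String decodeBase64(String encoded) {
        return new String(Base64.getDecoder().decode(encoded), StandardCharsets.UTF_8);
    }

    /** Extracts the feature release from a java.version value in either the old or the new format. */
    static int featureVersion(String javaVersion) {
        String[] parts = javaVersion.split("[-._+]");
        // Old scheme: "1.8.0_292" means Java 8. New scheme: "11.0.2" means Java 11.
        if (parts[0].equals("1") && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return Integer.parseInt(parts[0]);
    }

    public static void main(String[] args) {
        System.out.println(decodeBase64("aGVsbG8="));
        System.out.println(featureVersion(System.getProperty("java.version")));
    }
}
```

On Java 10 and later, Runtime.version().feature() gives the same number without any string parsing.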
I can only tell you my organization's opinion on the matter. We are in the process of moving to modules, for every single project that we are working on. What we are building is basically micro-services + some client libraries. For micro-services the transition to modules is a somewhat lower priority: the code there is already somewhat isolated in the Docker container, so "adding" modules in there does not seem (to us) very important. This work is being picked up slowly, but it's low priority.
On the other hand, client libraries are an entirely different story. I cannot tell you the mess we have sometimes. I'll explain one point that I hated before Jigsaw. You expose an interface to clients, for everyone to use. Automatically that interface is public - exposed to the world. Usually, what I do is have some package-private classes, which are not exposed to the clients, that use that interface. I don't want clients to use those; they are internal. Sounds good? Wrong.
The first problem is that when those package-private classes grow, and you want more classes, the only way to keep everything hidden is to create classes in the same package:
package abc:
-- /* non-public */ Usage.java
-- /* non-public */ HelperUsage.java
-- /* non-public */ FactoryUsage.java
....
When it grows (in our cases it does), those packages are way too big. Moving to a separate package you say? Sure, but then that HelperUsage and FactoryUsage will be public and we tried to avoid that from the beginning.
Problem number two: any user/caller of our clients can create the same package name and extend those hidden classes. It happened a few times to us already, fun times.
Modules solve this problem in a beautiful way: public is not really public anymore; I can have friend access via the exports ... to directive. This makes our code lifecycle and management much easier. And we get away from classpath hell. Of course Maven/Gradle handle that for us, mainly, but when there is a problem, the pain is very real. There could be many other examples, too.
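A sketch of that "friend access", with made-up module and package names (com.example.client and friends). It builds the programmatic equivalent of a module declaration with java.lang.module.ModuleDescriptor and then checks the declared exports; it models the declaration only and is not the access check the JVM performs.

```java
import java.lang.module.ModuleDescriptor;
import java.util.Set;

/** Models a qualified export: a package that is visible to one named friend module only. */
final class QualifiedExportSketch {

    // Programmatic equivalent of a hypothetical module-info.java:
    //   module com.example.client {
    //       exports com.example.client.api;
    //       exports com.example.client.internal to com.example.client.extras;
    //   }
    static ModuleDescriptor clientModule() {
        return ModuleDescriptor.newModule("com.example.client")
                .exports("com.example.client.api")
                .exports("com.example.client.internal", Set.of("com.example.client.extras"))
                .build();
    }

    /** True if the declaration exports the package to everyone or to the given consumer module. */
    static boolean visibleTo(ModuleDescriptor module, String packageName, String consumerModule) {
        for (ModuleDescriptor.Exports export : module.exports()) {
            if (export.source().equals(packageName)) {
                return !export.isQualified() || export.targets().contains(consumerModule);
            }
        }
        return false; // the package is not exported at all
    }

    public static void main(String[] args) {
        ModuleDescriptor client = clientModule();
        System.out.println(visibleTo(client, "com.example.client.api", "com.example.app"));
        System.out.println(visibleTo(client, "com.example.client.internal", "com.example.client.extras"));
        System.out.println(visibleTo(client, "com.example.client.internal", "com.example.app"));
    }
}
```

With this shape the helper classes can live in their own package and be public, yet only the named friend module can use them.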
That said, the transition is (still) not easy. First of all, everyone on the team needs to be aligned; second, there are hurdles. The two biggest I still see are these. First: how do you separate each module, based on what, specifically? I don't have a definite answer yet. The second is split packages, oh the beautiful "same package is exported by different modules". If this happens with your own libraries, there are ways to mitigate it; but if these are external libraries... not that easy.
If you depend on jarA and jarB (separate modules), but they both export the package abc.def (say, each ships its own abc.def.Util), you are in for a surprise. There are ways to solve this, though. Somewhat painful, but solvable.
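A rough way to spot this before the module system complains is to check which packages appear in more than one jar. The sketch below works on a plain map of module name to package names; the inputs, the class name SplitPackageSketch and the package abc.only are illustrative, and collecting the real package lists from your jars is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Finds packages that are contained in more than one module ("split packages"). */
final class SplitPackageSketch {

    /** Maps each split package to the set of modules that contain it. */
    static Map<String, Set<String>> findSplitPackages(Map<String, Set<String>> packagesByModule) {
        Map<String, Set<String>> owners = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : packagesByModule.entrySet()) {
            for (String packageName : entry.getValue()) {
                owners.computeIfAbsent(packageName, key -> new TreeSet<>()).add(entry.getKey());
            }
        }
        // Keep only the packages that have two or more owners.
        owners.values().removeIf(modules -> modules.size() < 2);
        return owners;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> input = Map.of(
                "jarA", Set.of("abc.def", "abc.only"),
                "jarB", Set.of("abc.def"));
        System.out.println(findSplitPackages(input));
    }
}
```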
Overall, since we started migrating to modules (and we still are), our code has become much cleaner. And if your company is a "code-first" company, this matters. On the other hand, I have been involved in companies where this was seen as "too expensive" with "no real benefit" by senior architects.
I'd like to use Kotlin & Scala together in projects, and maybe some other languages, but I've seen no good way of doing it. The only way I thought of was compiling one language and decompiling it into Java to work with the other. Are there any alternatives?
For the sake of completeness and not putting words into someone else's mouth, I wanted to weigh in.
I agree with the last sentence of ziggystar's answer. The right thing to do is to take a component-based approach and not try to combine multiple languages in one component or project.
From a technical perspective, each of the JVM languages has its own compiler. Some, such as Scala's, can compile both Scala and Java files. However, this may or may not be true for other compilers. In order to avoid strange build processes, a good approach would be to use a single language for every built module.
Since you're sticking to JVM languages, every language can be compiled into a JAR, so you can easily distribute your executable binary as a single JAR file, with all of the components wrapped up inside it. This is the Fat JAR approach (see this question on Stack Overflow, this post on Java Code Geeks).
From a human readability perspective, this should also make your software more easily understood. Not only have you decomposed it into logical building blocks (each component), but someone making modifications only needs to understand the language that the component they are working on is written in and the public interface of the components they need to interact with. There's no mental context switching between languages.
You can use Scala and Java simultaneously, since scalac understands and compiles Java files. The same probably holds for other languages. Problems might arise when using multiple alternative JVM languages, since, e.g., the Kotlin compiler probably can't understand the Scala files and vice versa.
I think the best way would be to split the project into different modules, and use at most one alternative language per module.
What do I mean by module?
By module, I mean a set of source files that gets translated into one (binary) artifact, i.e. a jar file. Under different circumstances I would simply call a "module" a project. Note that a module may depend on other modules on the binary level (e.g. it has some jar files as dependencies).
Multi module support in IDEs
I think it should be possible with most major IDEs to work on different modules simultaneously, even if each module uses a different language. Terminology varies across IDEs.
Terminology
In IntelliJ IDEA, one of my modules is called a "module". In Eclipse it would be called a "project".
Hi all, I was wondering how the packages org.ietf, org.omg, org.w3c, and org.xml made it into the "official" Java classes?
For example, it makes sense that the default JDK wouldn't have all the classes from Apache Commons.
By the same philosophy, shouldn't these org.w3c, org.omg packages be outside of the default JDK classes (i.e. not included within the JDK installation)?
These are all generally code representing standards; IETF, OMG, and W3C are all standards organizations. The code that you are referring to was created with these package names and was/is very widely used, so it made sense to put it into the JDK under its original names. An exception to the standards naming is the org.xml package. That has SAX, an early open source Java/XML implementation of streaming XML event handling that became very popular. It's also code that sits at the right level (a fairly low level) in the programming hierarchy, so it is needed almost universally. Some of it is code that other parts of the Java runtime environment depend on.
Code in open source projects like Apache Commons is either not a standard or not required by other parts of the Java runtime, so there is no strong reason to include it.
Note in other cases, Sun/Oracle has added code external to the JDK to implement core features (Doug Lea's concurrency stuff comes to mind), but these packages were renamed into java packages.
I have a Java project and internally it is dependent on the asm jar. Strangely, I don't even know why my project depends on this library (it might be brought in by Maven as a transitive dependency).
Can anyone help me understand why someone needs the asm jar?
Thanks in advance !
EDIT:
Can you also mention for what purposes/use-cases one might need asm jar?
ASM is a bytecode manipulation framework (see this page for a nice introduction) and is used by many things performing... bytecode manipulation: frameworks using proxy generation and reflection (Spring, Hibernate, etc), mocking frameworks (EasyMock, JMock, etc), code analysis tools (PMD, Findbugs, etc). Actually, the ASM project maintains a list of users organized by category, check it out.
As mentioned by Vincent, if you are depending transitively on ASM, the dependency:tree goal or the dependency report (see the PMD and Findbugs links above for examples) can help to analyze the situation and to find out where it's coming from. But this won't take into account dependencies of the Maven plugins that you are using, only dependencies of your project.
Maven-2 requires asm.jar to compile and run the application.
Here for more information.
EDIT:
Due to the many possible usages of program analysis, generation and
transformation techniques, many tools to analyze, generate and transform
programs have been implemented, for many languages, Java included. ASM is
one of these tools for the Java language, designed for runtime – but also
offline – class generation and transformation. The ASM library was
therefore designed to work on compiled Java classes. It was also designed
to be as fast and as small as possible. Being as fast as possible is
important in order not to slow down too much the applications that use ASM
at runtime, for dynamic class generation or transformation. And being as
small as possible is important in order to be used in memory constrained
environments, and to avoid bloating the size of small applications or
libraries using ASM.
ASM is not the only tool for generating and transforming compiled Java
classes, but it is one of the most recent and efficient. It can be downloaded
from http://asm.objectweb.org. Its main advantages are the following:
1) It has a simple, well designed and modular API that is easy to use.
2) It is well documented and has an associated Eclipse plugin.
3) It provides support for the latest Java version, Java 6.
4) It is small, fast, and very robust.
5) Its large user community can provide support for new users.
6) Its open source license allows you to use it in almost any way you want.
Found from this pdf file. I am under the impression that along with Java EE 6 also came a built-in tool, ASM for class generation and transformation. The PDF gives you detail in greater depth about ASM.
Hope this helps.
What other dependencies does your project have? I suspect one of the jars you've decided you require (e.g. Spring or Hibernate) itself requires asm.jar.
It is possible to use the dependency plugin for Maven to see which library has asm as a dependency.