How Maven loads classes using ClassLoader - java

My knowledge of ClassLoader in Java is, at the moment, a little hazy. That's because I have not yet found any good documentation on ClassLoader geared towards beginners. What I am looking for specifically relates to Maven. With that disclaimer stated, let me get to my question.
I am writing a Spring MVC application, and I decided to look into how the dependencies (JARs and classes) are loaded using ClassLoader. According to the ClassLoader documentation, classes in JARs are loaded from the CLASSPATH. I can see the JARs under the .m2/repository directory, but the CLASSPATH environment variable is practically empty.
Can somebody please explain how the classes from the JARs are loaded into JVM memory by the ClassLoader when using Maven, if the CLASSPATH is empty?
Thanks

You're confusing a few things.
The ClassLoader
The ClassLoader is a runtime concept. Maven is a compile time concept. Therefore, one has nothing whatsoever to do with the other. Maven and ClassLoaders do not interact. At all.
When you start a basic java app (java -jar foo.jar or java com.foo.MainClass), you get 2 classloaders (strictly, a third one, the extension loader, called the platform loader since java 9, sits between them; it is ignored here). One loader will load system stuff: java.lang.String, for example. The executable itself 'just knows' how to do this (you don't need to configure PATH, CLASSPATH or JAVA_HOME - it just works); up to java 8, it finds rt.jar automatically, which contains String.class and other core classes. Starting from java 9, rt.jar is gone and it finds the core classes in the runtime image (the lib/modules file) of your java distro.
Then, once the VM has 'booted up', the VM makes another classloader, also based on its built-in stuff: The app classloader.
This one uses 'the classpath'. The source of this depends on how you ran your java app:
java -jar somejar.jar
The classpath in this case is the jar itself plus whatever is listed on the Class-Path: line in the jar's manifest (the file at META-INF/MANIFEST.MF). And nothing else - the CLASSPATH environment variable, and any -cp or -classpath options are entirely ignored.
java -cp a.jar:b.jar:. com.foo.ClassName
Note that -cp is short for -classpath (they mean the same thing): Here, the classpath is taken to be all the files and directories listed (on windows, use ; as separator instead), and classes are loaded from there. Including com.foo.ClassName itself.
java com.foo.ClassName
If you don't explicitly specify a -cp param, then the environment var CLASSPATH is used (and if that isn't set either, the classpath defaults to the current directory). You don't want this though. Always specify classpath.
That's runtime - maven has nothing whatsoever to do with this.
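To see the loaders and the effective classpath for yourself, something like the following can help. It is only a minimal sketch; ClassLoaderTour and loaderOf are made-up names, and the exact loader class names printed differ between java versions.

```java
// Sketch: report which loader defined a core class and an application class,
// and print the classpath the app classloader was started with.
class ClassLoaderTour {
    // The bootstrap loader is not a Java object, so getClassLoader() returns null for it.
    static String loaderOf(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        return loader == null ? "bootstrap" : loader.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("String loaded by:          " + loaderOf(String.class));
        System.out.println("ClassLoaderTour loaded by: " + loaderOf(ClassLoaderTour.class));
        // Filled in from -cp, the CLASSPATH variable, or the jar name when using -jar.
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

Running it once with -cp and once without shows that the java.class.path property, not the CLASSPATH variable as such, is what the app classloader works from.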
Make your own
You can make your own ClassLoader; the abstraction is such that all it needs to be able to do is turn a resource name into bytes. That's it. You can make a ClassLoader (literally! public class MyLoader extends java.lang.ClassLoader { ... }) that loads data from a network, or generates it on the fly, or fetches it from an encrypted data store. Whatever you like.
Using custom classloaders like this is a solution for finding classes in 'weird' places (not jar files or directories), as well as a mechanism to allow java to 'reload classes' on the fly - very useful when developing, say, web apps, without the use of a hot-code-replacing debugger like eclipse has. ClassLoaders are the mechanism whereby a web server can offer the following feature: "I load jars or wars from a certain preconfigured directory, and if you replace the jar, I will see it and start using the new one".
Writing your own ClassLoader is bordering on rocket science and not usually required unless you're, say, writing an app server. Not a common job.
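To make the 'turn a name into bytes' idea concrete, here is a minimal sketch of a custom loader. Greeter, BytesClassLoader and CustomLoaderDemo are made-up names. The loader handles exactly one class name itself, reading the bytes from its parent's resources as a stand-in for a network or an encrypted store, and delegates everything else. It assumes the class file is reachable as a resource, which holds for an ordinary directory or jar classpath.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical payload class; it exists only so there is something to load.
class Greeter { }

// Sketch of a custom loader: one class is defined from bytes, the rest is delegated.
class BytesClassLoader extends ClassLoader {
    private final String ownedName;

    BytesClassLoader(String ownedName, ClassLoader parent) {
        super(parent);
        this.ownedName = ownedName;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            if (!name.equals(ownedName)) {
                return super.loadClass(name, resolve); // usual parent-first delegation
            }
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                byte[] bytes = readClassBytes(name);
                loaded = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    // Here the bytes come from the parent's resources; any other byte source would do.
    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String resource = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(resource)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

class CustomLoaderDemo {
    // True if the custom loader produced its own Class object with the same name.
    static boolean loadsSeparateCopy() {
        try {
            String name = Greeter.class.getName();
            ClassLoader parent = Greeter.class.getClassLoader();
            Class<?> copy = new BytesClassLoader(name, parent).loadClass(name);
            return copy != Greeter.class && copy.getName().equals(name);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("separate copy loaded: " + loadsSeparateCopy());
    }
}
```

The two Class objects share a name but are different types as far as the VM is concerned. That is what makes reloading possible (throw the loader away, make a new one), and it is also why an object created through one loader cannot be cast to the 'same' class from another.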
Maven
To compile source code, the compiler must know the methods and fields and such of all the types you refer to. You can't compile "Hello".toLowerCase(); if the compiler doesn't know what String contains.
Thus, compilers also have this notion of 'I need to find classes'. But this is not called class loading, and notably, the compiler (and thus maven, which runs it) never loads any of the classes you compile against. If it did, any static initializers in any class would run, and mess up your compile. Instead it just inspects the class file, never letting the VM actually load it, to know what kind of methods and fields and the like are on offer.
java.lang.ClassLoader plays no part in this.
javac itself has a -classpath option as well. So does maven, really.
Maven constructs the classpath automatically, by putting the stuff it already compiled for you (e.g. when compiling the stuff in src/test/java, the compiled stuff from src/main/java is on the classpath), as well as all the dependencies. How? Well, does it matter? Maven does. It constructs a large list of dirs and jars and passes it to javac via the -classpath parameter.
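Conceptually, the 'large list' is nothing more than paths joined with the platform's separator. The sketch below only assembles such an argument list and does not run a compiler; CompileClasspathSketch, the paths and the some-lib coordinates are all invented for illustration, and this is not Maven's actual code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class CompileClasspathSketch {
    // Joins entries with ':' on Unix-likes or ';' on Windows.
    static String joinClasspath(List<String> entries) {
        return String.join(File.pathSeparator, entries);
    }

    // Builds javac arguments: -classpath <list> -d <outputDir> <sources...>
    static List<String> javacArguments(List<String> classpath, String outputDir, List<String> sources) {
        List<String> arguments = new ArrayList<>();
        arguments.add("-classpath");
        arguments.add(joinClasspath(classpath));
        arguments.add("-d");
        arguments.add(outputDir);
        arguments.addAll(sources);
        return arguments;
    }

    public static void main(String[] args) {
        // Hypothetical entries: already-compiled main classes plus one jar from the local repository.
        List<String> classpath = List.of(
                "target/classes",
                "/home/me/.m2/repository/com/example/some-lib/1.0/some-lib-1.0.jar");
        List<String> sources = List.of("src/test/java/com/example/FooTest.java");
        System.out.println("javac " + String.join(" ", javacArguments(classpath, "target/test-classes", sources)));
    }
}
```

This is why the jars can sit in .m2/repository while the CLASSPATH variable stays empty: the list is handed over per invocation, never through the environment.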

Related

jar built with jwrapper doesn't work

jwrapper manipulates application jars somehow, and is resulting in a non-functioning jar: at runtime it throws a "MyClass cannot be cast to MyClass" type error. I believe this is caused by re-evaluating code that creates a class loader, leading to multiple instances of class MyClass being loaded.
The jwrapper docs don't describe the changes made to the jar, except for the use of pack200. I've tested pack200 in isolation, and it does not cause this problem.
I've also tested the jar built by jwrapper without using the wrapper executable, by passing it to "java -jar". So it's not jvm transmuting, or anything else that the wrapper is doing: the jar itself is broken.
UPDATE:
jwrapper allows skipping pack200, but then the install file is huge. Since pack200 works when run standalone, I could work around this if there were some way to tell jwrapper that the file is already packed. Using <Pack200Exceptions> doesn't help, because then it doesn't know the file is packed.
The underlying problem is that jwrapper sets the pack200 option "modification_time" to "latest", which changes the modification times of all the class files. At run-time this causes the clojure compiler to attempt to recompile the classes from source.
A work-around is to remove the .clj files from the jar prior to packaging, preventing the compiler from running. The lein ":omit-source" option is not sufficient here, because it leaves in .clj files from any dependencies. Instead you must use a pattern in :uberjar-exclusions, e.g.
:uberjar-exclusions [#".(clj|java)"]
as detailed here:
https://github.com/technomancy/leiningen/issues/1357

ClassNotFoundException after replacing a jar in classpath

I have a java process running with two jar files in the classpath namely
- A.jar
- B.jar
While the process was running, I replaced B.jar with another B.jar to which I had added some updated files. Now, in my process, I see some ClassNotFoundExceptions for classes in B.jar. I don't understand what is happening here. I thought the jars would be loaded when the java process was started. If that is the case, why is this happening? Can somebody help me with this? I know that if I restart the process, everything will be fine, but I am curious to know the reason behind this.
Classes in a JAR file are loaded when they're first used, not at JVM startup. By replacing B.jar while the application is running, if you've removed classes that are referred to by others, you will get a ClassNotFoundException.
This can also happen in Java 7 if a class that you haven't used for a while has been garbage collected. The JVM will attempt to re-load it, and find that it is no longer in the classpath. This can also happen in earlier versions of Java if you're using the -XX:+CMSClassUnloadingEnabled startup option.
The JVM supports both static and dynamic loading of classes. Classes that are linked explicitly are resolved by the JVM itself (in practice HotSpot does this lazily, on first use, rather than all at startup), but it cannot "discover" in advance classes that are loaded dynamically at runtime, via reflection for example. If you're doing a Class.forName("org.package.mySuperClass") in your code, and if your SuperClass is never linked by other pieces of code, it will be loaded at call time. If the jar containing this class has been removed from the classpath before the call, a ClassNotFoundException will be thrown.
Note that a lot of modern frameworks use dynamic loading (even dynamic compilation that links to classes in the classpath that were not linked before), and it's difficult (and uncertain) to know which ones.
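The 'at call time' behaviour can be observed with a small experiment. This is just a sketch with made-up names (LazyLoadingDemo, Plugin). Strictly speaking, what it makes visible is class initialization, since a static initializer is the easiest hook to watch, but it shows the same first-use pattern, and the failure mode when a name cannot be found.

```java
class LazyLoadingDemo {
    // Flipped by Plugin's static initializer, so we can tell when Plugin was first touched.
    static boolean pluginInitialized = false;

    // Hypothetical class that no other code refers to directly; it is only reached by name.
    static class Plugin {
        static {
            pluginInitialized = true;
        }
    }

    static final String PLUGIN_NAME = LazyLoadingDemo.class.getName() + "$Plugin";

    // Returns false instead of throwing when the class cannot be found.
    static boolean tryLoad(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("before forName: " + pluginInitialized);
        System.out.println("plugin found:   " + tryLoad(PLUGIN_NAME));
        System.out.println("after forName:  " + pluginInitialized);
        // A name with no class file behind it, like a class whose jar has gone away.
        System.out.println("missing found:  " + tryLoad("no.such.Clazz"));
    }
}
```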

Compiling Java & JARs

I just asked a recent question about distributing executable JARs and their dependencies, and it made me realize that my understanding of JARs may be fundamentally flawed.
Thus, some might say "Hey now! This here is a duplicate question!" But I say nay, this question is a completely separate offshoot of this original question, and is concerned with Java fundamentals!
If I have an application that depends on, say, the Apache Commons CLI as well as JODA Time, and I pack this app up into a distributable JAR, my original question was: Without including the CLI and JODA JARs in my JAR, how does the program run on the client-side???
I am now thinking that since my code, which uses CLI and JODA, gets compiled into classfiles, and that bytecode is what gets packaged, then there is no need to include CLI or JODA (or any other 3rd party JAR) in my JAR, since it is all now functioning bytecode.
Can someone confirm or correct me? This revelation, though late in coming, has been staggering.
No, that is not quite right. The key to everything is the classpath. Is all of the compiled code and/or other resources on the classpath? If you package everything up in one single jar, then yes, it is in the classpath and the JVM will locate all the resources to run. Otherwise, you need to specify (with a .bat or .sh file or something) all the resources that your application is dependent on, so the JVM will be able to appropriately look for those resources (be they Java code or properties files or whatever).
Also if I am reading your question right, are you assuming that the CLI and JODA code gets compiled into your code? If so, I hate to burst your bubble, but that is not the case. When your code compiles, it does not bring in dependencies (not in the sense you may be thinking). What it does at a conceptual level (correct me if I'm wrong JVM gurus) is it references other classes. Those references are what you are building when you code a class and compile it. At runtime the JVM will attempt to locate the compiled class behind the reference and THAT is where you either need the jar with those classes in the classpath OR you need those classes in your executable jar.
Make sense?
The third party libraries (JodaTime, for example) need to be on the classpath during runtime. Not "packaged within your JAR".
If your app is launched from a JAR, you should specify the classpath in the manifest file which is packaged within the jar - http://download.oracle.com/javase/tutorial/deployment/jar/downman.html
You can have ANT generate the manifest classpath for you using the manifestclasspath element - http://ant.apache.org/manual/Tasks/manifestclasspath.html
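For a feel of what ends up in that manifest, here is a small sketch using the JDK's java.util.jar API. ManifestSketch, com.example.Main and the lib/... jar names are placeholders, not anything from the question. Class-Path entries are separated by spaces and are resolved relative to the location of the jar that contains the manifest.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

class ManifestSketch {
    // Renders the text of a manifest with Main-Class and Class-Path entries.
    static String manifestText(String mainClass, String... relativeJars) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Without Manifest-Version the main attributes are not written out at all.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.put(Attributes.Name.MAIN_CLASS, mainClass);
        main.put(Attributes.Name.CLASS_PATH, String.join(" ", relativeJars));

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            manifest.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.print(manifestText("com.example.Main", "lib/commons-cli.jar", "lib/joda-time.jar"));
    }
}
```

With such a manifest, java -jar on your jar expects the two library jars in a lib directory next to it; they still have to be shipped, just not inside your jar.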

Are Tomcat jars meant to be added to the classpath?

Since jars like servlet.jar are usually not downloaded on their own, but rather come as part of the tomcat/lib folder, should I just add an entry for them to the classpath? Is that the common practice?
I use Ubuntu.
You only need to reference them yourself when you want to compile servlet classes. How to do that depends in turn on the tools used for compilation.
If you're using plain javac, then you could reference them in the CLASSPATH environment variable. But even then, that's considered poor practice, since it could pollute the default classpath of all other Java compilations/applications. Rather, write a shell script which sets the classpath just for the current execution by using the -cp option of the javac command.
If you're using a decent IDE like Eclipse/Netbeans, then you should just integrate the server in the IDE and associate the project with it. The IDE will then take care of setting the build path correctly. You don't need to set any environment variables then.
You do not need to reference them when you want to run the servlets. The servlet container will take care of that by itself.
See also:
How do I import Servlet API in Eclipse?
If you are running a web application on Tomcat then the servlet-api.jar is in the classpath.

Eclipse Java project folder organization

I am coming to Java and Eclipse from a C#/Visual Studio background. In the latter, I would normally organize a solution like so:
\MyProjects\MyApp\MyAppsUtilities\LowerLevelStuff
where MyApp would contain a project to build a .exe, MyAppsUtilities would make an assembly DLL called by the .exe, and LowerLevelStuff would probably build an assembly containing classes used by the higher-level utilities DLL.
In Eclipse (Ganymede, but could be convinced to switch to Galileo) I have:
\MyProjects\workspace\MyApp
when I create my initial project. There is an option to put source and build files in the same folder, but I have .java files created on a path that reflects my package hierarchy:
\MyProjects\workspace\MyApp\src\com\mycompany\myapp\MyApp.java
My question is this: when I create subprojects (is that the right Java/Eclipse term?) for .jar files that will be analogous to the above MyAppsUtilities and LowerLevelStuff assembly DLLs in .NET, can (should) I organize the folders equivalently? E.g.:
\MyProjects\workspace\MyApp\src\com\mycompany\myapp\myapputilities\MyAppsUtilities.java
What is the standard/right way to organize this stuff, and how is it specifically done in the IDE?
Think of Java source code packages as one big hierarchical namespace. Commercial applications typically live under 'com.mycompany.myapp' (the website for this application might be 'http://myapp.mycompany.com' although this is obviously not always the case).
How you organize stuff under your myapp package is largely up to you. The distinction you make for C# between executable (.exe), DLL's and low-level classes does not exist in the same form in Java. All Java source code is compiled into .class files (the contents of which is called 'bytecode') which can be executed by a Java Virtual Machine (JVM) on many platforms. So there is no inherent distinction in high-level/low-level classes, unless you attribute such levels via your packaging. A common way of packaging is:
com.mycompany.myapp: main class; MyApp (with a main method)
com.mycompany.myapp.model: domain model classes; Customer, Order, etc.
com.mycompany.myapp.ui: user interface (presentation or view) code
com.mycompany.myapp.service: services within your application, i.e. 'business logic'
com.mycompany.myapp.util: helper classes used in several places
This layout suggests a standalone Java app; it might be different if it is a webapp using one of the many frameworks.
These packages correspond to a directory hierarchy in your project. When using Eclipse, the root of such a hierarchy is called a 'source directory'. A project can define multiple source directories, commonly a 'main' and a 'test' source directory.
Example of files in your project:
src/test/java/com/acme/foo/BarTest.java
src/main/java/com/acme/foo/Bar.java
lib/utilities_1_0.jar
And inside utilities_1_0.jar:
com/acme/foo/BarUtils.class
BarUtils.class is a compiled Java class, i.e. it is in platform-independent bytecode form that can be run on any JVM. Usually jarfiles only contain the compiled classes, although you can sometimes download a version of the jar that also contains the source (.java) files. This is useful if you want to be able to read the original source code of a jar file you are using.
In the example above Bar, BarTest and BarUtils are all in the same package com.acme.foo but physically reside in different locations on your harddisk.
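The mapping between a class name and its location inside a source directory or jar is purely mechanical, which the following sketch spells out. PackagePaths and its two helper methods are invented for illustration.

```java
class PackagePaths {
    // com.acme.foo.BarUtils -> com/acme/foo/BarUtils.class (jar entries always use '/').
    static String classFilePath(String fullyQualifiedName) {
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    // com.acme.foo.Bar -> com.acme.foo; a name without dots is in the default package ("").
    static String packageOf(String fullyQualifiedName) {
        int lastDot = fullyQualifiedName.lastIndexOf('.');
        return lastDot < 0 ? "" : fullyQualifiedName.substring(0, lastDot);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"com.acme.foo.Bar", "com.acme.foo.BarTest", "com.acme.foo.BarUtils"}) {
            System.out.println(name + " -> package " + packageOf(name) + ", file " + classFilePath(name));
        }
    }
}
```

All three names map to the same com/acme/foo path; which classpath entry (directory or jar) that path sits under does not affect the package.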
Classes that reside directly in a source directory are in the 'default package'. It is usually not a good idea to keep classes there, because it is not clear to which company and application the class belongs, and you can get name conflicts if any jar file you add to your classpath contains a class with the same name in the default package.
Now if you deploy this application, it would normally be compiled into .class files and bundled in a .jar (which is basically a fancy name for a .zip file plus some manifest info).
Making a .jar is not necessary to run the application, but handy when deploying/distributing your application. Using the manifest info you can make a .jar file 'executable', so that a user can easily run it, see [a].
Usually you will also be using several libraries, i.e. existing .jar files you obtained from the Internet. Very common examples are log4j (a logging framework) or JDBC libraries for accessing a database etc. Also you might have your own sub-modules that are deployed in separate jarfiles (like 'utilities_1_0.jar' above). How things are split over jarfiles is a deployment/distribution matter, they still all share the universal namespace for Java source code. So in effect, you could unzip all the jarfiles and put the contents in one big directory structure if you wanted to (but you generally don't).
When running a Java application which uses/consists of multiple libraries, you run into what is commonly referred to as 'Classpath hell', one of the biggest drawbacks of Java as we know it (note: help is supposedly on the way). To run a Java application on the command line (i.e. not from Eclipse) you have to specify every single .jar file location on the classpath. When you are using one of the many build tools and frameworks in the Java world (Maven, Spring, OSGi, Gradle) there is usually some form of support to alleviate this pain. If you are building a web application you would generally just have to adhere to its layering/deployment conventions to be able to easily deploy the thing in the web container of your choice (Tomcat, Jetty, Glassfish).
I hope this gives some general insight in how things work in Java!
[a] To make an executable jar of the MyApp application you need a JDK on your path. Then use the following command line in your compile (bin or target) directory:
jar cvfe myapp.jar com.mycompany.myapp.MyApp com\mycompany\myapp
You can then execute it from the command line with:
java -jar myapp.jar
or by double-clicking the jar file. Note you won't see the Java console in that case so this is only useful for applications that have their own GUI (like a Swing app) or that may run in the background (like a socket server).
Maven has a well thought out standard directory layout. Even if you are not using Maven directly, you can think of this as a de facto standard. Maven "multi module" projects are a fair analogy to the .NET multiple assembly layout that you described.
Typically you would create related/sub projects as different Projects in Eclipse.
There are two things you need to clarify before this question can be answered:
Which source code repository will you use?
Which build system will you use to automatically build artifacts outside of Eclipse?
The answers will strongly influence your options.
We have opted for "one Eclipse project per component", where a component may be either a library or a finished runnable/executable jar. This has made it easy to automate with Hudson. Our usage of CVS is also easier, since single projects do not have multiple responsibilities.
Note, each project may contain several source folders separating e.g. test code from configuration from Java source. That is not as important as simplifying your structure.
