I'm trying to make an address list in Java which saves its contents in an SQLite database.
Therefore (and for other future uses), I tried to create my own library for all kinds of database connections ("PentagonsDatabaseConnector-1.0.jar"). It currently supports SQLite and MySQL.
It references other libraries that provide the JDBC drivers ("mysql-connector-java-8.0.16.jar" and "sqlite-jdbc-3.30.1.jar").
Problem: My library works just fine if I'm accessing it from its own project folder, but as soon as I compile it and add it to the "Adressliste" project, it isn't able to find the JDBC drivers anymore (I can access the rest of my self-written library without problems, though). Also, as shown in the screenshot, "PentagonsDatabaseConnector-1.0.jar" brings the JDBC libraries with it in its "lib" folder.
LINK TO THE SCREENSHOT
Do you guys have an idea what's wrong?
Thank you for your help!
PS: Sorry for the bad English, I'm German :)
Java cannot read jars-in-jars.
Dependencies come in a few flavours. In this case, PentagonsDC is a normal dependency; it must be there at runtime, and also be there at compile time.
The JDBC libraries are a bit special; they are runtime-only deps. You don't need them to be around at compile time. You want this, because JDBC libraries are, as a concept, pluggable.
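For example, code written against the java.sql API compiles with no driver jar on the classpath at all; the driver only has to be present when the program actually runs. A minimal sketch (the SQLite URL and file name are illustrative):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

public class JdbcDemo {
    public static void main(String[] args) throws SQLException {
        // This compiles against the java.sql API alone; the sqlite-jdbc jar
        // only needs to be on the classpath when this line actually runs,
        // because the driver registers itself via the service-loader mechanism.
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite:addresses.db")) {
            System.out.println(conn.getMetaData().getDatabaseProductName());
        }
    }
}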
Okay, so what do I do?
Using a build system to manage your dependencies is the answer 90%+ of Java programmers go to, and it is what I recommend you do here. Particularly for someone starting out, I advise Maven. There you just list your dependencies in a text file (the pom.xml) and Maven takes care of them, at least at compile time.
For the runtime aspect, you have a few options. It depends on how your java app runs.
Some examples:
Manifest-based classpaths
You run your Java application 'stand alone', as in: you wrote the public static void main(String[]) method that starts the app, and you distribute it everywhere it needs to run. In this case, the usual strategy is to have an installer (you need a JVM on the client to run your application, and neither Oracle nor any OS vendor supports maintaining a functioning JVM on end users' systems anymore; it is now your job, and unfortunately that is non-trivial). Given that you have that, you should deploy your jars such that the manifest (jars are zips; the manifest ends up at META-INF/MANIFEST.MF) contains:
Main-Class: com.of.yourproj.Main
Class-Path: lib/sqlite-jdbc.jar lib/mysql-jdbc.jar lib/guava.jar
And then have a directory structure like so:
C:\Program Files\yourapp\yourapp.jar
C:\Program Files\yourapp\lib\sqlite-jdbc.jar
C:\Program Files\yourapp\lib\mysql-jdbc.jar
Or the equivalent on any other OS. The classpath entries in the manifest are space-separated and resolved relative to the directory that 'yourapp.jar' is in. Done this way, you can run yourapp.jar from anywhere, and all the entries listed in Class-Path are available to it.
Build tools can make this manifest for you.
Shading / Uberjars
Shading is the notion of packing everything into a single giant jar; not jars-in-jars, but unpacking the contents of your dependency jars into the main app jar. This can make the build quite slow (if you have a few hundred MB worth of deps, those need to be packed in, and all class files need analysis for the shade rewrite; that is a lot of bits to process, so it always takes some time). The general idea behind shading is that deployment 'is as simple as transferring one jar file', but this is not actually practical, given that you can no longer assume that end users have a JVM installed, and even if they do, you cannot rely on it being properly up to date. I mention it here because you may hear this suggestion from others, but I wouldn't recommend it.
If you really do want to go for this, the only option is build systems: They have a plugin to do it; there is no command line tool that ships with java itself that can do this. There are also caveats about so-called 'signed jars' which cannot just be unpacked into a single uberjar.
App container
Not all Java apps are standalone apps where you provide the main. If you're writing a web service, for example, you have no main at all; the framework does. Instead of a single entrypoint ('main', the place where your code initially begins execution), web services have tons of entrypoints: one for every URL you want to respond to. The framework takes care of invoking them, and these frameworks usually have their own documentation and specs for how dependencies are loaded. Usually it is a matter of putting a jar in one place and its dependencies in a subdir named 'lib', or you build a so-called war file; but really, there are many web frameworks and many options for how they do this. The good news is, usually it's simple, and the tutorial for said framework will cover it.
This advice applies to any 'app container' system; those are usually web frameworks, but there are non-web related frameworks that take care of launching your app.
Don't do these
Don't force your users to manually supply the -classpath option or mess with the CLASSPATH environment variable.
Don't try to write a custom classloader that loads jars-in-jars.
NB: SQLite is rather complicated for Java; it doesn't get you many of the benefits that the 'lite' is supposed to bring, as it is a native dependency. The simple, works-everywhere solution in the Java sphere is H2, which is written entirely in Java, so shipping the entire H2 engine as part of the Java app is possible with zero native components.
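If you do switch, the only change to the sketch above is the driver dependency (com.h2database:h2 instead of sqlite-jdbc) and the JDBC URL; the file path is illustrative:

        try (Connection conn = DriverManager.getConnection("jdbc:h2:./addresses")) {
            System.out.println(conn.getMetaData().getDatabaseProductName());
        }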
Is there a way to automatically find out which Java classes are actually loaded (either during compile time, as far as that's possible, or during the runtime of an application), and to throw out all other classes from a JAR to create a smaller JAR? Does that actually make sense in practice?
I am talking about the application classes for an application JAR. Usually there are lots of libraries in an application, and an application rarely needs all features of those libraries. So I suspect that would make a considerably smaller application. In theory that might be done, for example, via a Java agent that logs which classes and resources are read by one or several runs of an application (or even just by java -verbose:class), and a Maven plugin that throws out all other classes from a jar-with-dependencies. Is there already something like that?
Clarification: I am not talking about unused dependencies (JARs that are not used at all), but about removing unused parts of each included JAR.
Well, the Maven Shade Plugin has an option minimizeJar when creating an Uber-JAR for your application:
https://maven.apache.org/plugins/maven-shade-plugin/
But, as others already pointed out, this is quite dangerous, as it regularly fails to detect class accesses which are done via Reflection or other dynamic references.
It may not be a good approach to automate this, as the application can use reflection to initialise objects, or one JAR may depend on another JAR.
The only way I can think of is to remove the JARs one by one and check if the application runs as expected. Then again, with this approach all modules of the application have to be tested, since one module may work without a particular dependency and another may not.
A better solution is to take care while developing. The application developer must be careful when adding a dependency, and remove unwanted dependencies after his/her piece of code is done.
Global strategy:
1) Find all the classes that are loaded during runtime.
2) List all the classes available on the classpath.
3) Reduce your class path by creating copies of jars containing only classes you need.
I have done parts 1 and 2, so I can help you with those.
1) Find out all the classes that are loaded. You need 100% code coverage (I am not talking about tests, but production), so run all possible scenarios; that way all the classes your app needs will be loaded and logged.
To log loaded classes, try several approaches: reflection, the -verbose:class flag, and Java agents, which allow you to modify classes as they are loaded (there are examples of Java agent code online).
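A minimal sketch of such an agent, assuming it is packaged in its own jar whose manifest contains a Premain-Class: ClassLogAgent entry and the app is started with -javaagent:classlog.jar (the jar and class names are illustrative):

import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public final class ClassLogAgent {
    // Called by the JVM before main() when started with -javaagent:classlog.jar
    public static void premain(String agentArgs, Instrumentation inst) {
        inst.addTransformer(new ClassFileTransformer() {
            @Override
            public byte[] transform(ClassLoader loader, String className,
                                    Class<?> classBeingRedefined,
                                    ProtectionDomain protectionDomain,
                                    byte[] classfileBuffer) {
                if (className != null) { // some synthetic classes have no name
                    System.err.println("LOADED " + className.replace('/', '.'));
                }
                return null; // null means: leave the bytecode unchanged
            }
        });
    }
}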
2) To find all the classes available in a jar, you can write a program. You need to know all the places where application jars are located. Loop through these jars (you can use ZipFile), loop through their ZipEntry entries, and collect all the classes.
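A sketch of that loop over a single jar (the path comes from wherever your jars live):

import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public final class JarClassLister {
    // Returns the fully qualified names of all classes contained in the jar.
    public static List<String> classesIn(String jarPath) throws IOException {
        List<String> classes = new ArrayList<>();
        try (ZipFile zip = new ZipFile(jarPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (name.endsWith(".class")) {
                    classes.add(name.replace('/', '.')
                                    .substring(0, name.length() - ".class".length()));
                }
            }
        }
        return classes;
    }
}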
3) After that, write a script or program that reassembles your application. For example, you can now create a new jar file for each library and put only the needed classes in it.
You may also want a tool (again, you are a programmer, so write a program) that checks the code for class dependencies: you do not want to remove classes that are needed for compilation. When I was a student, I wrote a code analyzer that built a directed graph of class dependencies.
As @Gokul Nath KP notes, I did this before. I manually changed Gradle and Maven dependencies, removing them one by one, and then ran a full regression test. It took me a week (and our application was small compared to modern enterprise systems created by hundreds of developers).
So, be creative, and in case of success, your project will be used by millions!
We have a web application made in Java, which uses Struts2, Spring and JasperReports. This application runs on GlassFish 4.0.
The application's libraries are in the WEB-INF/lib folder, and four more applications that use the same libraries are also installed in GlassFish.
GlassFish is configured to use 1024 MB for heap space and 512 MB for PermGen, and most of the memory consumption when using libraries per application is in the Struts actions and Spring AOP classes (according to the NetBeans profiler).
The problem we are having is the amount of memory consumed by having the libraries in a per-application classloader: it is too high, it generates PermGen errors, and we have also noticed that the application runs slower with more users.
Because of that, we tried shared libraries, putting them in the domain1/lib folder, and found that with a single deployed application the load time and memory consumption are much lower and the application generally works faster. But when we deploy the rest of the applications on the server, only the first application loaded works well, and the rest throw errors when we call Struts2 actions.
We believe that is because each application has slightly different settings for Struts2 and log4j.
We have also tried putting only certain libraries in GlassFish and leaving only Struts2 in the application, but that produces InvocationTargetException errors, because all the libraries depend on the Apache Commons libs, and it doesn't matter whether we put those libs in one place or the other. Also, if we put them in both places, the application doesn't start.
Are there any special settings or best practices for using shared libraries?
Is there a way to use shared libraries but load settings per application? Or do we have to change the settings to make them all the same?
Are there any special settings or best practices for using shared libraries? Is there a way to use shared libraries but load settings per application? Or do we have to change the settings to make them all the same?
These are actually interesting questions... I don't use GlassFish but, according to the documentation:
Application-Specific Class Loading
[...]
You can specify module- or application-specific library classes [...] Use the asadmin deploy command with the --libraries option and specify comma-separated paths
[...]
Circumventing Class Loader Isolation
Since each application or individually deployed module class loader universe is isolated, an application or module cannot load classes from another application or module. This prevents two similarly named classes in different applications or modules from interfering with each other.
To circumvent this limitation for libraries, utility classes, or individually deployed modules accessed by more than one application, you can include the relevant path to the required classes in one of these ways:
Using the Common Class Loader
Sharing Libraries Across a Cluster
Packaging the Client JAR for One Application in Another Application
Using the Common Class Loader
To use the Common class loader, copy the JAR files into the domain-dir/lib or as-install/lib directory or copy the .class files (and other needed files, such as .properties files) into the domain-dir/lib/classes directory, then restart the server.
Using the Common class loader makes an application or module accessible to all applications or modules deployed on servers that share the same configuration. However, this accessibility does not extend to application clients. For more information, see Using Libraries with Application Clients. [...]
Then I would try:
Solution 1
put all the libraries except the Struts2 jars under domain1/lib,
put only the Struts2 jars under domain1/lib/applibs,
then run
$ asadmin deploy --libraries struts2-core-2.3.15.2.jar FooApp1.war
$ asadmin deploy --libraries struts2-core-2.3.15.2.jar FooApp2.war
to isolate the Struts2 libraries' classloading while keeping the rest under the Common Classloader's control.
Solution 2
put all the libraries except the Struts2 jars under domain1/lib,
put only the Struts2 jars under domain1/lib/applibs, in different copies with different names, e.g. appending _appname to the jar names,
then run
$ asadmin deploy --libraries struts2-core-2.3.15.2_FooApp1.jar FooApp1.war
$ asadmin deploy --libraries struts2-core-2.3.15.2_FooApp2.jar FooApp2.war
to prevent sharing of the libraries by instantiating (mock) different versions of them.
Hope that helps, let me know if some of the above works.
You can try to create what is known as a skinny WAR. Pack all your WARs inside an EAR and move all the common JARs from WEB-INF/lib to the lib/ folder in the EAR (don't forget to set <library-directory> in the application.xml).
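The resulting layout would look roughly like this (the jar and war names are illustrative):

myapp.ear/lib/struts2-core.jar          <- shared by every war in the ear
myapp.ear/lib/spring-core.jar
myapp.ear/FooApp1.war                   <- its WEB-INF/lib keeps only app-specific jars
myapp.ear/FooApp2.war
myapp.ear/META-INF/application.xml      <- declares <library-directory>lib</library-directory>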
I'd bet that placing the libs under lib/ or lib/ext won't resolve your performance issues. You did not write anything about the applications or server settings, like the size of the applications or the available heap and PermGen space, but nonetheless I would recommend staying with separate libs per app.
If you place the libs in the server dirs, they will be shared among all apps. You will lose the option to upgrade only one of your applications to a new framework, or to get rid of any of them. Your deployment will be bound to a specific server architecture.
And as you wrote, it did not solve your problems; it may even raise new ones.
I would recommend investing some hours into tuning the server. If it runs with defaults, allocate more PermGen and heap space.
If this does not help, you should analyze in depth what's going wrong. Shared libs might be a solution, but you don't know the problem yet. IBM offers some cool and free tools for analyzing heap dumps; this could be a good starting point.
I came here in search of guidance about installing libraries that are shared among multiple applications or projects. I am deeply disappointed to read that the accepted practice favors installing a copy of every shared library into each project. So, if you have ten web applications, all of which use, e.g., httpcomponents-client, mysql-connector-java, etc., then your installation contains ten copies of each.
This behavior reminds me, painfully, of the way of thinking that motivated me to abandon the mainframe in favor of the PC; the thinking seemed to be "I don't care how many resources my application consumes. In fact, I'd like to be able to brag about what a resource hog it is." Excuse me, please, while I hurl.
The interface exposed by a library is an immutable contract that is not subject to change at the developer's whim.
There is this concept called backwards compatibility. If you break it, you create a new interface.
I know of at least two types of interfaces that adhere to the letter and spirit of these rules.
By far the oldest is the IBM System/370 system libraries. You might have Foo and Foo2, where the latter extends and/or breaks the contract made by the Foo interface in some way that made it incompatible.
From its beginnings in the Bell Labs Unix project, the standard C runtime library has adhered to the above rules.
Though it is much newer, the Microsoft COM interface specification enforces the same rule.
To their credit, Microsoft generally adheres to those rules in the Win32 API, too, although there are a handful of exceptions in that API. To a degree, they went backwards with the .NET Framework, which seems slavishly to follow in the footsteps of the Java environment that it so eagerly seeks to replace.
I've been using libraries since 1978, and my understanding was and is that the goal of putting code into a library was to make it reusable. While maintaining copies of the library code in each application eliminates the need to implement it again for each new project, it severely complicates upgrading, since you now have ten (or more) copies of the library, each of which must be updated.
If libraries adhere to the rule that an interface is an immutable contract, why shouldn't they live in a shared library directory, as the Unix system libraries do in /lib, from which everything that runs on the host shares a single copy of the standard C runtime library, Zlib, and so forth?
Color me seriously disappointed.
How do you determine what jars are needed for such and such feature of a framework? For example, what jars would be needed out of all those available for Spring in order to support only dependency injection?
There are tools that create minimal JARs by figuring out which classes are actually used in an application by statically analyzing the code, then creating a new JAR containing only those classes. (I recall using Zelix KlassMaster to do this, but there are many alternatives.)
The problems with using these tools for a DI framework like Spring include:
The existing tools only trace static dependencies. If you dynamically load classes, you have to specifically tell the analyser about each one. DI frameworks in general, and Spring in particular, are replete with dynamic loading, including dynamic loading that is opaque to application code; see the sketch after this list.
The existing tools work by creating a new output JAR, not by telling you which of the input JARs are not used. While repackaging the JARs is OK if you are creating a shrink-wrapped application from a closed-source codebase, it is undesirable in general, and potentially problematic with some open-source licenses. Certainly you don't want to do this with Spring.
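To illustrate that first point, here is a tiny sketch of a dynamic load that no static analyzer can trace, because the class name is data rather than code (the property name and value are made up):

import java.util.Properties;

public class DynamicLoadExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("handler.class", "java.util.ArrayList"); // in reality, read from config
        // Invisible to purely static analysis: the class name is a runtime value.
        Object handler = Class.forName(props.getProperty("handler.class"))
                              .getDeclaredConstructor().newInstance();
        System.out.println(handler.getClass());
    }
}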
In theory, someone could write a tool to help. In practice, the tool would need to (for example) know how to extract dynamic class dependencies from Spring configurations expressed in annotations, XML, and from bean descriptors created at runtime from higher-order configuration (SpringSecurity does this, for example). That is a big ask. And even then, you have the problem that a "small" change to the wirings made on the installation platform could fail due to a required JAR having been left out by the JAR pruning process.
In my view, the more practical alternatives are:
If you use Maven / Ivy to manage your dependencies, look at the dependency graphs, strip out dependencies that appear to be no longer needed ... and test, test, test.
Manually strip out JARs that appear to be unused ... and test, test, test.
Don't worry about it. A moderate level of unused JAR cruft might add a second or three to deployment and webapp startup times, but that generally doesn't matter. (But if it does ... see above.)
This is why some older Java projects end up having 600 jars and a 200 MB war file for a 10,000-line application. Kind of a pain if you don't manage it carefully...
You should really ask the framework provider or read the documentation. Statically analyzing which jars are required might not be enough in some cases (dynamic loading), and sometimes you might end up with too many jars.
I once added some FTP helper stuff to a sort of "utility" library. It depended on an Apache FTP jar. If you never used the FTP features in the library, you would not need the FTP jar, but static analysis of the code might say you need it. This is something you should document.
I'm currently working on a J2EE project that's been in beta for a while now. Right now we're just hammering out some of the issues with the deployment process. Specifically, there are a number of files embedded in the war (some XML files and .properties) that need different versions deployed depending on whether you are in a dev, testing or production environment. Stuff like log levels, connection pools, etc.
So I was wondering how developers here structure their process for deploying webapps. Do you offload as much configuration as you can to the application server? Do you replace the settings files programmatically before deploying? Pick a version during build process? Manually edit the wars?
Also, how far do you go in providing dependencies through the application server's static libraries, and how much do you put in the wars themselves? All this just to get some ideas of what the common (or perhaps best) practice is at the moment.
I think that if the properties are machine/deployment-specific, then they belong on the machine. If I'm going to wrap things up in a war, it should be drop-innable, which means nothing in it that's specific to the machine it's running on. This idea breaks if the war has machine-dependent properties in it.
What I like to do is build the project with a properties.example file; each machine then has a .properties file that lives somewhere the war can access it.
An alternative would be to have Ant tasks, e.g. dev-war, stage-war, prod-war, and have the sets of properties be part of the project, baked in at war-build time. I don't like this as much, because you end up having things like file locations on an individual server as part of your project build.
I work in an environment where a separate server team performs the configuration of the QA and Production servers for our applications. Each application is generally deployed on two servers in QA and three servers in Production. My dev team has discovered that it is best to minimize the amount of configuration required on the server by putting as much configuration as possible in the war (or ear). This makes server configuration easier and also minimizes the chance that the server team will incorrectly configure the server.
We don't have machine-specific configuration, but we do have environment-specific configuration (Dev, QA, and Production). We have configuration files stored in the war file that are named by environment (e.g. dev.properties, qa.properties, prod.properties). We put a -D property on the server VM's java command line to specify the environment (e.g. java -Dapp.env=prod ...). The application can look for the app.env system property and use it to determine the name of the properties file to use.
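A minimal sketch of that lookup (defaulting to dev when the property is absent is my assumption, not part of the description above):

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public final class EnvConfig {
    // Picks the properties file by the app.env system property,
    // e.g. started with: java -Dapp.env=prod ...
    public static Properties load() throws IOException {
        String env = System.getProperty("app.env", "dev"); // default is an assumption
        Properties props = new Properties();
        try (InputStream in = EnvConfig.class.getResourceAsStream("/" + env + ".properties")) {
            if (in == null) {
                throw new IOException("Missing " + env + ".properties on the classpath");
            }
            props.load(in);
        }
        return props;
    }
}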
I suppose if you have a small number of machine-specific properties then you could specify them as -D properties as well. Commons Configuration provides an easy way to combine properties files with system properties.
We configure connection pools on the server. We name the connection pool the same in every environment and simply point the servers assigned to each environment at the appropriate database. The application only has to know the one connection pool name.
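In code, that pattern looks roughly like this; the JNDI name jdbc/AppPool is illustrative, and each environment's server maps it to the right database:

import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public final class Pools {
    // The app only knows the one pool name; the server supplies the rest.
    public static DataSource appPool() throws NamingException {
        return (DataSource) new InitialContext().lookup("java:comp/env/jdbc/AppPool");
    }
}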
Regarding configuration files, I think Steve's answer is the best one so far. I would add the suggestion of making the external files relative to the installation path of the war file; that way you can have multiple installations of the war in the one server with different configurations.
e.g. if my dev.war gets unpacked into /opt/tomcat/webapps/dev, then I would use ServletContext.getRealPath to find the base folder and war folder name, so the configuration files would live in ../../config/dev relative to the war, or /opt/tomcat/config/dev as an absolute path.
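A sketch of that path arithmetic, assuming the war is unpacked on disk (getRealPath can return null for unexploded wars, which this sketch does not handle):

import java.io.File;
import javax.servlet.ServletContext;

public final class ConfigLocator {
    // Derives the external config folder from the webapp's unpacked location.
    public static File configDir(ServletContext ctx) {
        File webappRoot = new File(ctx.getRealPath("/"));   // e.g. /opt/tomcat/webapps/dev
        String warName = webappRoot.getName();              // "dev"
        File serverHome = webappRoot.getParentFile().getParentFile(); // /opt/tomcat
        return new File(serverHome, "config/" + warName);   // /opt/tomcat/config/dev
    }
}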
I also agree with Bill about putting as little as possible in these external configuration files. Use the database or JMX, depending on your environment, to store as much as makes sense. Apache Commons Configuration has a nice object for handling configurations backed by a database table.
Regarding libraries, I agree with unknown about having all the libs in the WEB-INF/lib folder in the war file (self-packaged). The advantage is that each installation of the application is autonomous, and you may concurrently have different builds of the war using different versions of the libraries.
The disadvantage is that it will use more memory as each web application will have its own copy of the classes, loaded by its own class loader.
If this poses a real concern, then you could put the jars in the common library folder for your servlet container ($CATALINA_HOME/lib for Tomcat). All installations of your web application running on the same server then have to use the same versions of the libraries, though. (Actually, that's not strictly true, as you could put overriding versions in an individual WEB-INF/lib folder if necessary, but that gets pretty messy to maintain.)
I would build an automated installer for the common libraries in this case, using InstallShield or NSIS or the equivalent for your operating system: something that makes it easy to tell whether you have the most up-to-date set of libraries, and to upgrade, downgrade, etc.
I usually make two properties files:
one for app specifics (messages, internal "magic" words) embedded in the app,
the other for environment specifics (db access, log levels & paths...), exposed on each server's classpath and "stuck" there (not delivered with my app). Usually I "mavenise" or "anttise" this one to put in specific values, depending on the target env.
Cool guys use JMX to maintain their app conf (the conf can be modified in real time, without redeploying), but it's too complex for my needs.
Server (static?) libraries: I strongly discourage using server libraries in my apps, as it adds a dependency on the server:
IMO, my app must be "self-packaged": drop my war, and that's all. I have seen wars with 20 MB of jars in them, and that's not disturbing to me.
A common best practice is to limit your external dependencies to what is offered by the J2EE dogma: the J2EE API (use of Servlets, EJBs, JNDI, JMX, JMS...). Your app has to be "server agnostic".
Putting dependencies in your app (war, ear, whatever) is self-documenting: you know what libraries your app depends on. With server libs, you have to clearly document these dependencies, as they are less obvious (and soon your developers will forget this little magic).
If you upgrade your app server, chances are that the server lib you depend on will also change. App server vendors are under no obligation to maintain compatibility of their internal libs from version to version (and most of the time, they don't).
If you use a widely-used lib embedded in your app server (Jakarta Commons Logging, aka JCL, comes to mind) and want to upgrade its version to get the latest features, you take the huge risk that your app server will not support it.
If you rely on a static server object (in a static field of a server class, e.g. a Map or a log), you'll have to reboot your app server to clean up this object. You lose the ability to hot-redeploy your app (the old server object will still exist between redeployments). Using app-server-wide objects (other than those defined by J2EE) can lead to subtle bugs, especially if the object is shared between multiple apps. That's why I strongly discourage the use of objects that reside in a static field of an app server lib.
If you absolutely need "this object in this app server's jar", try copying the jar into your app, hoping there's no dependency on other server jars, and check your app's classloading policy (I have made a habit of setting a "parent last" classloading policy on all my apps: I'm sure I won't be "polluted" by the server's jars, but I don't know if it is a "best practice").
I put all configuration in the database. The container (Tomcat, WebSphere, etc.) gives me access to the initial database connection, and from then on everything comes out of the database. This allows for multiple environments, clustering, and dynamic changes without downtime (or at least without a redeploy). Especially nice is being able to change the log level on the fly (although you'll need either an admin screen or a background refresher to pick up the changes). Obviously this only works for things that aren't required to get the app started, but generally you can get to the database pretty quickly after startup.
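A minimal sketch of the lookup side of this approach; the app_config table and its column names are assumptions for illustration, and a background refresher could re-read values such as get("log.level") periodically:

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import javax.sql.DataSource;

public final class DbConfig {
    private final DataSource dataSource; // the one thing the container must provide

    public DbConfig(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Reads a single configuration value from an assumed key/value table.
    public String get(String key) throws SQLException {
        try (Connection c = dataSource.getConnection();
             PreparedStatement ps = c.prepareStatement(
                 "SELECT cfg_value FROM app_config WHERE cfg_key = ?")) {
            ps.setString(1, key);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getString(1) : null;
            }
        }
    }
}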