I am trying to incorporate JasperReports into an existing Java web project. After managing to get everything to work I found out that my WAR file had grown to quite a scary filesize (> 25MB) because of all of the dependencies. With some tinkering I managed to bring it down to 16MB.
I would like to avoid this huge number of dependencies, partly because of the file size, but also because I am slightly worried about the versions of JasperReports' dependencies: it seems to depend on older versions of many of the libraries it uses. I'm afraid that at some point this will cause compatibility problems with the rest of my application.
For the moment I have solved this by deploying a separate WAR file that just generates the reports. The original web app communicates with this new one by using Tomcat's crossContext="true" parameter. The two apps' servlets communicate by passing a series of attributes using request.setAttribute(arg). The file size is still there, but at least it's inside a WAR that I'll hardly ever touch.
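For reference, a minimal sketch of what that hand-off looks like on the calling side; the /reports context path, the /generate servlet path, and the attribute names are placeholders, not my actual values:

import java.io.IOException;
import javax.servlet.ServletContext;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ReportProxyServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        // getContext() returns null unless crossContext="true" is set
        // on this context in Tomcat's configuration.
        ServletContext reports = getServletContext().getContext("/reports");
        // Hand the report parameters over as request attributes.
        request.setAttribute("reportName", "invoice");
        // Forward into the reporting webapp's servlet.
        reports.getRequestDispatcher("/generate").forward(request, response);
    }
}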
The actual question should probably be something along the lines of: How can I separate JasperReports (or in fact any other big library) from my web app without using the servlet container's common classloader?
Also: am I being overly cautious here?
You can skip the wall of text and go straight to the questions listed below, if you are so inclined.
Some background:
I'm currently doing some work on a large scale, highly modular Spring application. The application consists of multiple stand-alone Maven projects which are built separately. When compiling the whole application, these projects are pulled in as dependencies and overlaid onto the resulting 'super WAR' file.
The issue:
The build process (briefly) described in the preceding paragraph works well, but is very slow, even when all dependencies are already compiled and can be fetched from the local Maven repository.
Some simple testing reveals that the build time of the 'super WAR' is roughly halved when jar compression is turned off entirely, at the cost of a comparatively small (~10%) increase in file size.
This is no surprise, really, as the build requires all the dependencies to be built/compressed and later decompressed, overlaid, and then compressed again (as a huge, unified war file).
Adding to this, a fair few of the "sub-projects" are pure web applications which contain no Java code needing compilation (or compression) at all (only static resources).
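For reference, one knob that approximates the compression-off experiment is the war plugin's recompression flag. This is only a sketch: recompressZippedFiles is a real maven-war-plugin option (available from 2.3) that stops already-compressed JARs from being deflated again when they are added to the WAR, and the version tag below is illustrative.

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <version>2.3</version>
  <configuration>
    <!-- store the dependency JARs as-is instead of compressing them again -->
    <recompressZippedFiles>false</recompressZippedFiles>
  </configuration>
</plugin>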
Questions:
What are the advantages of jar (war, really) compression, except for the (negligibly) reduced file size?
In the case of Java EE or Spring web applications, are there other (performance) issues introduced when turning off compression entirely? I'd think it has the potential to help both build time and JVM-startup.
Any suggestions on how to handle the build process of non-Java applications with Maven more efficiently are welcome as well. I've considered bundling them as resources, but am not sure how to achieve this while ensuring they are still buildable as stand-alone WAR files (a sketch of the overlay mechanism in question follows below).
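For context, the overlay mechanism referred to above is configured roughly like this; the coordinates are placeholders. A static-only sub-project stays an independently buildable WAR, while the super WAR pulls it in as an overlay and can skip parts of it:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-war-plugin</artifactId>
  <configuration>
    <overlays>
      <overlay>
        <groupId>com.example</groupId>
        <artifactId>static-module</artifactId>
        <!-- take only the static resources from the overlaid WAR -->
        <excludes>
          <exclude>WEB-INF/**</exclude>
        </excludes>
      </overlay>
    </overlays>
  </configuration>
</plugin>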
Besides the sometimes negligible reduction in the file size and the simplicity of having to manage only one file instead of an entire directory tree, there are still a few advantages:
Reduced copy time, as per this answer: https://superuser.com/a/360532/145340. I can also back this up with personal experience: copying or moving many small files is much slower than copying or moving a single file of the same total size.
Portability: The JAR file format is clearly defined, leaving no room for incompatible implementations.
Security: You can digitally sign the contents of a JAR file, ensuring the integrity and authenticity of the contents.
Package Sealing: Enforce version consistency, since all classes defined in a package must be found in the same JAR file.
Package Versioning: the manifest can hold data like vendor and version information.
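The last two items are both driven by entries in the JAR's manifest; a minimal illustrative example (the names and version are made up):

Manifest-Version: 1.0
Implementation-Title: reports
Implementation-Version: 1.4.2
Implementation-Vendor: Example Corp

Name: com/example/reports/
Sealed: true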
We have a Java web application that uses Struts2, Spring and JasperReports. It runs on GlassFish 4.0.
The application's libraries are in its WEB-INF/lib folder, and four more applications that use the same libraries are also installed on GlassFish.
GlassFish is configured with 1024 MB of heap space and 512 MB of PermGen, and according to the NetBeans profiler, when each application carries its own libraries most of the memory consumption goes to the Struts actions and Spring AOP classes.
The problem we are having is the amount of memory consumed by keeping the libraries in each application's classloader: it is too high, generates PermGen errors, and we have also noticed that the applications run slower with more users.
Because of that we tried shared libraries, putting them in the domain1/lib folder, and found that with a single deployed application the load time and memory consumption were much lower and the application was faster in general. But when we deploy the rest of the applications on the server, only the first application loaded works well; the rest throw errors when we call Struts2 actions.
We believe this is because each application has slightly different Struts2 and Log4j settings.
We have also tried putting only certain libraries on GlassFish and leaving only Struts2 in the application, but then we get InvocationTargetException errors, because all the libraries depend on the Apache Commons JARs and it doesn't matter whether we put those in one place or the other. If we put them in both places, the application doesn't start at all.
Are there any special settings or best practices for using shared libraries?
Is there a way to use shared libraries but load settings per application, or do we have to change the settings to make them all the same?
These are actually interesting questions... I don't use GlassFish but, according to the documentation:
Application-Specific Class Loading
[...]
You can specify module- or application-specific library classes [...] Use the asadmin deploy command with the --libraries option and specify comma-separated paths
[...]
Circumventing Class Loader Isolation
Since each application or individually deployed module class loader universe is isolated, an application or module cannot load classes from another application or module. This prevents two similarly named classes in different applications or modules from interfering with each other.
To circumvent this limitation for libraries, utility classes, or individually deployed modules accessed by more than one application, you can include the relevant path to the required classes in one of these ways:
Using the Common Class Loader
Sharing Libraries Across a Cluster
Packaging the Client JAR for One Application in Another Application
Using the Common Class Loader
To use the Common class loader, copy the JAR files into the domain-dir/lib or as-install/lib directory or copy the .class files (and other needed files, such as .properties files) into the domain-dir/lib/classes directory, then restart the server.
Using the Common class loader makes an application or module accessible to all applications or modules deployed on servers that share the same configuration. However, this accessibility does not extend to application clients. For more information, see Using Libraries with Application Clients. [...]
Then I would try:
Solution 1
put all the libraries except the Struts2 jars under domain1/lib,
put only Struts2 jars under domain1/lib/applibs,
then run
$ asadmin deploy --libraries struts2-core-2.3.15.2.jar FooApp1.war
$ asadmin deploy --libraries struts2-core-2.3.15.2.jar FooApp2.war
This isolates the Struts2 libraries' classloading while keeping the rest under the Common Classloader's control.
Solution 2
put all the libraries except the Struts2 jars under domain1/lib,
put only the Struts2 jars under domain1/lib/applibs, as separate copies with different names, e.g. appending _appname to the jar names,
then run
$ asadmin deploy --libraries struts2-core-2.3.15.2_FooApp1.jar FooApp1.war
$ asadmin deploy --libraries struts2-core-2.3.15.2_FooApp2.jar FooApp2.war
This prevents the libraries from being shared, by effectively instantiating (mock) different versions of them.
Hope that helps, let me know if some of the above works.
You can try to create what is known as a skinny WAR. Pack all your WARs inside an EAR and move all the common JARs from WEB-INF/lib to the lib/ folder in the EAR (don't forget to set <library-directory> in the application.xml).
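A sketch of the EAR's META-INF/application.xml for that layout; the module and context names are placeholders:

<application xmlns="http://java.sun.com/xml/ns/javaee" version="6">
  <module>
    <web>
      <web-uri>app1.war</web-uri>
      <context-root>/app1</context-root>
    </web>
  </module>
  <!-- all shared JARs live in the EAR's lib/ directory -->
  <library-directory>lib</library-directory>
</application>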
I'd bet that placing the libs under lib/ or lib/ext won't resolve your performance issues. You did not write anything about the applications or server settings, like the size of the applications or the available heap and PermGen space, but nonetheless I would recommend staying with separate libs per app.
If you place the libs in server dirs, they will be shared among all apps. You will lose the option to upgrade only one of your applications to a new framework version, or to get rid of any of them. Your deployment will be bound to a specific server architecture.
And as you wrote, it did not solve your problems; it may even raise new ones.
I would recommend investing some hours into tuning the server. If it runs with defaults, allocate more PermGen and heap space.
If this does not help, you should analyze in depth what's going wrong. Shared libs might be a solution, but you don't know the problem yet. IBM offers some cool and free tools to analyze heap dumps; this could be a good starting point.
I came here in search of guidance about installing libraries that are shared among multiple applications or projects. I am deeply disappointed to read that the accepted practice favors installing a copy of every shared library into each project. So, if you have ten web applications, all of which use, e.g., httpcomponents-client, mysql-connector-java, etc., then your installation contains ten copies of each.
This behavior reminds me, painfully, of the way of thinking that motivated me to abandon the mainframe in favor of the PC; the thinking seemed to be "I don't care how many resources my application consumes. In fact, I'd like to be able to brag about what a resource hog it is." Excuse me, please, while I hurl.
The interface exposed by a library is an immutable contract that is not subject to change at the developer's whim.
There is this concept called backwards compatibility. If you break it, you create a new interface.
I know of at least two types of interfaces that adhere to the letter and spirit of these rules.
By far the oldest is the IBM System/370 system libraries. You might have Foo and Foo2, where the latter extends and/or breaks the contract made by the Foo interface in some way that makes it incompatible.
From its beginnings in the Bell Labs Unix project, the standard C runtime library has adhered to the above rules.
Though it is much newer, the Microsoft COM interface specification enforces the same rule.
To their credit, Microsoft generally adheres to those rules in the Win32 API, too, although there are a handful of exceptions in that API. To a degree, they went backwards with the .NET Framework, which seems slavishly to follow in the footsteps of the Java environment that it so eagerly seeks to replace.
I've been using libraries since 1978, and my understanding was and is that the goal of putting code into a library was to make it reusable. While maintaining copies of the library code in each application eliminates the need to implement it again for each new project, it severely complicates upgrading, since you now have ten (or more) copies of the library, each of which must be updated.
If libraries adhere to the rule that an interface is an immutable contract, why shouldn't they live in a shared library directory, as the Unix system libraries do in /lib, from which everything that runs on the host shares a single copy of the standard C runtime library, Zlib, and so forth?
Color me seriously disappointed.
I have a Struts web app deployed to an EAR that has some pretty extensive JavaScript. I now need to create a new web app that will be deployed to a new EAR but will probably need to share most if not all of the JavaScript and some images from the first application. What's the best way to avoid code duplication so I don't have to put a copy of each JavaScript file in each EAR in my development environment?
You could maintain the Javascript in a separate .jar library and serve it as a resource, not as a static file. That way the JS content would be a regular dependency in your project setup. Unfortunately there isn't a straightforward universal way to do this because you need at the very least a servlet that will send the file from the .jar. (Depending on your web framework you might already have this available.)
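A minimal sketch of such a servlet, assuming the JavaScript lives under a /scripts path inside the shared JAR and the servlet is mapped to /js/*; the names are illustrative:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class ScriptServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
            throws ServletException, IOException {
        // For a /js/* mapping, a request for /js/app.js yields "/app.js" here.
        String path = req.getPathInfo();
        if (path == null || path.contains("..")) { // crude traversal guard
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        // Resolve against the classpath, i.e. inside the shared JAR.
        InputStream in = getClass().getResourceAsStream("/scripts" + path);
        if (in == null) {
            resp.sendError(HttpServletResponse.SC_NOT_FOUND);
            return;
        }
        resp.setContentType("application/javascript");
        try {
            OutputStream out = resp.getOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
        } finally {
            in.close();
        }
    }
}

As an aside, containers implementing Servlet 3.0 serve static files placed under META-INF/resources of a JAR in WEB-INF/lib automatically, which removes the need for the servlet if that option is available to you.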
This also has some performance implications, but for a line of business application you probably don't need to optimize the load time of your internal Javascripts all that heavily.
Another alternative would be doing this at the source control level, using something like Git submodules.
Due to project requirements, I need to create a webapp that, at runtime, will allow some users to upload zip files which act like small apps and will contain .class files, resources (images, css, js, ...) and even lib files. Such a zip file is almost like a war file.
Is there any way to code this easily? I think I know how to code the custom ClassLoader to load classes from inside the zip file (Java - Custom ClassLoader - trying to load a class using class file full path) and even the resource retrieval when requested by the browser, but I have no idea how to execute JSP files that are inside the zip file or how to load the jar lib files inside it.
EDIT: the webapp must manage the loaded applications; there is no way to implement this as answered below, because those apps need the "master" webapp to live. The "master" webapp also allows versioning of the applications: a user will be able to upload a new version and upgrade to it, and even downgrade if the new version starts to fail.
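For reference, the classloading half I have in mind is roughly this; a sketch, with invented paths and class name:

import java.net.URL;
import java.net.URLClassLoader;

public class MiniAppLoader {
    public static Class<?> loadEntryPoint() throws Exception {
        // A zip whose root contains .class files can be read like a jar.
        // Nested lib jars cannot be loaded in place by URLClassLoader;
        // they have to be extracted to disk and added as their own URLs.
        URL[] urls = {
            new URL("file:/uploads/miniapp-1.0.zip"),
            new URL("file:/uploads/miniapp-1.0/lib/somelib.jar")
        };
        // Delegate to the "master" webapp's classloader first.
        URLClassLoader loader =
                new URLClassLoader(urls, Thread.currentThread().getContextClassLoader());
        return loader.loadClass("com.example.miniapp.Main");
    }
}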
There is no easy way to do this. It's a lot of work. Classloaders are very finicky beasts. Arguably the bulk of the work of creating something like Tomcat is wrangling the class loaders, the rest is just configuration. And even after all these years, we still have problems.
Tomcat, for example, is very aggressive on how it tries to unload existing webapps, using internal information of the Java class libraries to try and hunt down places for class loader leaks, etc. And despite their efforts, there's still problems.
The latest version of GlassFish has (or will have) the ability to version application deployments. You might have good luck simply hacking on Tomcat's internal routing and mapping code to manage versioning.
If you're running an EJB container, you could put your core services in the EJBs and let the WARs talk to them (you could do this with web services in a generic servlet container, but many EJB containers can convert Remote semantics into Local semantics for calls within the same container); a sketch follows below.
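A sketch of that split, with invented names; the interface is packaged in a small API jar the WARs compile against, while the bean lives in the EJB container:

import javax.ejb.Remote;
import javax.ejb.Stateless;

// Shared contract the WARs depend on.
@Remote
public interface ReportService {
    byte[] render(String reportName);
}

// Deployed in the EJB container (a separate source file in practice);
// the WARs obtain it via @EJB injection or a JNDI lookup.
@Stateless
class ReportServiceBean implements ReportService {
    public byte[] render(String reportName) {
        return new byte[0]; // placeholder implementation
    }
}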
You can also look at OSGi. That's another real pain to manage, but it might have enough granularity to even give you versioning; none of your users will want to use it, though. Did I mention it's a real pain to manage? We do this for dynamic loading of web content and logic, but we don't version it.
If you must have everything under the control of a single WAR, then your best bet is to punt on Java and instead use a scripting language. You tend to have a bit more control over the runtime of a scripting environment, particularly if you DON'T let it access arbitrary Java classes.
With this you can upload whatever payload you want, handle all of the dispatch yourself to static resources and logic (which means you get to handle the versioning aspect). Use something like Velocity for your "JSP" pages, and then use Javascript or whatever for logic.
The versioned environment can be a pain to pull off. If you don't care about doing it atomically, it's obviously easier. If you can afford "down time" (bring v1 offline, then bring up v2), it's a lot easier. If you're uploading the full contents of each version, it's really easy. My system allowed incremental changes and had copy-on-write semantics, so it was a lot harder. But I didn't really want to upload several GB of media for each version.
The basic takeaway is that when dealing with Classloaders, there be dragons: nothing is easy with them, and there are alternatives that actually get code into production rather than creating scars and pissed-off dragons. Using a scripting language simplifies that immensely. All the rest is dispatch, and that can be done with a filter or servlet.
You WILL get the great joy of reimplementing a solid chunk of the HTTP protocol doing this; that's always a treat, since the servlet container doesn't really expose that functionality to you. That is, you'll want to do that if you want to be a good citizen on the web. You could always just continually shove content down the client's throat, caching and proxies be damned.
You could manually create a WAR-like structure inside your web container's webapps directory and put classes, JARs and JSPs there.
Provided hot deployment is enabled in your web container, it will automatically assign a separate classloader to the new web application it finds.
In most cases web containers consider any folder having a WEB-INF subfolder containing a valid web.xml file to be a web application. You can restrict access to this new webapp by modifying its context configuration, located in META-INF/context.xml in the case of Tomcat.
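For example, with Tomcat, the uploaded app's META-INF/context.xml could use the standard RemoteAddrValve to limit who can reach it; a sketch:

<Context>
  <!-- only accept requests from localhost -->
  <Valve className="org.apache.catalina.valves.RemoteAddrValve"
         allow="127\.0\.0\.1"/>
</Context>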
Controlling hot deployment, classloader policies, etc. depends on the type of your web container, but I hope yours is no worse than Tomcat, which can handle all of that.
How do you determine what jars are needed for such and such feature of a framework? For example, what jars would be needed out of all those available for Spring in order to support only dependency injection?
There are tools that create minimal JARs by figuring out which classes are actually used in an application by statically analyzing the code, then creating a new JAR containing only those classes. (I recall using Zelix Classmaster to do this, but there are many alternatives.)
The problems with using these tools for a DI framework like Spring include:
The existing tools only trace static dependencies. If you load classes dynamically, you have to tell the analyser about each one specifically. DI frameworks in general, and Spring in particular, are replete with dynamic loading, including dynamic loading that is opaque to application code (see the sketch after this list).
The existing tools work by creating a new output JAR, not by telling you which of the input JARs are not used. While repackaging the JARs is OK if you are creating a shrink-wrapped application from a closed-source codebase, it is undesirable in general, and potentially problematic with some open-source licenses. Certainly you don't want to do this with Spring.
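To make the first point concrete, this is the kind of loading no static analyser can follow; the property name is invented for the example:

import java.util.Properties;

public class DynamicLoadExample {
    static Object loadConfiguredBean(Properties config) throws Exception {
        // The class name exists only as a string in configuration, so a
        // bytecode analyser cannot tell which JAR will be needed at runtime.
        String className = config.getProperty("datasource.class");
        return Class.forName(className).newInstance();
    }
}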
In theory, someone could write a tool to help. In practice, the tool would need to (for example) know how to extract dynamic class dependencies from Spring configurations expressed in annotations, XML and bean descriptors created at runtime from higher-order configuration (SpringSecurity does this, for example). That is a big ask. And even then you have the problem that a "small" change to the wirings made on the installation platform could fail due to a required JAR having been left out by the JAR pruning process.
In my view, the more practical alternatives are:
If you use Maven / Ivy to manage your dependencies, look at the dependency graphs, strip out dependencies that appear to be no longer needed ... and test, test, test (the stock plugin goals shown after this list can help).
Manually strip out JARs that appear to be unused ... and test, test, test.
Don't worry about it. A moderate level of unused JAR cruft might add a second or three to deployment and webapp startup times, but that generally doesn't matter. (But if it does ... see above.)
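For the Maven route mentioned above, the stock dependency plugin provides the raw data; just remember that dependency:analyze only sees static bytecode references, so it will wrongly flag dynamically loaded dependencies as unused:

$ mvn dependency:tree
$ mvn dependency:analyze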
This is why some older Java projects end up having 600 Jars and a 200 MB war file, for a 10,000 line application. Kind of a pain if you don't manage it carefully...
You should really ask the framework provider or read the documentation. Statically analyzing which jars are required might not be enough in some cases (dynamic loading), and sometimes you might end up with too many jars.
I once added some FTP helper code to a sort of "utility" library. It depended on an Apache FTP jar. If you never used the FTP features in the library, you would not need the FTP jar, but static analysis of the code might say you do. This is something you should document.