Ways to dockerize Java apps

What are the principles for app deployment in docker?
I see two concepts
Create image per app version
Create app binaries and somehow deploy them into a running container (e.g. using Tomcat's hot deploy)
Maybe there are others. I personally like the first, but there must be a tremendous amount of image data if you release very often. How would one choose one over the other?
I'd like to know how others deploy their Java applications so I can form my own opinion.

Update 2019: See "Docker memory limit causes SLUB unable to allocate with large page cache"
I mentioned in "Docker support in Java 8 — finally!" last May (2019) that new evolutions from Java 10, backported to Java 8, mean the JVM now reports the container's memory limits accurately when running under Docker.
As mbluke adds in the comments:
The resource issues have been addressed in later versions of Java.
As of Java SE 8u131, and in JDK 9, the JVM is Docker-aware with respect to Docker CPU limits transparently.
Starting with Java JDK 8u131+ and JDK 9, there's an experimental VM option that allows the JVM ergonomics to read the memory values from cgroups.
To enable it, you must explicitly set the parameters -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap on the JVM. Java 10 has this behaviour enabled by default, so the flags are not needed there.
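For example (a minimal sketch of my own, not part of the original answer), a tiny probe class run inside the container with those flags, e.g. docker run -m 512m my-image java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap MemoryProbe, lets you confirm that the JVM really derived its heap ceiling from the cgroup limit (my-image and MemoryProbe are hypothetical names):

    // MemoryProbe.java -- hypothetical helper that prints the heap ceiling the JVM chose.
    public class MemoryProbe {
        public static void main(String[] args) {
            long maxMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
            // With the cgroup flags enabled (or on Java 10+), this should reflect
            // the container memory limit rather than the host's total RAM.
            System.out.println("Max heap (MB): " + maxMb);
        }
    }

Without the flags, the same probe on an 8u131+ JVM will typically report a heap sized from the host's physical memory.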
January 2018: original answer
As any trade-off, it depends on your situation/release cycle.
But also consider that Java might be ill-suited to a Docker environment in the first place, depending on the application's nature.
See "Nobody puts Java in a container"
So we have finished developing our JVM based application, and now package it into a docker image and test it locally on our notebook. All works great, so we deploy 10 instances of that container onto our production cluster. All of a sudden the application is throttling and not achieving the same performance as we have seen on our test system. And our test system is even this high-performance system with 64 cores…
What has happened?
In order to allow multiple containers to run isolated side-by-side, we have specified each to be limited to one CPU (or the equivalent ratio in CPU shares). Unfortunately, the JVM will see the overall number of cores on that node (64) and use that value to initialize the number of default threads we have seen earlier. Having started 10 instances, we end up with:
10 * 64 Jit Compiler Threads
10 * 64 Garbage Collection threads
10 * 64 ….
And our application, being limited in the number of CPU cycles it can use, is mostly dealing with switching between different threads and cannot get any actual work done.
All of a sudden the promise of containers, "Package once, run anywhere", seems violated…
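As a hedged illustration (my own probe, not from the quoted article), the values driving those defaults are easy to inspect from inside a container; on a JVM that is not container-aware they still reflect the host's 64 cores rather than the one CPU the container was granted:

    // CoreProbe.java -- hypothetical probe of the CPU count the JVM bases its defaults on.
    import java.util.concurrent.ForkJoinPool;

    public class CoreProbe {
        public static void main(String[] args) {
            // Number of CPUs the JVM thinks it has; GC and JIT thread counts scale with this.
            System.out.println("availableProcessors = "
                    + Runtime.getRuntime().availableProcessors());
            // The common fork/join pool defaults to availableProcessors - 1 workers.
            System.out.println("commonPool parallelism = "
                    + ForkJoinPool.getCommonPoolParallelism());
        }
    }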
So to be specific, how do you cope with the amount of data generated when you build an image per release? If you build your app every time on top of a Tomcat image, the disk space needed to store the images will grow quickly, right?
2 techniques:
a multi-stage build, to make sure your application image does not include anything but what is needed at runtime (and no compilation artifacts); see my answer here;
bind mounts: you could simply copy your WARs into a volume mounted by a single Tomcat container.

Related

JVM in Docker uses 100% CPU across multiple cores

I have a Spring Boot app running in Docker which seems to struggle with its processing, and I'll need to fix it.
Anyway, to get an idea of where the bottleneck is, I ran a simple top and see that my Java process uses 100% CPU on a 4-core machine. Good enough, I guess I need to parallelize some expensive actions in order to spread across multiple cores.
The thing is that even though my main Java process seems to max out at around 100%, machine-wise I see that all 4 cores are each used at around 25%.
I'm clearly not an expert in Docker or the JVM, but I have to do something about it :/
To me, it looks like my JVM only sees 1 core, but Docker manages to spread the work across all cores.
Any thoughts about what might be going on?
Oh and about the versions, it's running Docker 17.05, JDK 7. I might update Docker but not Java :(
I faced such an issue on Docker on an AWS EC2 instance with 64 cores. The problem was that only one core was visible when calling Java with no options. All cores were visible if I used the -XX:ActiveProcessorCount or -XX:-UseContainerSupport options, but in the latter case each core was used at less than 2-3%, summing all together to about 100%.
After a long search I found that tools like htop can see all physical cores from the container, but there can be constraints limiting the CPU capacity actually available, for instance the --cpu-shares option. One can check its value from within the container with cat /sys/fs/cgroup/cpu/cpu.shares. 1024 points correspond to 1 core. For example, one can set --cpu-shares 716 and only 70% of a core will be available from the container. This was my case.
In your case the number of physical processors is 4 and cpu.shares probably has 1024 points; thus you load 25% of every core.
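To reproduce that diagnosis, here is a rough sketch of my own (assuming the cgroup v1 layout mentioned above; cgroup v2 exposes cpu.weight instead) that compares what the JVM reports with the raw cpu.shares value from inside the container:

    // CpuSharesProbe.java -- hypothetical diagnostic, run inside the container.
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class CpuSharesProbe {
        public static void main(String[] args) throws Exception {
            System.out.println("JVM sees processors: "
                    + Runtime.getRuntime().availableProcessors());
            Path shares = Paths.get("/sys/fs/cgroup/cpu/cpu.shares");
            if (Files.exists(shares)) {
                // 1024 shares correspond to one full core; e.g. 716 is roughly 70% of a core.
                System.out.println("cpu.shares = "
                        + new String(Files.readAllBytes(shares)).trim());
            } else {
                System.out.println("No cgroup v1 cpu.shares file found");
            }
        }
    }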
Useful links for reference:
JVM in container calculates processors wrongly?
https://bugs.openjdk.org/browse/JDK-8146115
https://docs.docker.com/config/containers/resource_constraints/

Should each Docker image contain a JDK?

So, I'm very new to Docker. Let me explain the context to the question.
I have 10 - 20 Spring Boot micro-service applications, each running on different ports on my local machine.
But for migrating to Docker, based on my learning, each of the services must be in a different Docker container so as to quickly deploy or make copies.
For each Docker container, we need to create a new Docker image.
Each Docker image must contain a JRE for the Spring Boot application to run. That is around 200 MB at most, which means each Docker image is, say, 350 MB at most.
On the other hand, on my local PC I have only one JRE of 200 MB and each application takes only a few MB of space.
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Why is the size of the image large even if the target PC may already have the JDK?
Your understanding is not correct.
Docker images are formed from layers.
When you install a JRE in your image, let's suppose its layer checksum is 91e54dfb1179; that layer really does occupy your disk.
But if all your containers are based on the same image, each adding different things, say your different microservice applications, to the thin R/W layer, then all containers will share the 91e54dfb1179 layer, so it will not be an n*m relationship.
You need to pay attention to using the same base image for all Java applications as much as possible, and add only the different application bits to the thin R/W layer.
The other answers cover Docker layering pretty well, so I just want to add details for your questions.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Yes. If it's not in the image, it won't be in the container. You can save disk space, though, by reusing as many layers as possible. So try to write your Dockerfile from "least likely to change" to "most likely to change"; when you build your image, the more often you see "Using cache", the better.
Why is the size of the image large even if the target PC may already have the JDK?
Docker wants as little to do with the host as possible (on Windows and macOS it even runs everything inside a VM). Docker images assume the only things the host will provide are empty RAM, disk, and CPUs. So each Docker image must also contain its own OS userland (that is what your initial FROM is doing: picking a base OS image to use). So your final image size is actually OS + tools + app. Image size is a little misleading, though, as it is the sum of all layers, which are reused across images.
(Implied) Should each app/micro-service be in its own container?
Ideally, yes. By converting your app into an isolated module, it makes it easier to replace/load-balance that module.
In practice, maybe not (for you). Spring Boot is not a light framework. In fact, it is a framework for modularizing your code (effectively running a module control system inside a module control system). And now you want to host 10-20 of them? That is probably not going to run on a single server. Docker will force each Spring Boot app to load the framework into memory on its own, and objects can't be reused across modules any more, so those need to be multi-instantiated too! And if you are restricted to 1 production server, horizontal scaling isn't an option. (You will need ~1 GB of heap (RAM) per Spring Boot app; mileage may vary based on your code base.) And with 10-20 apps, refactoring to make the apps lighter for Docker deployment may not be feasible/in budget. Not to mention, if you can't run a minimal setup locally for testing (insufficient RAM), development effort will get a lot more "fun".
Docker is not a golden hammer. Give it a try, evaluate the pros and cons yourself, and decide if the pros are worth the cons for you and your team(s).
Lagom's answer is great, but I'd like to add that the size of Docker images should be as small as reasonably possible to ease transfer and storage.
Hence, there are a lot of images based on the Alpine Linux distribution, which are really small. Try to use them if possible.
Furthermore, do not add every tool imaginable to your container, e.g. you can often do without wget...
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
That is correct, though you could wonder whether a JRE would not be enough.
Why is the size of the image large even if the target PC may already have the JDK?
You are comparing things that are not comparable: a local environment (which is anything but a production machine) vs. integration/production environments.
In an integration/production environment, the load on your applications may be high and isolation between applications is generally advised. So there, you want to host a minimal number of applications (UIs/services) per machine (bare metal, VM or container) to prevent side effects between applications: shared-library incompatibilities, software upgrade side effects, resource starvation, chained failures between applications...
In a local environment, the load on your applications is quite low and isolation between applications is generally not a serious issue. So there you can host multiple applications (UIs/services) on your local machine, and you can also share some common libraries/dependencies provided by the OS.
While you can do that, is it really a good practice to mix and share everything locally?
I don't think so, because:
1) the local machine is not a dumping ground: you work on it all day long. The cleaner it is, the more efficient your development is. For example: the JDK/JRE may differ between locally hosted applications, some folders used by the applications may share the same location, the database version may differ, applications may use different Java servers (Tomcat, Netty, WebLogic) and/or different versions of them...
Thanks to containers, that is not an issue: everything is installed and removed according to your requirements.
2) environments (from local to prod) should be as close as possible to each other, to ease the whole integration-deployment chain and to detect issues early, not only in production.
As a side note, to achieve that locally you need a real machine for each developer.
Everything has a cost, but actually it is not expensive.
Besides isolation (of hardware and software resources), containers bring other advantages such as fast deploy/undeploy, scalability and failover friendliness (for example, Kubernetes relies on containers).
Isolation, speed, scalability and robustness come at a cost: not physically sharing any resources between containers (OS, libraries, JVM, ...).
That means that even if you use exactly the same OS, libraries and JVM in your applications, each application will have to include them in its image.
Is it expensive?
Not really: official images often rely on Alpine (a light Linux OS, with limitations, but customizable if needed), and what does an image of 350 MB (the value you quote, which is realistic) represent in terms of cost?
In fact, that is really cheap.
In integration/production, all your services will very probably not be hosted on the same machine, so compare the 350 MB for a container to the resources used by traditional VMs for integration/production, which contain a complete OS with multiple additional programs installed. You can see that the resource consumption of containers is not an issue; it is even considered an advantage beyond local environments.

Java - issue with memory

Need some help from the experts!
We have a project here (still in development) that needs to run 50 Java processes (for now; it will probably double or triple in the future) at the same time, every 5 minutes. I set -Xmx50m for every process, and our server has only 4 GB of RAM, so I know that will really slow our server down. What I have in mind is to upgrade our RAM. My question is: do I have other options to prevent our server from being slow when running that many Java processes?
Since you have 50 processes, by your own assumption they need about 2.5 GB of heap to run.
To prevent your server from being slow, you can follow some best practices for setting Java memory parameters, e.g. set -Xms and -Xmx to the same value and determine proper values based on each process's actual usage. You can also profile your processes at runtime to ensure that everything is OK.
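As a small illustration (my own sketch, not part of the answer), a probe launched with equal values, e.g. java -Xms50m -Xmx50m HeapProbe, lets you confirm per process that the committed heap matches its ceiling (the reported numbers are approximate, since the GC reserves a little of the space for itself):

    // HeapProbe.java -- hypothetical helper for checking the effect of -Xms/-Xmx.
    public class HeapProbe {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            long mb = 1024 * 1024;
            System.out.println("Committed heap (MB): " + rt.totalMemory() / mb);
            System.out.println("Max heap (MB):       " + rt.maxMemory() / mb);
        }
    }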

How to handle thousands of threads in Java without using the new java.util.concurrent package

I have a situation in which I need to create thousands of instances of a class from a third-party API. Each new instance creates a new thread. I start getting OutOfMemoryError once there are more than 1000 threads, but my application requires creating 30,000 instances. Each instance is active all the time. The application is deployed on a 64-bit Linux box with 8 GB of RAM and only 2 GB available to my application.
The way the third party library works, I cannot use the new Executor framework or thread pooling.
So how can I solve this problem?
Note that using thread pool is not an option. All threads are running all the time to capture events.
Since the memory size of the Linux box is not under my control, if I had the choice to have 25 GB available to my application on a 32 GB system, would that solve my problem, or would the JVM still choke?
Are there some optimal Java settings for the above scenario?
The system uses Oracle Java 1.6 64 bit.
I concur with Ryan's Answer. But the problem is worse than his analysis suggests.
Hotspot JVMs have a hard-wired minimum stack size - 128k for Java 6 and 160k for Java 7.
That means that even if you set the stack size to the smallest possible value, you'd need to use roughly twice your allocated space ... just for thread stacks.
In addition, having 30k native threads is liable to cause problems on some operating systems.
I put it to you that your task is impossible. You need to find an alternative design that does not require you to have 30k threads simultaneously. Alternatively, you need a much larger machine to run the application.
Reference: http://mail.openjdk.java.net/pipermail/hotspot-runtime-dev/2012-June/003867.html
I'd say give up now and figure out another way to do it. The default stack size is 512K. At 30k threads, that's 15G in stack space alone. To fit into 2G, you'll need to cut it down to less than 64K stacks, and that leaves you with zero memory for the heap, including all the Thread objects, or the JVM itself.
And that's just the most obvious problem you're likely to run into when running that many simultaneous threads in one JVM.
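If you want to see where the wall is on your own box, a throwaway experiment along these lines (my own sketch, not from the answer; run it on a disposable machine, since it deliberately exhausts native memory) counts how many idle threads a given stack size allows, e.g. java -Xss256k ThreadLimit:

    // ThreadLimit.java -- hypothetical experiment: start parked daemon threads until
    // the JVM fails with "unable to create new native thread".
    import java.util.concurrent.locks.LockSupport;

    public class ThreadLimit {
        public static void main(String[] args) {
            int count = 0;
            try {
                while (true) {
                    Thread t = new Thread(new Runnable() {
                        public void run() {
                            LockSupport.park(); // idle forever, holding only its stack
                        }
                    });
                    t.setDaemon(true);
                    t.start();
                    count++;
                }
            } catch (OutOfMemoryError e) {
                System.out.println("Created " + count + " threads before: " + e);
            }
        }
    }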
I think we are missing lots of details, but would a distributed platform work? Each individual instance would manage a range of your class instances. Those platforms could run on different PCs or virtual machines and communicate with each other.
I had the same problem with an SNMP provider that required a thread for each outstanding get (I wanted to have tens of thousands of outstanding gets going on at once). Now that NIO exists I'd just rewrite the library myself if I had to do this again.
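For reference, the single-threaded NIO pattern alluded to here looks roughly like the sketch below (a generic selector loop of my own, not the SNMP library's actual API): one selector thread multiplexes many non-blocking channels instead of dedicating a thread to each outstanding request.

    // SelectorSketch.java -- one thread servicing many channels via java.nio.
    import java.net.InetSocketAddress;
    import java.nio.channels.SelectionKey;
    import java.nio.channels.Selector;
    import java.nio.channels.ServerSocketChannel;
    import java.nio.channels.SocketChannel;
    import java.util.Iterator;

    public class SelectorSketch {
        public static void main(String[] args) throws Exception {
            Selector selector = Selector.open();
            ServerSocketChannel server = ServerSocketChannel.open();
            server.socket().bind(new InetSocketAddress(9000));
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);

            while (true) {
                selector.select(); // block until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {       // new connection: register it
                        SocketChannel client = server.accept();
                        client.configureBlocking(false);
                        client.register(selector, SelectionKey.OP_READ);
                    } else if (key.isReadable()) {  // data ready: handle the event here,
                        // still on this single selector thread
                    }
                }
            }
        }
    }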
You cannot solve it in Java code or configuration. Windows chokes at around 2,000-3,000 threads in my experience (this may have changed in later versions). When I was doing this I was surprised to find that Linux supported even fewer threads (around 1000).
When the system stops supplying threads, "Out of Memory" is the exception you should expect to see, so I'm sure that's it; I started getting this exception long before I ran out of memory. Perhaps you could hack Linux somehow to support more, but I have no idea how.
Using the concurrent package will not help here. If you could switch over to "Green" threads it might, but that might take recompiling the JVM (it would be nice if it was available as a command line switch, but I really don't think it is).

Java application performance changing based on how it is executed

Hopefully this is an easy and quick question. I recently developed a CPU-intensive Java application in NetBeans. It uses A* pathfinding tens of thousands of times per second to solve a tile-matching game. The application is finished, and it runs pretty fast (I've been testing in NetBeans the whole time). I've clocked it at 700 attempts per second (each attempt is probably 20 or so pathfinds).
When I build the project it creates a jar, and I can run this outside of NetBeans. If I use the command line (Windows 7) and run java -jar theFile.jar, I clock it at 1000 attempts per second. This is understandable, since the IDE was probably using a bit of CPU power and holding it back. (My application is multithreaded and you can set the number of cores; I usually use 3/4 so it doesn't slow my system too much.)
Now, the confusing part. Obviously I don't want the user to have to use the command line every time they want to run this application on Windows; they should just be able to click the jar. The problem is that when I double-click the jar file, the program runs at a sickly 300 attempts per second!!
Why on earth would these three ways of running the exact same program, all else being constant, have such a massive impact on performance? Is my fix to create a script to run the .jar by command line, or do you recognize what's going on here? Thanks very much!
Edit: New Information
I made a batch file with the command: java -jar theFile.jar
When this is executed, it runs at the same speed as it would if I ran it in the console (so, 1000 att/sec)
However, I also made an executable from a simple C++ program. The program had just a couple of lines: system("java -jar theFile.jar"); and return 0;. Unbelievably, this runs at the speed of double-clicking the jar file, about 300 att/sec. How bizarre! It could very well be different default parameters, but I'm not sure how to check the default system parameters, or how to modify them for this particular jar.
You may be running into the differences between the client and server versions of the HotSpot VM. From this article:
On platforms typically used for client applications, the JDK comes with a VM implementation called the Java HotSpot™ Client VM (client VM). The client VM is tuned for reducing start-up time and memory footprint. It can be invoked by using the -client command-line option when launching an application.
On all platforms, the JDK comes with an implementation of the Java virtual machine called the Java HotSpot Server VM (server VM). The server VM is designed for maximum program execution speed. It can be invoked by using the -server command-line option when launching an application.
I'm guessing that clicking the jar file may be invoking the client VM, unless you set the -server flag. This article provides some more details:
What's the difference between the -client and -server systems?
These two systems are different binaries. They are essentially two different compilers (JITs) interfacing to the same runtime system. The client system is optimal for applications which need fast startup times or small footprints; the server system is optimal for applications where the overall performance is most important. In general the client system is better suited for interactive applications such as GUIs. Some of the other differences include the compilation policy, heap defaults, and inlining policy.
Where do I get the server and client systems?
Client and server systems are both downloaded with the 32-bit Solaris and Linux downloads. For 32-bit Windows, if you download the JRE, you get only the client; you'll need to download the SDK to get both systems.
For 64-bit, only the server system is included. On Solaris, the 64-bit JRE is an overlay on top of the 32-bit distribution. However, on Linux and Windows, it's a completely separate distribution.
I would like java to default to -server. I have a lot of scripts which I cannot change (or do not want to change). Is there any way to do this?
Since Java SE 5.0, with the exception of 32-bit Windows, the server VM will automatically be selected on server-class machines. The definition of a server-class machine may change from release to release, so please check the appropriate ergonomics document for the definition for your release. For 5.0, it's Ergonomics in the 5.0 Java[tm] Virtual Machine.
Should I warm up my loops first so that HotSpot will compile them?
Warming up loops for HotSpot is not necessary. HotSpot contains On Stack Replacement technology which will compile a running (interpreted) method and replace it while it is still running in a loop. No need to waste your application's time warming up seemingly infinite (or very long running) loops in order to get better application performance.
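One quick way to check which VM variant and which arguments each launch method actually gives you (a small probe of my own, not from the quoted FAQ) is to run the same class from the console, from the batch file, and via the double-click launcher, and compare the output:

    // VmProbe.java -- hypothetical check of the HotSpot variant and startup arguments.
    import java.lang.management.ManagementFactory;

    public class VmProbe {
        public static void main(String[] args) {
            // e.g. "Java HotSpot(TM) Client VM" vs. "Java HotSpot(TM) 64-Bit Server VM"
            System.out.println("VM name: " + System.getProperty("java.vm.name"));
            System.out.println("VM args: "
                    + ManagementFactory.getRuntimeMXBean().getInputArguments());
        }
    }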
