Should each Docker image contain a JDK?

So, I'm very new to Docker. Let me explain the context to the question.
I have 10 - 20 Spring Boot micro-service applications, each running on different ports on my local machine.
But to migrate to Docker, based on my learning, each of the services must run in its own Docker container so it can be deployed or replicated quickly.
For each Docker container, we need to create a new Docker image.
Each Docker image must contain a JRE for the Spring Boot application to run, which is around 200 MB at most. That means each Docker image is, say, 350 MB at most.
On the other hand, on my local PC I have only one 200 MB JRE, and each application takes only a few MB of space.
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Why is the size of the image large even if the target PC may already have the JDK?

Your understanding is not correct.
Docker images are formed from layers: a stack of read-only image layers with a thin read/write container layer on top.
When you install a JRE in your image, it becomes a layer with its own checksum, say 91e54dfb1179, and that layer really does occupy disk space.
But if all your containers are based on that same image and only add different things, say your different microservice applications, to the thin R/W layer, then they all share the 91e54dfb1179 layer, so disk usage is not an n*m relationship.
So pay attention to using the same base image for all your Java applications as much as possible, and add only the application-specific bits on top.
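To make the sharing concrete, here is a minimal sketch of such a per-service Dockerfile; the base image tag, service name and JAR path are illustrative assumptions, not something from the question:

    # Base layers (Alpine + JRE): stored on disk once and shared by every
    # image built FROM the same tag
    FROM openjdk:8-jre-alpine

    # The only layer unique to this service; service-b's Dockerfile would
    # be identical except for the JAR it copies
    COPY target/service-a.jar /app/app.jar

    ENTRYPOINT ["java", "-jar", "/app/app.jar"]

Building ten such images adds ten small JAR layers; the JRE base layers exist only once on the Docker host.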

The other answers cover Docker layering pretty well, so I just want to add details for your questions.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Yes. If it's not in the image, it won't be in the container. You can save disk space, though, by reusing as many layers as possible. So try to order your Dockerfile from "least likely to change" to "most likely to change": the more often you see "Using cache" when you build your image, the better.
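To illustrate that ordering, here is a rough sketch (the apk package and paths are hypothetical examples, not requirements):

    # Least likely to change: base OS + JRE
    FROM openjdk:8-jre-alpine

    # OS-level additions change rarely, so keep them near the top where
    # their layer stays cached across application rebuilds
    RUN apk add --no-cache tzdata

    # Most likely to change: your own artifact goes last, so a typical
    # code change rebuilds only this layer and the rest report "Using cache"
    COPY target/service-a.jar /app/app.jar

    ENTRYPOINT ["java", "-jar", "/app/app.jar"]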
Why is the size of the image large even if the target PC may already have the JDK?
Docker tries to depend on the host as little as possible. A container only sees what is in its image (plus whatever you explicitly mount in), so images assume the only things the host provides are CPU, RAM, disk and a kernel. Each image therefore ships its own userland: the OS libraries and tools it needs on top of the host's kernel (on macOS and Windows, Docker additionally runs containers inside a small Linux VM). That is what your initial FROM is doing: picking a base OS image to build on. So your final image size is really OS userland + tools + app. Image size is a little misleading, though, as it is the sum of all layers, and layers are reused across images.
(Implied) Should each app/micro-service be in its own container?
Ideally, yes. By converting your app into an isolated module, it makes it easier to replace/load-balance that module.
In practice, maybe not (for you). Spring Boot is not a light framework. In fact, it is a framework for modularizing your code (effectively running a module control system inside a module control system). And now you want to host 10-20 of them? That is probably not going to run on a single server. Docker forces each Spring Boot app to load its own copy of the framework into memory, and objects can't be shared across containers, so those are instantiated per app too. And if you are restricted to one production server, horizontal scaling isn't an option. (Expect roughly 1 GB of heap (RAM) per Spring Boot app; mileage may vary based on your code base.) And with 10-20 apps, refactoring to make them lighter for Docker deployment may not be feasible or in budget. Not to mention, if you can't run a minimal setup locally for testing (insufficient RAM), development will get a lot more "fun".
Docker is not a golden hammer. Give it a try, evaluate the pros and cons yourself, and decide if the pros are worth the cons for you and your team(s).

Lagom's answer is great, but I'd like to add that the size of Docker images should be kept as small as reasonably possible to ease transfer and storage.
Hence, there are a lot of images based on the Alpine Linux distribution, which are really small. Try to use them if possible.
Furthermore, do not add every tool imaginable to your image; e.g. you can often do without wget...

Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
That is correct, though you could wonder whether a JRE would not be enough.
Why is the size of the image large even if the target PC may already have the JDK?
You are comparing things that are not comparable: a local environment (which is anything but a production machine) vs. integration/production environments.
In integration/production environments, the load on your applications may be high, and isolation between applications is generally advisable. So there you want to host a minimal number of applications (UIs/services) per machine (bare metal, VM or container) to prevent side effects between applications: shared library incompatibilities, software upgrade side effects, resource starvation, chained failures between applications...
In a local environment, on the other hand, the load on your applications is quite low and isolation between applications is generally not a serious issue. So there you can host multiple applications (UIs/services) on your local machine, and you can also share some common libraries/dependencies provided by the OS.
While you can do that, is it really good practice to mix and share everything locally?
I don't think so, because:
1) The local machine is not a dumping ground: you work on it the whole day. The cleaner it is, the more efficient your development is. For example: the JDK/JRE may differ between locally hosted applications, some applications may expect the same folder locations, database versions may differ, applications may need different Java servers (Tomcat, Netty, WebLogic) and/or different versions of them...
Thanks to containers, that is not an issue: everything is installed and removed according to your requirements (see the example after this list).
2) Environments (from local to prod) should be as close to each other as possible, to ease the whole integration-deployment chain and to detect issues early, not only in production.
As a side note, to achieve that locally you need a real machine for each developer.
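To make point 1) concrete, containers let you run, say, two different JDK versions side by side and throw them away afterwards without touching the host setup (the image tags below are merely illustrative):

    # Each command runs in its own isolated container; --rm removes the
    # container afterwards, so the host keeps a single clean setup
    docker run --rm openjdk:8-jre-alpine java -version
    docker run --rm openjdk:11-jre-slim java -version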
All of that has a cost, but actually it is not expensive.
Besides isolation (of hardware and software resources), containers bring other advantages such as fast deploy/undeploy, scalability and failover friendliness (for example, Kubernetes relies on containers).
Isolation, speed, scalability and robustness come at a cost: not physically sharing any resources between containers (OS, libraries, JVM, ...).
That means that even if your applications use the exact same OS, libraries and JVM, each application has to include them in its image.
Is it expensive?
Not really: official images often rely on Alpine (a light Linux OS with limitations, but customizable if needed), and what does an image of 350 MB (the value you quote, which is realistic) represent in terms of cost?
In fact, it is really cheap.
In integration/production, your services will very probably not all be hosted on the same machine, so compare the 350 MB for a container to the resources used by traditional integration/production VMs, which contain a complete OS with multiple additional programs installed. You can see that the resource consumption of containers is not an issue. Beyond local environments, it is even considered an advantage.

Related

Multi-WAR tomcat vs Docker containers

I'm wondering if a Docker solution is faster and more memory efficient than my current Tomcat deployment. I will explain both solutions.
The current:
I have a Tomcat server with about 20 WARs deployed. The WARs are Spring Boot applications. It takes up a lot of memory and boot time, and money too.
The docker alternative:
The alternative I'm thinking about is a Docker host with 20 Docker containers, one for each app. It seems Spring recommends using JARs on JDK images.
Now, does Docker, or containerization in general, improve memory and speed?
One improvement I am expecting is that applications can start in parallel. This will hopefully speed up boot-time (assuming multi-core hardware). Am I right here?
Secondly, I'm wondering which approach handles memory most efficiently.
What happens when I have multiple WARs sharing the exact same dependency? Will Tomcat reuse that dependency's memory? And will Docker?
Memory (and thus likely CPU) efficiency can be debated and probably needs to be measured. Let me give some insight.
Let's assume you create 20 containers, one for each of the WARs you want to run. At that point you have 20 different JVMs in memory. Depending on whether they come from the same container image or from different ones, the OS may recognize that the JVM binaries are the same files and share their code pages. So this depends on whether you bake your WARs into separate container images or have one image only and mount the WARs at runtime.
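A sketch of the two variants mentioned above (the Tomcat tag and paths are assumptions for illustration):

    # Variant 1: bake each WAR into its own image, one image per app
    FROM tomcat:9-jre8
    COPY target/app-a.war /usr/local/tomcat/webapps/app-a.war

    # Variant 2: keep a single shared image and mount each WAR at runtime:
    #   docker run -d -v "$PWD/app-a.war:/usr/local/tomcat/webapps/app-a.war" tomcat:9-jre8

With variant 2, all containers run the exact same image, which is the situation described above as most favorable for sharing.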
What about permgen space, heap or other memory regions? I doubt the OS can share much between the processes here. And the JVMs cannot share on their level since the docker container isolation would not allow them to talk to each other. So shared memory on JVM level is lost.
With that, every JVM starts up and runs the JIT for its own hot code paths, and no synergy between the applications can be exploited. With a bigger total codebase in memory, the CPU also has to jump between processes more, invalidating its caches more often.
All in all, I believe dockerizing your setup is an improvement in application isolation. You can more easily install/uninstall your stuff, and one application running havoc cannot impact the others. But performance-wise, you should expect somewhat slower execution and higher memory usage. To what extent can only be determined by benchmarking.

Ways to dockerize java apps

What are the principles for app deployment in docker?
I see two concepts
Create an image per app version
Create app binaries and somehow deploy them to a running container (utilizing e.g. Tomcat's hot deploy)
Maybe there are others. I personally like the first, but there must be a tremendous amount of data if you release very often. How would one choose one over the other?
I'd like to know how others deploy their Java applications so I can form my own opinion.
Update 2019: See "Docker memory limit causes SLUB unable to allocate with large page cache"
I mentioned in "Docker support in Java 8 — finally!" in May 2019 that new evolutions from Java 10, backported to Java 8, mean the JVM now detects the memory actually available to it inside Docker more accurately.
As mbluke adds in the comments:
The resource issues have been addressed in later versions of Java.
As of Java SE 8u131, and in JDK 9, the JVM is Docker-aware with respect to Docker CPU limits transparently.
Starting with JDK 8u131+ and JDK 9, there's an experimental VM option that allows the JVM ergonomics to read the memory values from cgroups.
To enable it, you must explicitly set the parameters -XX:+UnlockExperimentalVMOptions and -XX:+UseCGroupMemoryLimitForHeap on the JVM. Java 10 enables container awareness by default, so these flags are no longer needed there.
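As a minimal sketch of passing those flags on a Java 8 image (the image tag and JAR path are illustrative; the base image must be 8u131 or later, and on Java 10+ the flags are unnecessary):

    FROM openjdk:8-jre-alpine

    COPY target/app.jar /app/app.jar

    # Let the JVM size its heap from the cgroup memory limit instead of
    # the host's total RAM (experimental on Java 8)
    ENTRYPOINT ["java", "-XX:+UnlockExperimentalVMOptions", "-XX:+UseCGroupMemoryLimitForHeap", "-jar", "/app/app.jar"]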
January 2018: original answer
As any trade-off, it depends on your situation/release cycle.
But also consider that Java might be ill-suited to a Docker environment in the first place, depending on the application's nature.
See "Nobody puts Java in a container"
So we have finished developing our JVM-based application, and now we package it into a Docker image and test it locally on our notebook. All works great, so we deploy 10 instances of that container onto our production cluster. All of a sudden the application is throttled and not achieving the same performance we saw on our test system. And our test system is even a high-performance machine with 64 cores…
What has happened?
In order to allow multiple containers to run isolated side by side, we have specified each to be limited to one CPU (or the equivalent ratio in CPU shares). Unfortunately, the JVM will see the overall number of cores on that node (64) and use that value to initialize the number of default threads we have seen earlier. As we started 10 instances, we end up with:
10 * 64 JIT compiler threads
10 * 64 garbage collection threads
10 * 64 ….
And our application, being limited in the number of CPU cycles it can use, is mostly busy switching between different threads and cannot get any actual work done.
All of a sudden the promise of containers, "package once, run anywhere", seems violated…
So to be specific, how do you cope with the amount of data generated when you build an image per release? If you build your app every time on top of a Tomcat image, the disk space needed to store the images will grow quickly, right?
2 techniques:
a multi-stage build, to make sure your image does not include anything but what is needed at runtime (and no compilation artifacts); see my answer here and the sketch after this list;
bind mounts: you could simply put your WARs in a volume mounted by a single Tomcat container.
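A minimal multi-stage sketch for technique 1 (the Maven image tag, project layout and JAR name are assumptions for illustration):

    # Build stage: needs the JDK and the build tool, none of which ends
    # up in the final image
    FROM maven:3-jdk-8 AS build
    WORKDIR /src
    COPY . .
    RUN mvn -q package

    # Runtime stage: only the JRE and the built artifact are kept, so the
    # image produced per release stays small and its base layers are shared
    FROM openjdk:8-jre-alpine
    # assumes the build produces target/app.jar (e.g. via <finalName>app</finalName>)
    COPY --from=build /src/target/app.jar /app/app.jar
    ENTRYPOINT ["java", "-jar", "/app/app.jar"]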

Running portlets on Liferay on 1Gig Server - Performance Issue

We have a couple of custom portlet applications running inside Liferay Portal.
The solution is installed on the client's computer, which is entry-level (RAM <= 1 GB). Due to red tape, it is rather unlikely the client will switch to higher-end computers in the short term.
The issue is that the applications are very slow.
What are some hints for optimizing the Liferay configuration (or the portlet applications) so that we can run decently on entry-level computers?
Or would it be a good move to switch the portlets to lighter portlet container alternatives such as Apache Pluto or GateIn?
Or is running a portal like Liferay on entry-level computers simply not an option, and should we consider porting the existing portlets to separate standard Java web applications to achieve better performance?
Compare the price of tuning, minimizing the footprint and measuring the result to the price of just 1 more Gigabyte of RAM - which you might not even be able to purchase in this size any more.
Then compare the price for porting from a portal environment into Java Web Applications: You can't even be sure that this will result in a lower footprint, as you'll have to redo quite a bit of functionality that Liferay provides out of the box. Identity Management for example. Content Management as another one. This will take time (equaling money) that might be better spent with just a new server.
For ~40€/month you can get a hosted server, including network connectivity, power and even support, that is way more capable of serving an application like this than a server the size of a Raspberry Pi (<40€ total, I've seen Raspberry Pi hosting for less than 40€ per year).
I don't know what you mean by "red tape", but I'd say you're definitely going for the wrong target. While there is a point to tuning Liferay, I'd not go for this kind of optimization.
You're not mentioning the version you're using - with that hardware I'm assuming that it's an ancient version. Back before the current version, Liferay was largely monolithic. While you can configure quite a bit (cache, deactivate some functionality) they'll not bring drastic advantages. The current version has been modularized and you can remove components that you don't use, lowering the footprint - however, it's not been built for that size of infrastructure.
And when you're running the portal on that kind of hardware, you're not running the database and an extra webserver on the same box as well, right? This would be the first thing to change: Minimize everything that's running outside of Liferay on the same OS/Box.

Scalability of a single server for running a Java Web application

I want to gain more insight regarding the scale of workload a single-server Java web application deployed to a single Tomcat instance can handle. In particular, let's pretend that I am developing a wiki application that has a usage pattern similar to Wikipedia's. How many simultaneous requests can my server handle reliably before running out of memory or showing signs of excess stress, if I deploy it on a machine with the following configuration:
4-Core high-end Intel Xeon CPU
8GB RAM
2 HDDs in RAID-1 (No SSDs, no PCIe based Solid State storages)
RedHat or Centos Linux (64-bit)
Java 6 (64-bit)
MySQL 5.1 / InnoDB
Also let's assume that the MySQL DB is installed on the same machine as Tomcat and that all the Wiki data are stored inside the DB. Furthermore, let's pretend that the Java application is built on top of the following stack:
SpringMVC for the front-end
Hibernate/JPA for persistence
Spring for DI and Security, etc.
If you haven't used the exact configuration but have experience in evaluating the scalability of a similar architecture, I would be very interested in hearing about that as well.
Thanks in advance.
EDIT: I think I have not articulated my question properly. I will mark the answer with the most upvotes as the best answer and rewrite my question in the community wiki area. In short, I just wanted to learn about your experiences with the scale of workload your Java application has been able to handle on one physical server, along with some description of the type and architecture of the application itself.
You will need to use a group of tools:
Load testing tool - JMeter can be used.
Monitoring tool - used to monitor the load on various resources. There are lots of paid as well as free ones: JProfiler, VisualVM, etc.
Collection and reporting tool (I have not used a specific one).
With the above tools you can find the optimal value. I would approach it in the following way:
Get to know what the ratio of pages being accessed should be, and what the background processes and their frequencies are.
Configure JMeter accordingly (for those ratios), monitor performance under the applied load (time to serve a page can be measured in JMeter), and monitor other resources using the monitoring tool. Also track the error ratio. (Note: you need to decide what error ratio is unacceptable.)
Keep increasing the load step by step and keep recording the numbers of interest until the server fails completely.
You can decide on the optimal value based on many criteria: low error rate, maximum serving time, etc.
JMeter supports lots of ways to apply load.
To be honest, it's almost impossible to say. There are probably about three ways (off the top of my head) to build such a system, and each would have fairly different performance characteristics. Your best bet is to build and test.
First, try to get some idea of the volumes you expect and the latency constraints you'll need to meet.
Come up with a basic architecture and implement a thin slice end to end through the system (ideally the most common use case). Use a load testing tool (like Grinder or Apache JMeter) to inject load and start measuring the performance. If the performance is acceptable (be conservative: your simple implementation will likely include less functionality and be faster than the full system), continue building the system and keep testing to make sure you don't introduce a major performance bottleneck. If not, come up with a different design.
If your code is reasonable, the bottleneck will likely be the database, somewhere in the region of hundreds of DB ops per second. If that is insufficient, you may need to think about caching.
Definitely take a look at Spring Insight for performance monitoring and analysis.
English Wikipedia has 14 GB of data. An 8 GB memory cache would have a very high hit ratio, and I think hard disk reads would be well within capacity. Therefore, the app is most likely network bound.
English Wikipedia gets about 3000 page views per second. It is possible that Tomcat could handle the load with careful tuning, and that the network has enough throughput to serve the traffic.
So could the entire Wikipedia site be hosted on one moderate machine? Probably not. Just an idea.
http://stats.wikimedia.org/EN/TablesWikipediaEN.htm
http://stats.wikimedia.org/EN/TablesPageViewsMonthly.htm
A single Tomcat instance doesn't spread itself over multiple machines. If you really are concerned about scalability, you must consider what to do when your application outgrows a single machine.

Does a war file size affect in some way the application and/or application server performance?

We've been struggling here at work with somebody's suggestion that we should decrease the size of our WAR files, specifically the WEB-INF/lib directory size, in order to improve the performance of our production JBoss instance. I'm still suspicious about this.
We have around 15 web apps deployed on our application server, each about 15 to 20 MB in size.
I know there are a lot of variables involved in this, but has anyone actually dealt with this situation? Does the .war file size actually have a significant impact on web containers in general?
What advice can you offer?
Thank you.
There are many things to be suspicious of here:
What about the application is not performing to the level you would like?
Have you measured the application to find out which components are causing the lack of performance?
What are the bottlenecks in the application/system?
The size of the application alone has nothing to do with any sort of runtime performance. The number of classes loaded during the lifetime of the application has an impact on memory usage of the application, but an incredibly negligible one.
When dealing with "performance issues", the solution always follows the same general steps:
What does it mean when we say "bad performance"?
What specifically is not performing? Measure, measure, measure.
Can we improve the specific component not performing to the level we want?
If so, implement the ideas, measure again to find out if performance has truly improved.
You need to tell us the operating system.
Do you have antivirus live protection?
A WAR/JAR file is actually a ZIP file; i.e., if you rename a .war to .zip, you can use a zip utility to view or unzip it.
During deployment, the WAR file is unzipped once into a designated folder. If you have live protection, the antivirus utility might take some time to scan the newly created branch of directories and slow down any access to them.
Many web app technologies, like JSP, create temporary files, and your live protection would kick in to scan them too.
If this is your situation, you have to decide whether you wish to exclude your web apps from antivirus live scanning.
Are you running Linux, but with your web directory accessed through ntfs-3g? If so, check whether the NTFS directory is compressed. ntfs-3g has problems accessing compressed NTFS files, especially when multiple files are manipulated/created/uncompressed simultaneously. In the first place, unless there are some extremely valid reasons (and I can't see any), a web app directory should be on a local partition in a format native to Linux.
Use Wireshark to monitor the network activity. Find out whether the web apps are causing accesses to remote file systems. See if there are too many retransmits whenever the web apps are active; excessive retransmits or requests for retransmission mean the network pipeline has integrity problems. I am still trying to understand this issue myself; some network cards have buffering problems (something like a buffer overflow) when operating under Linux but not under Windows.
Wireshark is not difficult to use as long as you have an understanding of IP addresses, and you might wish to write awk, Perl or Python scripts to analyze the traffic. Personally, I would use SAS.
