Docker volume mapping + Windows = incredibly slow? - java

I am currently trying to copy data from my Postgres database, which runs in a Docker container, to my Windows host. For this purpose I implemented a Java application (also in a Docker container) that uses the PostgreSQL JDBC driver and its CopyManager to copy specific data into a mapped volume on the host.
Problem: When I copy the data to the mapped Windows directory, it becomes very slow. (Writing 1 GB of data takes about 40 minutes; without volume mapping it takes only 1 minute.)
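For reference, the copy step looks roughly like this (a minimal sketch; the connection details, table name, and target path are placeholders):

import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.DriverManager;
import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

public class Exporter {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://postgres:5432/mydb", "user", "secret");
             Writer out = Files.newBufferedWriter(Paths.get("/export_data/my_table.csv"))) {
            CopyManager copy = conn.unwrap(PGConnection.class).getCopyAPI();
            // Streams the table contents as CSV into the mounted volume
            long rows = copy.copyOut("COPY my_table TO STDOUT WITH (FORMAT csv)", out);
            System.out.println("Exported " + rows + " rows");
        }
    }
}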
Docker-compose:
exportservice:
  build: ./services/exportservice
  volumes:
    - samplePath:/export_data
I have already read that it's a known problem, but I haven't found a suitable solution.
My services have to run in a production environment that is based on Windows.
So what's the way to solve this issue? WSL2?
Looking forward to your advice!

Mounting a Windows folder into a Docker container is always slow, no matter how you do it. WSL2 is even slower than WSL1 in that respect.
The best solution is to install WSL2, copy all your project files into the Linux file system (mounted in Windows at \\wsl$\<distro>\), run your containers from there, and mount Linux directories accordingly (e.g. a bind mount such as /home/<you>/export_data:/export_data instead of a Windows path). That bypasses any Windows file interaction.
I wrote a Docker for Web Developers book and video course because I couldn't find good getting-started tutorials which explained how to create local development environments. It includes Hyper-V and WSL2 instructions and advice.

Use WSL2 instead of WSL 1 and keep your files on the Linux file system. You can also reduce the number of write cycles, and with it the write overhead, by buffering output in Java, for example with a BufferedWriter.
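A minimal sketch of the buffering idea (the 8 MB buffer size is an arbitrary choice and the table/path are placeholders; copyManager is the PostgreSQL CopyManager from the question):

import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.Writer;
import org.postgresql.copy.CopyManager;

public class BufferedExport {
    // A large buffer means the slow mapped volume sees a few big writes
    // instead of many small ones
    static long export(CopyManager copyManager, String table, String target) throws Exception {
        try (Writer out = new BufferedWriter(new FileWriter(target), 8 * 1024 * 1024)) {
            return copyManager.copyOut("COPY " + table + " TO STDOUT WITH (FORMAT csv)", out);
        }
    }
}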

Related

Should each Docker image contain a JDK?

So, I'm very new to Docker. Let me explain the context of the question.
I have 10-20 Spring Boot microservice applications, each running on a different port on my local machine.
But for migrating to Docker, based on my learning, each of the services must be in a different Docker container so as to quickly deploy or make copies.
For each Docker container, we need to create a new Docker image.
Each Docker image must contain a JRE for the Spring Boot application to run. That is around 200 MB at most, which means each Docker image is, say, 350 MB at the maximum.
On the other hand, on my local PC I have only one JRE of 200 MB and each application takes only a few MB of space.
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Why is the size of the image large even if the target PC may already have the JDK?
Your understanding is not correct.
Docker images are built from layers.
When you install a JRE in your image, suppose its layer checksum is 91e54dfb1179; that layer genuinely occupies disk space.
But if all your containers are based on the same image, each one only adds its own differences (say, its particular microservice application) in a thin R/W layer on top. All containers then share the 91e54dfb1179 layer, so disk usage is not an n*m relationship. For example, 20 images sharing one 200 MB JRE layer cost 200 MB plus 20 small application layers, not 20 × 350 MB.
So take care to use the same base image for all your Java applications as much as possible, and add only the application-specific parts on top.
The other answers cover Docker layering pretty well, so I just want to add details for your questions.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Yes. If it's not in the image, it won't be in the container. You can save disk space, though, by reusing as many layers as possible. So try to order your Dockerfile from "least likely to change" to "most likely to change": the more often you see "Using cache" when you build your image, the better.
Why is the size of the image large even if the target PC may already have the JDK?
Docker wants as little to do with the host as possible; on Windows and macOS it even runs inside a VM of its own. Docker images assume the only things the host will provide are empty RAM, disk, and CPUs, so each image must also contain its own OS userland (that is what your initial FROM is doing: picking a base OS image), while the kernel itself is shared with the Docker host. So your final image size is actually OS + tools + app. Image size is a little misleading, though, as it is the sum of all layers, and layers are reused across images.
(Implied) Should each app/micro-service be in its own container?
Ideally, yes. By converting your app into an isolated module, you make it easier to replace or load-balance that module.
In practice, maybe not (for you). Spring Boot is not a lightweight framework. In fact, it is a framework for modularizing your code (effectively running a module control system inside a module control system). And now you want to host 10-20 of them? That is probably not going to run on a single server. Docker forces each Spring Boot app to load itself into memory separately, and objects can't be reused across modules, so those need to be instantiated multiple times too. If you are restricted to one production server, horizontal scaling isn't an option. (You will need roughly 1 GB of heap (RAM) per Spring Boot app; mileage may vary based on your code base.) And with 10-20 apps, refactoring to make them light enough for Docker deployment may not be feasible or in budget. Not to mention, if you can't run a minimal setup locally for testing (insufficient RAM), development effort will get a lot more "fun".
Docker is not a golden hammer. Give it a try, evaluate the pros and cons yourself, and decide if the pros are worth the cons for you and your team(s).
Lagom's answer is great, but I'd like to add that Docker images should be as small as reasonably possible to ease transfer and storage.
Hence, there are a lot of images based on the Alpine Linux distribution, which are really small. Try to use them if possible.
Furthermore, do not add every tool imaginable to your image; for example, you can often do without wget...
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
That is correct, though you might ask whether a JRE would be enough instead of a full JDK.
Why is the size of the image large even if the target PC may already have the JDK?
You are comparing things that are not comparable: a local environment (which is anything but a production machine) vs. integration/production environments.
In an integration/production environment, the load on your applications may be high, and isolation between applications is generally advised. There you want to host a minimal number of applications (UIs/services) per machine (bare metal, VM, or container) to prevent side effects between applications: shared-library incompatibilities, software upgrade side effects, resource starvation, chained failures between applications...
In a local environment, on the other hand, the load on your applications is quite low and isolation between applications is generally not a serious issue. So there you can host multiple applications (UIs/services) on your local machine, and you can also share common libraries/dependencies provided by the OS.
While you can do that, is it really good practice to mix and share everything locally?
I don't think so, because:
1) A local machine is not a dumping ground: you work on it the whole day, and the cleaner it is, the more efficient your development is. For example, the JDK/JRE may differ between locally hosted applications, some folders used by different applications may collide, database versions may differ, applications may need different Java servers (Tomcat, Netty, WebLogic), possibly in different versions...
With containers, that is not an issue: everything is installed and removed according to your requirements.
2) Environments (from local to production) should be as close to each other as possible, to ease the whole integration-deployment chain and to detect issues early, not only in production.
As a side note, to achieve that locally, developers need reasonably powerful machines.
All of this has a cost, but in practice it is not expensive.
Besides isolation of hardware and software resources, containers bring other advantages, such as fast deploy/undeploy, scalability, and failover friendliness (Kubernetes, for example, relies on containers).
Isolation, speed, scalability, and robustness come at a cost: not physically sharing any resources between containers (OS, libraries, JVM, ...).
That means that even if you use the exact same OS, libraries, and JVM in all your applications, each application will have to include them in its image.
Is that expensive?
Not really: official images often rely on Alpine (a lightweight Linux OS, with limitations, but customizable if needed), and what does an image of 350 MB (the value you quote, which is realistic) represent in terms of cost?
In fact, it is really cheap.
In integration/production, your services will very probably not all be hosted on the same machine, so compare the 350 MB for a container to the resources used by traditional integration/production VMs, which contain a complete OS with multiple additional programs installed. You can see that the resource consumption of containers is not an issue; beyond local environments it is even considered an advantage.

Eclipse - simulate full disk when running server

I have a small application which doesn't take up much space itself, but it does a lot of disk writing, so data accumulates. I need to know what happens when the disk is full and the application is still trying to write to it: whether parts of it fail, or the entire thing does.
I'm currently running Tomcat within Eclipse, and I would like to know if there is a way to limit the disk space allowed to a server created in Eclipse. Any ideas?
You can create a small RAM drive, which can be used like a physical drive but exists entirely in RAM. This has the added benefit that it is really fast, and you don't have to delete your test files afterwards, as the contents are gone once the RAM drive is closed.
As for how exactly you create your RAM drive, this depends on your operating system.
In Linux, you can use tmpfs (taken from https://unix.stackexchange.com/questions/66329/creating-a-ram-disk-on-linux):
mount -o size=16G -t tmpfs none /mnt/tmpfs
Edit: On Windows, RAM-drive support isn't shipped with the system, so you need to install additional software. A list can be found at http://en.wikipedia.org/wiki/List_of_RAM_drive_software.
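Once the small RAM drive is mounted, a quick way to observe the failure mode is to write to it until it fills. A sketch (the mount point matches the tmpfs example above; the chunk size is arbitrary, and the exact exception message varies by OS):

import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Paths;

public class FullDiskTest {
    public static void main(String[] args) {
        byte[] chunk = new byte[1024 * 1024]; // 1 MB per write
        try (OutputStream out = Files.newOutputStream(Paths.get("/mnt/tmpfs/filler.bin"))) {
            while (true) {
                // On Linux this eventually throws IOException: No space left on device
                out.write(chunk);
            }
        } catch (IOException e) {
            System.out.println("Disk-full behavior: " + e.getMessage());
        }
    }
}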

Local Java processes are grayed out when trying to connect via JMX

I'm running a number of Java processes on a Windows XP Professional machine. When I attempt to connect to these processes via a local JConsole, the processes are grayed out.
However, I can run the same processes on another machine and connect via a local JConsole on that machine.
Both machines are running Java 1.6 for the processes and JConsole.
Any ideas why these processes are grayed out?
I'm fighting with this issue right now, and I found a workaround:
You can change the local user's temp dir to something they can definitely access (e.g. D:\temp). Make sure to do this for both the process you're trying to monitor and the JConsole process.
Another thing that can apparently cause issues is usernames with uppercase letters in them. The directory is always created with all-lowercase letters, but simply renaming it to match exactly how the username is shown in Task Manager made all the issues go away: http://planeofthought.com/wp/?p=75
If the processes are running as a different user (e.g. if you start them as services), then you won't be able to connect to them. Also, if they are running under an older JVM, you most likely won't be able to talk to them either.
In some cases, the local JMX communication mechanism uses the local filesystem and may have issues if permissions are not defined correctly. Are you possibly running any of these processes on networked filesystems (NFS, Samba)?
Say the Windows user name you use to start your Java application, as seen in Task Manager, is YOUR_USER_NAME.
Check for a folder whose name looks like hsperfdata_XXXXX (XXXXX should be your user name) in your temp folder, and make sure YOUR_USER_NAME and XXXXX are exactly the same, paying attention to upper and lower case.
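To compare the two, you can print the temp directory and user name that the JVM itself sees (the hsperfdata_<user> folder is normally created under java.io.tmpdir):

public class TempInfo {
    public static void main(String[] args) {
        // Compare these against the hsperfdata_XXXXX folder name on disk
        System.out.println("java.io.tmpdir = " + System.getProperty("java.io.tmpdir"));
        System.out.println("user.name      = " + System.getProperty("user.name"));
    }
}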
From http://download.oracle.com/javase/6/docs/technotes/guides/management/jconsole.html:
Applications that are not attachable, with the management agent disabled. These include applications started on a J2SE 1.4.2 platform or started on a J2SE 5.0 platform without the -Dcom.sun.management.jmxremote or com.sun.management.jmxremote.port options. These applications appear grayed-out in the table and JConsole cannot connect to them. In the example connection dialog shown in Figure 3-1, the Anagrams application was started with a J2SE 5.0 platform VM without any of the management properties to enable the JMX agent, and consequently shows up in gray and cannot be selected.
Despite what the documentation says, most likely your process is running under a different user. You can run JConsole as an administrator and try again.
Here is what worked for me. I changed my %TEMP% and %TMP% environment variables to point to a folder I created in my %HOME% location (like C:\Users\[YOUR_NAME]\Temp). Once I did this, all problems vanished.
I had the problem described earlier, but was advised a simpler solution: close all programs using Java ("IntelliJ IDEA", "SoapUI", etc.) to unlock the temporary folder, then delete the %TMP%\hsperfdata_<user.name> folder. After opening any Java program, the folder will be recreated, this time with the correct name (most likely %TMP%\hsperfdata_<User.Name>). After that, local Java processes can be monitored through JConsole or VisualVM again (which now starts without the error linking to the VisualVM Troubleshooting Guide).
Instead of these steps, you can just go to the command prompt and type jconsole.exe <PID>.
Remember to change to the directory where the jconsole executable lives before running it.
Change the name of the hsperfdata folder (for me it was at C:\Users\pmimgg0\AppData\Local\Temp\hsperfdata_pmimgg0) to match the user name shown in Task Manager. Once I changed hsperfdata_pmimgg0 to hsperfdata_PMIMGG0, my local process was no longer grayed out in JConsole.
Change your TEMP paths in Environment Variables to something like D:\temp, as it could be a permission issue. This fixed the issue for me.
The best way is to expose the local process in the same way as a remote process.
Add these JVM arguments at startup:
-Dcom.sun.management.jmxremote=true
-Dcom.sun.management.jmxremote.port=6001
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
-Djava.rmi.server.hostname=localhost
-Dcom.sun.management.jmxremote.rmi.port=6001
Then, in JConsole, select Remote Process and point it at localhost:6001.
Click Connect, and JConsole connects successfully.
For me this was the fix, as I had some admin constraints.
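With the flags above in place, you can also sanity-check the endpoint programmatically before reaching for the JConsole UI (a sketch, assuming port 6001 as configured above):

import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JmxPing {
    public static void main(String[] args) throws Exception {
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:6001/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection server = connector.getMBeanServerConnection();
            // If this prints, the JMX agent is reachable
            System.out.println("Connected; MBeans visible: " + server.getMBeanCount());
        }
    }
}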

Talk to VM through host operating system

I have here a Windows distribution server that runs an Ant task to build enterprise software. What I need is to have the Ant task copy and run a VM image (Linux), and then talk to that Linux VM through the host operating system (from the Ant task itself). We need to be able to send files and/or commands to it.
Is there a practical way to go about this? I know that we already have a way to send commands to VMs that are also running Windows (a Windows-to-Windows interaction), but is there a way to do a Windows-to-Linux interaction?
I've implemented the thing you want, for my own purposes, and then just found this question by googling the keywords "vmware" and "ant".
https://github.com/zhuravlik/ant-vix-tasks
This is a task set for Ant to manage VMware VMs.
It works via the VIX API, so Linux guests should be supported.
I did not test it with VMware Server, though, only with Workstation.
But the API is common, so it should work.
Using ssh is probably the simplest: there is an Ant task for that (sshexec), and an scp task to copy files.
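Ant's sshexec and scp tasks are built on the JSch library, so if you ever need more control you can drive it directly from Java as well. A rough sketch (host name, user, key path, and command are all placeholders):

import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class RunRemote {
    public static void main(String[] args) throws Exception {
        JSch jsch = new JSch();
        jsch.addIdentity("/home/build/.ssh/id_rsa"); // key-based auth
        Session session = jsch.getSession("build", "linux-vm", 22);
        session.setConfig("StrictHostKeyChecking", "no"); // acceptable for a throwaway build VM
        session.connect();
        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand("uname -a");
        channel.setOutputStream(System.out, true); // true = don't close System.out
        channel.connect();
        while (!channel.isClosed()) {
            Thread.sleep(100); // wait for the remote command to finish
        }
        channel.disconnect();
        session.disconnect();
    }
}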
It will depend on what you need to do, but:
The Linux system could expose an SSH server, and the host can do just about anything it needs to via SSH.
The Linux system could expose a web service that the host consumes.
The Linux system could expose a Samba share which the host then connects to and reads/writes from (if all you need to do is deal with some files, but that seems unlikely).
There are probably dozens of options.

Best way to deploy a Java application on a cluster of servers?

I have a cluster of 32 servers and I need a tool to distribute a Java service, packaged as a Jar file, to each machine and remotely start the service. The cluster consists of Linux (Suse 10) servers with 8 cores per blade. The application is a data grid which uses Oracle Coherence. What is the best tool for doing this?
I asked something similar once, and it seems that the Java Parallel Processing Framework might be what you need:
http://www.jppf.org/
From the web site:
JPPF is an open source Grid Computing platform written in Java that makes it easy to run applications in parallel, and speed up their execution by orders of magnitude. Write once, deploy once, execute everywhere!
Have a look at OpenMOLE: http://www.openmole.org/
This tool enables you to distribute a computing workflow across several kinds of resources: from multicore machines to clusters and computing grids.
It is nicely documented and can be controlled through Groovy code or a GUI.
Distributing a jar on a cluster should be very easy to do with OpenMOLE.
Is your service packaged as an EJB? JBoss does a fairly good job with clustering.
Use BitTorrent. Peer-to-peer sharing across a cluster can really boost your deployment speed.
It depends on which operating system you have and how security is setup on your network.
If you can use NFS or Windows Share, I suggest you put the software on an NFS drive which is visible to all machines. That way you can run them all from one copy.
If you have remote shell or secure remote shell access, you can write a script which runs the same command on each machine, e.g. start on all machines or stop on all machines (see the sketch below).
If you have Windows, you might want to set up a service on each machine. If you have Linux, you might want to add a startup/shutdown script to each machine.
When you have a number of machines, it may be useful to have a tool which monitors that all your services are running, collects the logs and errors in one place, and/or allows you to start/stop them from a GUI. There are a number of tools for this; I'm not sure which is the best these days.
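A bare-bones version of the remote-shell approach mentioned above, assuming key-based ssh access is already set up for each node (host names and the start-script path are placeholders):

import java.util.Arrays;
import java.util.List;

public class ClusterRun {
    public static void main(String[] args) throws Exception {
        List<String> hosts = Arrays.asList("node01", "node02", "node03"); // ... through node32
        for (String host : hosts) {
            // Runs the same command on each machine over ssh
            Process p = new ProcessBuilder("ssh", host, "/opt/service/start.sh")
                    .inheritIO()
                    .start();
            System.out.println(host + " exited with " + p.waitFor());
        }
    }
}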
