I am currently trying to copy data from my Postgres database, which runs in a Docker container, to my Windows host. For this purpose I implemented a Java application (also in a Docker container) that uses the PostgreSQL JDBC driver and its CopyManager to copy specific data to a mapped volume on the host.
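For context, the export is driven roughly like this (connection settings, table name, and file path below are simplified placeholders):

import org.postgresql.PGConnection;
import org.postgresql.copy.CopyManager;

import java.io.FileOutputStream;
import java.io.OutputStream;
import java.sql.Connection;
import java.sql.DriverManager;

public class ExportJob {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://postgres:5432/mydb", "user", "secret");
             OutputStream out = new FileOutputStream("/export_data/export.csv")) {
            CopyManager copyManager = conn.unwrap(PGConnection.class).getCopyAPI();
            // Streams the query result straight into the file on the mapped volume
            copyManager.copyOut("COPY my_table TO STDOUT WITH (FORMAT csv)", out);
        }
    }
}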
Problem: When I copy the data to the mapped Windows directory, it becomes very slow: writing 1 GB of data takes about 40 minutes, compared to roughly 1 minute without the volume mapping.
Docker-compose:
exportservice:
  build: ./services/exportservice
  volumes:
    - samplePath:/export_data
I have already read that it's a known problem, but I haven't found a suitable solution.
My services have to run in a production environment that is based on Windows.
So what's the way to solve this issue? WSL2?
Looking forward to your advice!
Mounting a Windows folder into a Docker container is always slow no matter how you do it. WSL2 is even slower than WSL1 in that respect.
The best solution is to install WSL2, copy all your project files into the Linux file system (mounted in Windows at \\wsl$\<distro>\), run containers from there and mount Linux directories accordingly. That bypasses any Windows file interaction.
I wrote a Docker for Web Developers book and video course because I couldn't find good getting-started tutorials which explained how to create local development environments. It includes Hyper-V and WSL2 instructions and advice.
Use WSL2 instead of WSL 1 and keep the data on the Linux file system. You could also reduce the number of write operations in order to reduce the writing overhead, for example by using a BufferedWriter in Java.
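Since CopyManager.copyOut writes to an OutputStream rather than a Writer, the byte-stream equivalent of that advice is a BufferedOutputStream with a large buffer, so that far fewer (but larger) writes cross the mapped-volume boundary. A minimal sketch, assuming the copyOut approach from the question (table name, path, and buffer size are placeholders you would tune):

import org.postgresql.copy.CopyManager;

import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.OutputStream;

public class BufferedExport {
    // 8 MB buffer so data crosses the slow mapped volume in large blocks
    static void export(CopyManager copyManager) throws Exception {
        try (OutputStream out = new BufferedOutputStream(
                new FileOutputStream("/export_data/export.csv"), 8 * 1024 * 1024)) {
            copyManager.copyOut("COPY my_table TO STDOUT WITH (FORMAT csv)", out);
        }
    }
}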
I'm looking for an efficient way to transfer multiple files from FTP to Google Cloud Storage. Each file is 3-5 GB in size, and there are 100-200 files.
So far I have found the following option: reading the files using a GAE instance.
Any ideas what else I can try?
The best way will be to use Google Cloud parallel composite uploads to Cloud Storage using gsutil. You can try this with:
gsutil -o GSUtil:parallel_composite_upload_threshold=150M cp bigfile gs://your-bucket
Basically:
gsutil divides the file into multiple smaller chunks.
It then uploads all the chunks to Cloud Storage in parallel.
They get composed into a single object.
Finally, it deletes all the smaller chunks.
Keep in mind this has a trade off described in the docs:
Using parallel composite uploads presents a tradeoff between upload performance and download configuration: If you enable parallel composite uploads your uploads will run faster, but someone will need to install a compiled crcmod on every machine where objects are downloaded by gsutil or other Python applications. Note that for such uploads, crcmod is required for downloading regardless of whether the parallel composite upload option is on or not. For some distributions this is easy (e.g., it comes pre-installed on macOS), but in other cases some users have found it difficult.
In case you are not able to use gsutil and you can't install the Cloud Storage SDK on your FTP server, you could download the files to a VM and run the Cloud Storage SDK or gsutil on that VM.
App Engine Standard does not allow writing to disk, so any file you download would have to be held in memory until you upload it to Cloud Storage. With files of 3-5 GB, I don't think this is convenient in this case.
App Engine Flexible does allow writing to disk, but the disk is ephemeral: its contents are deleted whenever the instance restarts, and instances are restarted every week. You also wouldn't be taking advantage of the load balancer and the automatic scaling the instances provide.
In this case, I think the best option would be a Google Cloud preemptible VM. Even though such a VM lives for at most one day, it runs at a lower price than a normal VM. When it is about to be terminated, you can check which files have already been uploaded to Cloud Storage and resume the workload on a new preemptible VM. You could also run a large number of these VMs in parallel to speed up the download and upload process.
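To make the resume step concrete, here is a sketch using the google-cloud-storage Java client: list what already exists in the bucket and skip those files when the replacement VM starts. The bucket name and the FTP file list are placeholders.

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResumeCheck {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();

        // Collect the names of objects already uploaded to the bucket
        Set<String> uploaded = new HashSet<>();
        for (Blob blob : storage.list("my-transfer-bucket").iterateAll()) {
            uploaded.add(blob.getName());
        }

        // 'ftpFiles' would come from listing the FTP server (placeholder values here)
        List<String> ftpFiles = List.of("file-001.dat", "file-002.dat");
        for (String name : ftpFiles) {
            if (!uploaded.contains(name)) {
                System.out.println("Still to transfer: " + name);
                // download from FTP and upload to Cloud Storage here
            }
        }
    }
}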
So, I'm very new to Docker. Let me explain the context of my question.
I have 10 - 20 Spring Boot micro-service applications, each running on different ports on my local machine.
But from what I have learned, when migrating to Docker each service must run in its own container so it can be deployed or replicated quickly.
For each Docker container, we need to create a new Docker image.
Each Docker image must contain a JRE for the Spring Boot application to run, which is around 200 MB at most. That means each Docker image is, say, 350 MB at most.
On the other hand, on my local PC I have only one JRE of 200 MB and each application takes only a few MB of space.
Based on this, I would need about 600 MB on my local system (one 200 MB JRE plus the applications), yet roughly 7 GB for all the Docker images (about 20 images at 350 MB each).
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Why is the size of the image large even if the target PC may already have the JDK?
Your understanding is not correct.
Docker images are formed from layers.
When you install a JRE in your image, suppose the checksum of the resulting layer is 91e54dfb1179; that layer really does occupy disk space.
But if all your containers are based on the same image and only add different things (your different microservice applications) in the thin R/W layer on top, they will all share the 91e54dfb1179 layer, so disk usage is not an n*m relationship.
So pay attention to using the same base image for all your Java applications as much as possible, and only add the application-specific parts in the thin R/W layer.
The other answers cover Docker layering pretty well, so I just want to add details for your questions.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
Yes. If it's not in the image, it won't be in the container. You can save disk space, though, by reusing as many layers as possible. So try to write your Dockerfile from "least likely to change" to "most likely to change"; when you build your image, the more often you see "Using cache", the better.
Why is the size of the image large even if the target PC may already have the JDK?
Docker wants as little to do with the host as possible. On Windows and macOS it doesn't even run directly on the host: the first thing it does is create a Linux VM to run in. Docker images assume the only things the host will provide are empty RAM, disk, and CPUs. So each Docker image must also contain its own OS userland (that is what your initial FROM does: it picks a base OS image to build on); only the kernel is shared with the host or VM. So your final image size is actually OS + tools + app. Image size is a little misleading though, as it is the sum of all layers, which are reused across images.
(Implied) Should each app/micro-service be in its own container?
Ideally, yes. By converting your app into an isolated module, it makes it easier to replace/load-balance that module.
In practice, maybe not (for you). Spring Boot is not a light framework. In fact, it is a framework for module-izing your code (effectively running a module control system inside a module control system). And now you want to host 10-20 of them? That is probably not going to run on a single server. Docker will force each Spring Boot app to load itself into memory separately, and objects can't be reused across modules, so those need to be instantiated multiple times too. And if you are restricted to one production server, horizontal scaling isn't an option. (You will need roughly 1 GB of heap (RAM) per Spring Boot app; mileage may vary based on your code base.) And with 10-20 apps, refactoring to make the apps lighter for Docker deployment may not be feasible or in budget. Not to mention, if you can't run a minimal setup locally for testing (insufficient RAM), development effort will get a lot more "fun".
Docker is not a golden hammer. Give it a try, evaluate the pros and cons yourself, and decide if the pros are worth the cons for you and your team(s).
Lagom's answer is great, but I'd like to add that the size of Docker images should be as small as reasonably possible to ease transfer and storage.
Hence, there are a lot of images based on the Alpine Linux distribution, which are really small. Try to use them if possible.
Furthermore, do not add every tool imaginable to your container, e.g. you can often do without wget...
Based on this, I would need 600 MB on my local system, yet need 7 GB for all Docker images.
Is this approach correct? Should "OpenJDK" from DockerHub be added to each image?
That is correct, though you could ask whether a JRE alone would not be enough (instead of a full JDK).
Why is the size of the image large even if the target PC may already have the JDK?
You are comparing things that are not comparable: a local environment (which is anything but a production machine) versus integration/production environments.
In integration/production environments, the load on your applications may be high, and isolation between applications is generally advised. So there you want to host a minimal number of applications (UIs/services) per machine (bare metal, VM, or container) to prevent side effects between applications: shared-library incompatibilities, software-upgrade side effects, resource starvation, chained failures between applications...
In a local environment, on the other hand, the load on your applications is quite low and isolation between applications is generally not a serious issue. So there you can host multiple applications (UIs/services) on your local machine, and you can also share some common libraries/dependencies provided by the OS.
While you can do that, is it really good practice to mix and share everything locally?
I don't think so, because:
1) The local machine is not a dumping ground: you work on it all day, and the cleaner it is, the more efficient your development. For example: the JDK/JRE may differ between locally hosted applications, some folders used by the applications may share the same location, the database version may differ, applications can require different installed Java servers (Tomcat, Netty, WebLogic) and/or different versions of them...
Thanks to containers, that is not an issue: everything is installed and removed according to your requirements.
2) Environments (from local to production) should be as close to each other as possible, to ease the whole integration-deployment chain and to detect issues early, not only in production.
As a side note, to achieve that locally, developers need reasonably powerful machines.
All of this has a cost, but it is actually not expensive.
Besides isolation (of hardware and software resources), containers bring other advantages such as fast deploy/undeploy, scalability, and failover friendliness (for example, Kubernetes relies on containers).
Isolation, speed, scalability, and robustness have a cost: no physical sharing of any resources between containers (OS, libraries, JVM, ...).
That means that even if your applications use the exact same OS, libraries, and JVM, each application has to include them in its image.
Is it expensive ?
Not really: official images often rely on Alpine (a light Linux OS with some limitations, but customizable if needed), and what does an image of 350 MB (the value you quote, which is realistic) actually represent in terms of cost?
In fact, that is really cheap.
In integration/production, all your services will very probably not be hosted on the same machine, so compare the 350 MB for a container image to the resources used by traditional VMs in integration/production, which contain a complete OS with multiple additional programs installed. You can see that the resource consumption of containers is not an issue; it is even considered an advantage beyond local environments.
I have a Java server that I wrote myself running as a service. Right now it looks like the application is somehow eating my drive space at a rate of about 1 GB per hour.
After I stop the service, the disk space becomes available again by itself (I'm not deleting anything). The application doesn't create any files or write to disk besides logs and the database, and those are not growing that fast.
The big problem with this is that I can't find any file or folder that is eating up all my drive. I don't know if it is a system file that I don't have access to from the explorer or if it's a virus or a JVM bug. I'm using Oracle JVM 64 bit from JDK 7 update 7.
I appreciate a lot any help you can provide me with this. I have never seen something like that before.
Thanks.
Here are the possible pointers:
Check if your disk is full because of other applications (possibly malware)
Check if there are any IO operations from your application
Check whether a local repository (like .m2 or .gradle/caches) is filling the disk with transitive dependencies during builds
If possible, add a couple of log statements that report your disk's free space using new File("/").getUsableSpace() (total space won't change, but free/usable space will), along with RAM details, and watch how they change over time; see the sketch after this list
Finally, if nothing works out, try running your application on another machine
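A sketch of the logging idea from the fourth pointer, using only java.io and Runtime; the monitored path and the one-minute interval are arbitrary choices:

import java.io.File;

public class DiskSpaceLogger implements Runnable {
    @Override
    public void run() {
        File root = new File("/");            // on Windows, e.g. new File("C:\\")
        Runtime rt = Runtime.getRuntime();
        while (true) {
            System.out.printf("disk usable: %d MB, heap used: %d MB%n",
                    root.getUsableSpace() / (1024 * 1024),
                    (rt.totalMemory() - rt.freeMemory()) / (1024 * 1024));
            try {
                Thread.sleep(60_000);          // log once a minute
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    public static void main(String[] args) {
        new Thread(new DiskSpaceLogger(), "disk-space-logger").start();
    }
}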
We've been struggling at work with somebody's suggestion that we should decrease the size of our WAR files, specifically the WEB-INF/lib directory, in order to improve the performance of our production JBoss instance. I'm still suspicious about this.
We have around 15 web apps deployed in our application server, each about 15 to 20 MB in size.
I know there are a lot of variables involved, but has any of you actually dealt with this situation? Does the .war file size actually have a significant impact on web containers in general?
What advice can you offer?
Thank you.
There are many things to be suspicious of here:
What about the application is not performing to the level you would like?
Have you measured the application to find out which components are causing the lack of performance?
What are the bottlenecks in the application/system?
The size of the application alone has nothing to do with any sort of runtime performance. The number of classes loaded during the lifetime of the application has an impact on memory usage of the application, but an incredibly negligible one.
When dealing with "performance issues", the solution always follows the same general steps:
What does it mean when we say "bad performance"?
What specifically is not performing? Measure, measure, measure.
Can we improve the specific component not performing to the level we want?
If so, implement the ideas, measure again to find out if performance has truly improved.
You need to tell us the operating system.
Do you have antivirus live protection?
A war/jar file is actually a zip file; i.e., if you rename a .war to .zip, you can use a zip utility to view/unzip it.
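You can even see this from Java itself; a quick sketch that lists the entries of a WAR with the standard java.util.zip API (the path is a placeholder):

import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class ListWar {
    public static void main(String[] args) throws Exception {
        // A .war is just a zip archive, so ZipFile can read it directly
        try (ZipFile war = new ZipFile("/path/to/myapp.war")) {
            Enumeration<? extends ZipEntry> entries = war.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                System.out.println(entry.getName() + " (" + entry.getSize() + " bytes)");
            }
        }
    }
}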
During deployment, the war file is unzipped once into a designated folder. If you have live-protection, the antivirus utility might take some time to scan the new branch of directories created and slow down any access to them.
Many web app frameworks (JSP engines, for example) create temporary files, and your live protection would kick in to scan them.
If this is your situation, you have to decide whether you wish to exclude your web-app from antivirus live-scanning.
Are you running Linux but accessing your web directory through ntfs-3g? If so, check whether the NTFS directory is compressed. ntfs-3g has problems accessing compressed NTFS files, especially when multiple files are manipulated/created/uncompressed simultaneously. In the first place, unless there are some extremely valid reasons (and I can't see any), a web app directory should be on a local partition in a format native to Linux.
Use Wireshark to monitor the network activity. Find out if the web apps are causing accesses to remote file systems. See if there are too many retransmits whenever the web apps are active. Excessive retransmits or requests for retransmits mean the network pipeline has integrity problems. I am still trying to understand this issue myself; some network cards have buffering problems (something like buffer overflow) under Linux but not under Windows.
Wireshark is not difficult to use as long as you have an understanding of IP addresses, and you might wish to write awk, Perl, or Python scripts to analyze the traffic. Personally, I would use SAS.