AWS EMR - There is insufficient memory for the Java Runtime


I am running a MapReduce job on AWS EMR. The map phase completes for every file except one very large file, which fails with the following error:
OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x0000000611280000, 1521483776, 0) failed; error='Cannot allocate memory' (errno=12)
#
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (malloc) failed to allocate 1521483776 bytes for committing reserved memory.
It seems to be a memory issue. I've modified my configuration JSON file to add memory (a lot more than required):
[
{
"Classification": "hadoop-env",
"Properties": {},
"Configurations": [
{
"Classification": "export",
"Properties": {
"HADOOP_DATANODE_HEAPSIZE": "10240",
"HADOOP_NAMENODE_OPTS": "-XX:GCTimeRatio=19",
"HADOOP_HEAPSIZE": "11264",
"HADOOP_CLIENT_OPTS": "-Xmx10240M"
}
}
]
},
{
"Classification": "mapred-site",
"Properties": {
"mapreduce.map.memory.mb": "24576",
"mapreduce.map.java.opts": "-Xmx19200M",
"mapred.child.java.opts": "-Xmx4096M",
"mapreduce.reduce.memory.mb": "15360",
"mapreduce.reduce.java.opts": "-Xmx10240M",
"mapreduce.job.jvm.numtasks": "1",
"mapreduce.job.reuse.jvm.num.tasks": "1"
}
},
{
"Classification": "yarn-site",
"Properties": {
"yarn.scheduler.maximum-allocation-mb": "25600",
"yarn.nodemanager.resource.memory-mb": "25600"
}
},
{
"Classification": "hive-env",
"Properties": {}
},
{
"Classification": "hive-site",
"Properties": {}
}
]
However, I keep getting the same error. As you can see, I have added mapred.child.java.opts, as many suggest online, but I've had no luck. What else can I try?
Much appreciated.

It appears your configuration exceeds the physical memory of the server. An m3.xlarge has only 15 GB of physical memory, and by default the safe amount of memory to allocate to a container on it is 11.5 GB (http://docs.aws.amazon.com/ElasticMapReduce/latest/ReleaseGuide/emr-hadoop-task-config.html).
So for the m3.xlarge, the largest you can set mapreduce.map.java.opts is -Xmx9216m, with mapreduce.map.memory.mb at 11520 (the -Xmx value should always be less than the total map memory, usually around 80% of it). These are the properties that determine the map task's memory size. If the map task needs more memory to process the larger files, you will need to use a larger instance type.
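The sizing rule above is simple arithmetic. As a minimal sketch (the class and method names are hypothetical, and the 80% figure is this answer's rule of thumb, not a Hadoop constant), it could be computed like this:

```java
/**
 * Sketch of the sizing rule: the map task heap (-Xmx) is about 80% of the
 * container (mapreduce.map.memory.mb), and the container must fit under the
 * NodeManager limit (yarn.nodemanager.resource.memory-mb).
 */
class ContainerSizing {
    /** Percentage of the container given to the JVM heap (rule of thumb, assumed). */
    static final long HEAP_PERCENT = 80;

    /** Heap size in MB for -Xmx, given the container size in MB (integer math avoids rounding surprises). */
    static long heapMb(long containerMb) {
        return containerMb * HEAP_PERCENT / 100;
    }

    /** Value for mapreduce.map.java.opts; the explicit "m" suffix matters, a bare number means bytes. */
    static String javaOpts(long containerMb) {
        return "-Xmx" + heapMb(containerMb) + "m";
    }

    /** True if the container fits under the NodeManager's memory limit. */
    static boolean fitsOnNode(long containerMb, long nodeManagerMb) {
        return containerMb <= nodeManagerMb;
    }

    public static void main(String[] args) {
        long nodeManagerMb = 11520; // m3.xlarge default per the AWS page linked in this answer
        long containerMb = 11520;   // largest map container that fits on that node
        System.out.println("fits on node: " + fitsOnNode(containerMb, nodeManagerMb));
        System.out.println("mapreduce.map.memory.mb = " + containerMb);
        System.out.println("mapreduce.map.java.opts = " + javaOpts(containerMb));
    }
}
```

With the question's values, heapMb(24576) gives 19660 and fitsOnNode(24576, 11520) is false. The check is only meaningful if yarn.nodemanager.resource.memory-mb itself stays within physical memory; the question sets it to 25600 on a 15 GB machine, which is what allows containers larger than the machine's RAM to be scheduled.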
I recommend not changing other memory properties unless those processes specifically need such tuning.

Related

Unable to push docker image to Nexus

I am running Nexus OSS version 3.29.2-02 and am experiencing some weird behavior. I am building various images in CI (GitLab) and pushing them to a custom repository.
For the most part everything works OK and I have no issues tagging and pushing my produced images. Lately though, one of the projects that produces Docker images fails to push, with the following error:
$ TAGGED=${NEXUS_DOCKER_URL}/${BASE_IMAGE_NAME}:snapshot-MR${CI_MERGE_REQUEST_IID}
$ docker tag ${BASE_IMAGE_NAME}:latest ${TAGGED}
$ docker push ${TAGGED}
The push refers to repository [<custom-repository-url>/<image-name:tag>]
cade37b0f9c9: Preparing
578ec024f17c: Preparing
fe0b994190e8: Preparing
b24d08ca4359: Preparing
9a14db3b513b: Preparing
777b2c648970: Preparing
777b2c648970: Waiting
b24d08ca4359: Layer already exists
9a14db3b513b: Layer already exists
777b2c648970: Layer already exists
fe0b994190e8: Pushed
cade37b0f9c9: Pushed
578ec024f17c: Pushed
[DEPRECATION NOTICE] registry v2 schema1 support will be removed in an upcoming release. Please contact admins of the <custom-repository-url> registry NOW to avoid future disruption.
errors:
blob unknown: blob unknown to registry
blob unknown: blob unknown to registry
ERROR: Job failed: exit code 1
I have tried debugging this behavior as well as searching online for a solution, but I have yet to find anything. For some reason, this specific Docker image cannot be uploaded. I have tried the same procedure from a local machine as well as from stateless CI builders, and the behavior is consistent: I was able to push it only once, and after that the process kept failing.
For reference my Dockerfile is the following:
FROM <custom-repository-url>/adoptopenjdk/openjdk11:jre-11.0.10_9-alpine
WORKDIR /home/app
COPY build/libs/email-service.jar application.jar
# Set the appropriate timezone
RUN apk add --no-cache tzdata && \
cp /usr/share/zoneinfo/America/New_York /etc/localtime && \
echo "America/New_York" > /etc/timezone
EXPOSE 8080
CMD java -jar ${OPTS} application.jar
This is quite straightforward and does not hide anything complicated. I initially thought the problem could be attributed to using a proxied base image (i.e. in FROM), but several other projects do this without any issues.
I have also checked Nexus's logs, and the only thing I see is the following:
2021-02-05 17:12:27,441+0000 ERROR [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.orient.V2ManifestUtilImpl - Manifest refers to missing layer: sha256:66db482b5034f8eda0b18533d4eddb0012f4940bf3d348b08ac3bac8486bb2ee for: fts/marketing/email-service/snapshot-MR40 in repository RepositoryImpl$$EnhancerByGuice$$4d5af99c{type=hosted, format=docker, name='docker-hosted-s3'}
2021-02-05 17:12:27,443+0000 ERROR [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.orient.V2ManifestUtilImpl - Manifest refers to missing layer: sha256:2ec25ba939258edb2e85293896c5126478d79fe416d3b60feb20426755bcea5a for: fts/marketing/email-service/snapshot-MR40 in repository RepositoryImpl$$EnhancerByGuice$$4d5af99c{type=hosted, format=docker, name='docker-hosted-s3'}
2021-02-05 17:12:27,445+0000 WARN [qtp1025847496-15765] ci-deploy org.sonatype.nexus.repository.docker.internal.V2Handlers - Error: PUT /v2/fts/marketing/email-service/manifests/snapshot-MR40: 400 - org.sonatype.nexus.repository.docker.internal.V2Exception: Invalid Manifest
So my questions are:
What does this error really mean? I don't find it very useful:
errors:
blob unknown: blob unknown to registry
blob unknown: blob unknown to registry
What is really causing this behavior and how can I address the problem?
Note (not that it should make any difference), the image is a dockerized Micronaut application, using the latest version of the framework.
For reference, the output of docker inspect for said image is the following:
[{
"Id": "sha256:fec226a68e3b744fc792e47d3235e67f06b17883e60df52c8ae82c5a7ba9750f",
"RepoTags": [
"<custom-repository-url>/fts/marketing/email-service:mes-33-3",
"test-mes-33:latest"
],
"RepoDigests": [],
"Parent": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Comment": "",
"Created": "2021-02-06T21:06:59.987108458Z",
"Container": "8ab70692b75aac21d0866816aa52af5febf620744282d71a39dce55f81fe3e44",
"ContainerConfig": {
"Hostname": "8ab70692b75a",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=en_US.UTF-8",
"LANGUAGE=en_US:en",
"LC_ALL=en_US.UTF-8",
"JAVA_VERSION=jdk-11.0.10+9",
"JAVA_HOME=/opt/java/openjdk"
],
"Cmd": [
"/bin/sh",
"-c",
"#(nop) ",
"CMD [\"/bin/sh\" \"-c\" \"java -jar ${OPTS} application.jar\"]"
],
"Image": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Volumes": null,
"WorkingDir": "/home/app",
"Entrypoint": null,
"OnBuild": null,
"Labels": {}
},
"DockerVersion": "19.03.13",
"Author": "",
"Config": {
"Hostname": "",
"Domainname": "",
"User": "",
"AttachStdin": false,
"AttachStdout": false,
"AttachStderr": false,
"ExposedPorts": {
"8080/tcp": {}
},
"Tty": false,
"OpenStdin": false,
"StdinOnce": false,
"Env": [
"PATH=/opt/java/openjdk/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
"LANG=en_US.UTF-8",
"LANGUAGE=en_US:en",
"LC_ALL=en_US.UTF-8",
"JAVA_VERSION=jdk-11.0.10+9",
"JAVA_HOME=/opt/java/openjdk"
],
"Cmd": [
"/bin/sh",
"-c",
"java -jar ${OPTS} application.jar"
],
"Image": "sha256:ddd8e2235b60d7636283097fc61e5971c32b3006ee52105e2a77e7d4ee7e709e",
"Volumes": null,
"WorkingDir": "/home/app",
"Entrypoint": null,
"OnBuild": null,
"Labels": null
},
"Architecture": "amd64",
"Os": "linux",
"Size": 220998577,
"VirtualSize": 220998577,
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/78561c2e477b099a547bead4ea17b677bb01376fc1ed1ce1cd942157d35c0329/diff:/var/lib/docker/overlay2/af8ac4feace0cbecd616e2a02850ec366715aaa5ac8ad143cb633f52b0f6fbe2/diff:/var/lib/docker/overlay2/211a8e68c833f664de5d304838b8cd98b8e5e790f79da8b8839a4d52d02a8d66/diff:/var/lib/docker/overlay2/cbc98e7274ff8266425aed31989066ff7c5f7a46d9334b84110fc57d8b1d942c/diff:/var/lib/docker/overlay2/c773dedbc53b81c2e68ad61811445c0377271db3af526dbf5a6aa6671d0b2b71/diff",
"MergedDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/merged",
"UpperDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/diff",
"WorkDir": "/var/lib/docker/overlay2/04240d9f745382480e52e04d8088de6f65a9ece0cd6e091953087f3d06fcc93c/work"
},
"Name": "overlay2"
},
"RootFS": {
"Type": "layers",
"Layers": [
"sha256:777b2c648970480f50f5b4d0af8f9a8ea798eea43dbcf40ce4a8c7118736bdcf",
"sha256:9a14db3b513b928759670c6a9b15fd89a8ad9bf222c75e0998c21bcb04e25e48",
"sha256:b24d08ca43598c9ea44f73c3f5dfca2b4897c475b2cc480bac98cccc42dce10f",
"sha256:11d1fa1ad1ef523c60369c11b1096baf89c8d43afa53813e84c73d0926848598",
"sha256:30001f69fd3b3b08fdbf6d843e38d0a16d0e46e84923f92480ac88603c0eb680",
"sha256:b2d3c5f57d1d626a7501b8871f548fd7e1f7625fe05c1885c27ec415b14e9915"
]
},
"Metadata": {
"LastTagTime": "2021-02-06T23:08:30.440032169+02:00"
}
}]

A docker registry (in your case Nexus) throws that error whenever it encounters a missing/invalid layer in the image.
Nexus used to have difficulty with foreign layers but that shouldn't be a problem since you are running quite a recent version.
I would think that you only need to enable "foreign layer caching" in Nexus to get this working.
It would be helpful to include the output of docker manifest inspect <custom-repository-url>/adoptopenjdk/openjdk11:jre-11.0.10_9-alpine and docker manifest inspect ${TAGGED}.
Docker registry API spec
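Since the error means the manifest references blobs the registry cannot find (matching the Nexus log line "Manifest refers to missing layer"), one way to confirm which layer is missing is to ask the registry for each layer digest listed by docker manifest inspect. The Docker Registry HTTP API V2 answers HEAD /v2/<name>/blobs/<digest> with 200 when the blob exists and 404 when it does not. A minimal sketch using java.net.http (Java 11+); the class name is hypothetical, and authentication is deliberately omitted, so a Nexus repository that requires credentials will show up as HTTP 401:

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;

/** Sketch: checks which layer blobs a Docker Registry V2 endpoint actually has. No auth. */
class BlobCheck {

    /** Builds the blob URI from the Registry API: <base>/v2/<name>/blobs/<digest>. */
    static URI blobUri(String registryBase, String repository, String digest) {
        String base = registryBase.endsWith("/")
                ? registryBase.substring(0, registryBase.length() - 1)
                : registryBase;
        return URI.create(base + "/v2/" + repository + "/blobs/" + digest);
    }

    /** Sends one HEAD request per digest and returns those not answered with 200. */
    static List<String> missingBlobs(HttpClient client, String registryBase, String repository,
                                     List<String> digests) throws IOException, InterruptedException {
        List<String> missing = new ArrayList<>();
        for (String digest : digests) {
            HttpRequest request = HttpRequest.newBuilder(blobUri(registryBase, repository, digest))
                    .method("HEAD", HttpRequest.BodyPublishers.noBody())
                    .build();
            HttpResponse<Void> response = client.send(request, HttpResponse.BodyHandlers.discarding());
            if (response.statusCode() != 200) {
                missing.add(digest + " (HTTP " + response.statusCode() + ")");
            }
        }
        return missing;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 3) {
            System.out.println("usage: BlobCheck <registry-base-url> <repository> <digest>...");
            return;
        }
        // Some registries redirect blob requests to storage, so follow redirects.
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        List<String> digests = List.of(args).subList(2, args.length);
        missingBlobs(client, args[0], args[1], digests).forEach(System.out::println);
    }
}
```

Run it with the registry base URL, the repository path (for example fts/marketing/email-service) and the layer digests from the manifest; every digest it prints is one the registry could not serve.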

Slow application, frequent JVM hangs with single-CPU setups and Java 12+

We have a client application (with 10+ years of development). Its JDK was recently upgraded from OpenJDK 11 to OpenJDK 14. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU), the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue by setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...).
Some measurements of starting the application and running a query, with minimal manual interaction, in my VirtualBox machine:
OpenJDK 11.0.2: 36 seconds
OpenJDK 13.0.2: ~1.5 minutes
OpenJDK 13.0.2 with -XX:-UseBiasedLocking: 46 seconds
OpenJDK 13.0.2 with -XX:-ThreadLocalHandshakes: 40 seconds
OpenJDK 14: 5-6 minutes
OpenJDK 14 with -XX:-UseBiasedLocking: 3-3.5 minutes
OpenJDK 15 EA Build 20: ~4.5 minutes
Only the JDK (and the options mentioned) changed between runs. (-XX:-ThreadLocalHandshakes is not available in Java 14.)
We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50.
Counting the log lines per second shows a fairly smooth distribution with OpenJDK 11.0.2:
$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
30710 0
44012 1
55461 2
55974 3
27182 4
41292 5
43796 6
51889 7
54170 8
58850 9
51422 10
44378 11
41405 12
53589 13
41696 14
29526 15
2350 16
50228 17
62623 18
42684 19
45045 20
On the other hand, OpenJDK 14 seems to have interesting quiet periods:
$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
7726 0
1715 5
10744 6
4341 11
42792 12
45979 13
38783 14
17253 21
34747 22
1025 28
2079 33
2398 39
3016 44
So, what's happening between seconds 1-4, 7-10 and 14-20?
...
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117
[0.350s][7248][info ][biasedlocking ] Aligned thread 0x000000001727e010 to 0x000000001727e800
[0.350s][7248][info ][os,thread ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS)
[0.350s][6884][info ][os,thread ] Thread is alive (tid: 6884).
[0.350s][6884][debug][os,thread ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k).
[0.350s][6884][debug][os,thread ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000.
[0.350s][7248][debug][thread,smr ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500
[0.350s][7248][debug][thread,smr ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed.
[0.350s][2944][info ][os,thread ] Thread is alive (tid: 2944).
[0.350s][2944][debug][os,thread ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k).
[0.350s][2944][debug][os,thread ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000.
[0.351s][2944][debug][class,resolve ] java.lang.Thread java.lang.Runnable Thread.java:832
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138
[0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78
[0.351s][2944][debug][vmthread ] Adding VM operation: HandshakeOneThread
[0.351s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeOneThread
[0.351s][6708][debug][vmoperation ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800
# no log until 5.723s
[5.723s][7248][info ][biasedlocking ] Revoked bias of currently-unlocked object
[5.723s][7248][debug][handshake,task ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715
[5.723s][7248][debug][class,resolve ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716
...
The second pause, a little later:
...
[6.246s][7248][info ][class,load ] java.awt.Graphics source: jrt:/java.desktop
[6.246s][7248][debug][class,load ] klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f
[6.246s][7248][debug][class,resolve ] java.awt.Graphics java.lang.Object (super)
[6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object
[6.246s][7248][debug][jit,compilation ] 19 4 java.lang.Object::<init> (1 bytes) made not entrant
[6.246s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns
# no log until 11.783s
[11.783s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[11.783s][6708][debug][vmoperation ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[11.783s][7248][debug][protectiondomain ] Checking package access
[11.783s][7248][debug][protectiondomain ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics'
[11.783s][7248][debug][protectiondomain ] granted
[11.783s][7248][debug][class,resolve ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection)
[11.783s][7248][debug][class,resolve ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300
[11.783s][7248][debug][class,preorder ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
...
Then the third one:
...
[14.578s][7248][debug][class,preorder ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[14.578s][7248][info ][class,load ] java.lang.InheritableThreadLocal source: jrt:/java.base
[14.578s][7248][debug][class,load ] klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55
[14.578s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super)
[14.578s][7248][debug][jit,compilation ] 699 3 java.lang.ThreadLocal::get (38 bytes) made not entrant
[14.578s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns
[14.579s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns
# no log until 21.455s
[21.455s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
[21.455s][6708][debug][vmoperation ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[21.455s][7248][debug][class,resolve ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39
[21.455s][7248][info ][class,init ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740)
[21.455s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57
[21.456s][7248][debug][class,preorder ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[21.456s][7248][info ][class,load ] sun.security.jca.ProviderList source: jrt:/java.base
[21.456s][7248][debug][class,load ] klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2
[21.456s][7248][debug][class,resolve ] sun.security.jca.ProviderList java.lang.Object (super)
...
The following two lines seem interesting:
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
Is it normal that these handshakes took 5.5 and 6.9 seconds?
I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:
Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \
-jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml
What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?
Is this a JDK bug? Should I report it?
update 2020-04-25:
As far as I can see, the log files also contain GC logs. These are the first GC log lines:
$ cat app.txt.00 | grep "\[gc"
[0.016s][7248][debug][gc,heap ] Minimum heap 8388608 Initial heap 60817408 Maximum heap 1073741824
[0.017s][7248][info ][gc,heap,coops ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
[0.018s][7248][info ][gc ] Using Serial
[22.863s][6708][info ][gc,start ] GC(0) Pause Young (Allocation Failure)
[22.863s][6708][debug][gc,heap ] GC(0) Heap before GC invocations=0 (full 0): def new generation total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000)
...
Unfortunately, GC does not seem related, since the first collection starts after the third pause.
update 2020-04-26:
With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on an i7-6600U CPU). The virtual machine has 3.5 GB RAM. According to Task Manager, 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (and enabling hyper-threading on physical machines) makes the application fast enough again. I'm just wondering: was it an intentional trade-off during JDK development to lose performance on (rare) single-CPU machines in order to make the JVM faster on multicore/hyper-threading CPUs?

TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).
I did not expect that, but I could reproduce the issue with a simple hello world:
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
I have used these two batch files:
main-1cpu.bat, which limits the java process to only one CPU:
c:\windows\system32\cmd.exe /C start /affinity 1 \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \
Main
main-full.bat, which lets the java process use both CPUs:
c:\windows\system32\cmd.exe /C start /affinity FF \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \
Main
(The differences are the affinity value and name of the log file. I've wrapped it for easier reading but wrapping with \ probably doesn't work on Windows.)
A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 7.0203455
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5751352
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5585384
...
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 23.6482685
...
The produced trace logs contain pauses similar to those in the question.
Running Main without trace logging is faster, but the difference between the single-CPU and two-CPU versions is still visible: ~4-7 seconds vs. ~400 ms.
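To find such pauses in a large trace log without eyeballing it, a small helper can list every gap between consecutive log lines that exceeds a threshold. This is a minimal sketch (the class name is hypothetical), assuming the uptime decoration from -Xlog:...:uptime,... is the first field of each line, as in the logs above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Sketch: reports quiet periods in a unified JVM log with an uptime decoration. */
class LogGaps {
    // Matches the leading uptime decoration, e.g. "[5.723s]".
    private static final Pattern UPTIME = Pattern.compile("^\\[(\\d+(?:\\.\\d+)?)s\\]");

    /** Returns "from -> to (gap)" for consecutive timestamps further apart than minGapSeconds. */
    static List<String> findGaps(List<String> lines, double minGapSeconds) {
        List<String> gaps = new ArrayList<>();
        double previous = Double.NaN;
        for (String line : lines) {
            Matcher m = UPTIME.matcher(line);
            if (!m.find()) {
                continue; // skip lines without an uptime decoration
            }
            double t = Double.parseDouble(m.group(1));
            if (!Double.isNaN(previous) && t - previous > minGapSeconds) {
                // Locale.ROOT keeps "." as the decimal separator on any system locale.
                gaps.add(String.format(Locale.ROOT, "%.3fs -> %.3fs (%.3fs)", previous, t, t - previous));
            }
            previous = t;
        }
        return gaps;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: LogGaps <logfile> [minGapSeconds]");
            return;
        }
        double minGap = args.length > 1 ? Double.parseDouble(args[1]) : 1.0;
        try (Stream<String> s = Files.lines(Path.of(args[0]))) {
            findGaps(s.collect(Collectors.toList()), minGap).forEach(System.out::println);
        }
    }
}
```

With filecount=50 the log is rotated across several files, so each file has to be checked separately (or concatenated in order first).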
I've sent these findings to the hotspot-dev@openjdk mailing list, and the developers there confirmed that this is something the JDK could handle better. You can find suggested fixes in the thread too.
There is another thread about the regression on hotspot-runtime-dev@openjdk. The JIRA issue for the fix is JDK-8244340.

In my experience, performance problems with JDKs are mostly related to one of the following:
JIT Compilation
VM configuration (heap sizes)
GC algorithm
Changes in the JVM/JDK which break known good running applications
(Oh, and I forgot to mention class loading...)
If you have been relying on the default JVM configuration since OpenJDK 11, consider setting some of the more prominent options to fixed values, such as the GC algorithm and heap size.
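Before pinning options, it may help to print which defaults are actually in effect on each machine. A minimal sketch using the standard management beans (ManagementFactory and the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean); the class name and the chosen flags are illustrative only. On the single-CPU VM, the collector list would be expected to agree with the question's GC log, which reports "Using Serial":

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch: prints the JVM settings that ergonomics actually chose. */
class JvmDefaults {
    static final String NOT_AVAILABLE = "<not available in this JVM>";

    /** Value and origin (DEFAULT, ERGONOMIC, VM_CREATION, ...) of a HotSpot flag, if it exists. */
    static String vmOption(String name) {
        HotSpotDiagnosticMXBean hotspot = ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot == null) {
            return NOT_AVAILABLE; // not a HotSpot-based JVM
        }
        try {
            VMOption option = hotspot.getVMOption(name);
            return option.getValue() + " (" + option.getOrigin() + ")";
        } catch (IllegalArgumentException e) { // flag does not exist in this JDK version
            return NOT_AVAILABLE;
        }
    }

    static String describe() {
        Runtime rt = Runtime.getRuntime();
        List<String> collectors = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .collect(Collectors.toList());
        StringBuilder sb = new StringBuilder()
                .append("java.version=").append(System.getProperty("java.version")).append('\n')
                .append("availableProcessors=").append(rt.availableProcessors()).append('\n')
                .append("maxHeapMB=").append(rt.maxMemory() / (1024 * 1024)).append('\n')
                .append("collectors=").append(collectors).append('\n');
        for (String flag : List.of("UseSerialGC", "UseG1GC", "UseBiasedLocking", "ThreadLocalHandshakes")) {
            sb.append(flag).append('=').append(vmOption(flag)).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe());
    }
}
```

Flags that no longer exist in a given JDK (for example ThreadLocalHandshakes in 14) are reported as not available rather than failing, so the same class can be run unchanged on JDK 11 and 14 and the outputs compared.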
A graphical analysis tool such as Retrace, AppDynamics or Flight Recorder might help track the issue down. These tools give more of an overview of the state of the heap, GC cycles, RAM, threads, CPU load and so on at a given time than log files can.
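Flight Recorder ships with OpenJDK 11 and later. Besides starting it with the -XX:StartFlightRecording JVM option, a recording can be controlled from code through the jdk.jfr API. A minimal sketch, where the class name and the workload are hypothetical stand-ins for the application's startup:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

/** Sketch: records JFR events while a task runs and writes them to a .jfr file. */
class StartupRecording {

    static Path record(Runnable task, Path target) throws IOException, ParseException {
        // "default" is the low-overhead settings profile bundled with the JDK.
        Configuration config = Configuration.getConfiguration("default");
        try (Recording recording = new Recording(config)) {
            recording.start();
            task.run();
            recording.stop();
            recording.dump(target); // write the recorded events to disk
        }
        return target;
    }

    public static void main(String[] args) throws IOException, ParseException {
        Path out = record(() -> {
            // Hypothetical workload standing in for the application's startup.
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) {
                sb.append(i);
            }
        }, Path.of("startup.jfr"));
        System.out.println("wrote " + out + " (" + Files.size(out) + " bytes)");
    }
}
```

The resulting .jfr file can be opened in JDK Mission Control to look at thread, lock and GC activity around the pauses.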
Do I understand correctly that your application writes about 30710 lines to the log within the first second of running (under OpenJDK 11)? Why is it "only" writing about 7k lines in the first second under OpenJDK 14? That seems like a huge difference for an application that is simply started on different JVMs... Are you sure that, for example, large numbers of exception stack traces are not being dumped into the log?
The other numbers are even higher sometimes, so maybe the slowdowns are related to exception logging? Or even swapping, if RAM gets low?
Actually, I think that if an application does not write anything to the log, this is a sign that it is running smoothly (unless it is completely frozen during that time). What happens from seconds 12-22 (in the OpenJDK 14 case) would concern me more... the number of logged lines goes through the roof... why?
And afterwards, logging drops to all-time lows of about 1-3k lines... what is the reason for that? (Maybe it is the GC kicking in at second 22 and doing a tabula rasa that resolves some things...?)
Another thing may be your statement about "single CPU" machines. Does this also imply "single core" (I don't know, maybe your software is tailored to legacy hardware or something)?
And the "single CPU" VMs are running on those machines?
But I assume I am wrong about this, since almost all CPUs are multicore nowadays... Still, I would investigate a possible multithreading (deadlock) problem.

Since it's using 100% CPU "most of the time", and it takes 10 times longer (!) with Java 14, it means that you're wasting 90% of your CPU in Java 14.
Running out of heap space can do that, as you spend a whole lot of time in GC, but you seem to have ruled that out.
I notice that you're tweaking the biased locking option, and that it makes a significant difference. That suggests your program may do a lot of concurrent work in multiple threads. It's possible that your program has a concurrency bug that shows up in Java 14 but not in Java 11. That could also explain why adding another CPU makes it more than twice as fast.
Concurrency bugs often only show up when you're unlucky, and the trigger could really have been anything, like a change to hashmap organization, etc.
First, if it's feasible, check for any loops that might be busy-waiting instead of sleeping.
Then, run a profiler in sampling mode (jvisualvm will do) and look for methods that are taking a much larger % of total time than they should. Since your performance is off by a factor of 10, any problems in there should really jump out.
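For the busy-wait check, the difference between spinning and blocking can be sketched as follows (class and method names are illustrative). On a single CPU, a spinning thread stays runnable and competes for the same core as the thread it is waiting on, which could fit the 100% CPU symptom; a blocking wait parks the thread instead:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch: busy-waiting versus blocking while waiting for another thread. */
class WaitStyles {
    private static volatile boolean ready = false;

    /** Anti-pattern: spins on a flag and keeps the (only) CPU busy while waiting. */
    static void busyWait() {
        while (!ready) {
            Thread.onSpinWait(); // only a CPU hint; the thread stays runnable and competes for the core
        }
    }

    /** Preferred: parks the thread until signalled or the timeout expires, using no CPU meanwhile. */
    static boolean blockingWait(CountDownLatch latch, long timeoutMs) {
        try {
            return latch.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            return false;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(1);
        Thread producer = new Thread(() -> {
            ready = true;      // would release a busyWait() caller
            latch.countDown(); // releases blockingWait() callers
        }, "producer");
        producer.start();
        System.out.println("signalled: " + blockingWait(latch, 1_000));
        producer.join();
    }
}
```

Loops shaped like busyWait() (with or without onSpinWait, or with Thread.yield()) are the kind of thing worth searching for; replacing them with a latch, a lock condition or a blocking queue removes the CPU pressure.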

This is an interesting issue, and narrowing it down would require an indeterminate amount of effort, since there are many permutations and combinations to try and a lot of data to collect and collate.
It seems there has been no resolution to this for some time. Perhaps it needs to be escalated.
EDIT 2: Since "ThreadLocalHandshakes" is deprecated and we can assume that locking is contended, I suggest trying with biased locking disabled (-XX:-UseBiasedLocking) to hopefully speed up this scenario.
However, here are some suggestions for collecting more data and attempting to isolate the issue:
1. Allocate more than one core. (I see that you have tried this and the issue goes away. It seems to be an issue with one or more threads' execution precluding others; see no. 7 below.)
2. Allocate more heap (perhaps JDK 14 demands more than earlier JDKs).
3. Allocate more memory to the Windows 10 VirtualBox VM.
4. Check the OS system messages (Windows 10 in your case).
5. Run it on a non-virtualized Windows 10.
6. Try a different build of JDK 14.
7. Take a thread dump (or profile) at regular intervals. Analyze which thread is running exclusively. Perhaps there is a setting for equitable time sharing, or perhaps a higher-priority thread is running. What is that thread, and what is it doing? On Linux you could stat the lightweight processes (threads) associated with a process and their state in real time. Is there something similar on Windows 10?
8. CPU usage: 100% or less? Constrained by CPU or memory? 100% CPU in service threads? Which service thread?
9. Have you explicitly set a GC algorithm?
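For no. 7, a thread dump of a running process can be taken with jcmd <pid> Thread.print. From inside the application, per-thread CPU time can also be sampled through ThreadMXBean. A minimal sketch follows (class name and interval are arbitrary choices). It only sees Java threads, so VM-internal threads such as the VM Thread that executes the handshakes in the question's logs will not appear; an external dump is needed for those:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.HashMap;
import java.util.Map;

/** Sketch: shows which Java threads consumed CPU during a sampling interval. */
class ThreadCpuSampler {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    /** CPU time in ns per live Java thread id; empty if the JVM does not support thread CPU timing. */
    static Map<Long, Long> snapshot() {
        Map<Long, Long> cpu = new HashMap<>();
        if (!THREADS.isThreadCpuTimeSupported()) {
            return cpu;
        }
        for (long id : THREADS.getAllThreadIds()) {
            long nanos = THREADS.getThreadCpuTime(id); // -1 if the thread died or timing is disabled
            if (nanos >= 0) {
                cpu.put(id, nanos);
            }
        }
        return cpu;
    }

    /** Prints threads that used at least minMillis of CPU between the two snapshots. */
    static void report(Map<Long, Long> before, Map<Long, Long> after, long minMillis) {
        after.forEach((id, nanos) -> {
            long deltaMs = (nanos - before.getOrDefault(id, 0L)) / 1_000_000;
            if (deltaMs >= minMillis) {
                ThreadInfo info = THREADS.getThreadInfo(id);
                String name = info == null ? "<terminated>" : info.getThreadName();
                System.out.printf("%-40s %6d ms%n", name, deltaMs);
            }
        });
    }

    public static void main(String[] args) throws InterruptedException {
        Map<Long, Long> before = snapshot();
        Thread.sleep(1_000); // sampling interval; an application would run this on a daemon thread
        report(before, snapshot(), 1);
    }
}
```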
I have personally witnessed version-specific issues related to GC, heap resizing, virtualized containers and so on.
There is no easy answer to this, I think, especially since this question has been around for some time. But we can try. All the best, and let us know the results of some of these isolation steps.
EDIT 1: From the updated question, it seems to be related to the GC or another service thread taking over the single core non-equitably (thread-local handshakes)?

Be careful with logging to slow disks; it will slow down your application:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
But it doesn't seem likely to be the cause of the issue, as the CPU is still busy, and thanks to thread-local handshakes you don't have to wait for all threads to reach a safepoint: https://openjdk.java.net/jeps/312
Also, not directly related to your problem, but more generally: if you want to squeeze more startup performance out of your hardware, take a look at AppCDS (Application Class-Data Sharing):
https://blog.codefx.org/java/application-class-data-sharing/

Memory issue with App Engine and Firestore

I'm developing a microservice (MS) with Kotlin and Micronaut that accesses a Firestore database. When I run this MS locally I can make it work with 128M, because it's very simple: it just reads and writes data to Firestore, and not big amounts of data, just really small documents like this:
{
"project": "DUMMY",
"columns": [
{
"name": "TODO",
"taskStatus": "TODO"
},
{
"name": "IN_PROGRESS",
"taskStatus": "IN_PROGRESS"
},
{
"name": "DONE",
"taskStatus": "DONE"
}
],
"tasks": {}
}
I'm running this in App Engine Standard on an F1 instance (256 MB 600 MHz) with these properties in my app.yaml:
runtime: java11
instance_class: F1 # 256 MB 600 MHz
entrypoint: java -Xmx200m -jar MY_JAR.jar
service: data-connector
env_variables:
  JAVA_TOOL_OPTIONS: "-Xmx230m"
  GAE_MEMORY_MB: 128M
automatic_scaling:
  max_instances: 1
  max_idle_instances: 1
I know all those memory-related properties are not necessary, but I was desperate to make this work and tried a lot of solutions, because my first error message was:
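With two competing heap settings (`-Xmx200m` on the entrypoint and `-Xmx230m` in `JAVA_TOOL_OPTIONS`), it helps to log which one actually took effect. Options from `JAVA_TOOL_OPTIONS` are normally picked up before the command-line options, so the explicit `-Xmx` on the command line usually wins, but checking at runtime removes the guesswork. This is a minimal sketch with a hypothetical class name; the same calls can be made from Kotlin at application startup.

```java
import java.lang.management.ManagementFactory;

public class EffectiveHeap {
    // Max heap of the running JVM in MiB; reflects whichever -Xmx actually took effect
    // (it can be a little below the -Xmx value, depending on the collector).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("JAVA_TOOL_OPTIONS = " + System.getenv("JAVA_TOOL_OPTIONS"));
        System.out.println("JVM input arguments = "
                + ManagementFactory.getRuntimeMXBean().getInputArguments());
        System.out.println("max heap = " + maxHeapMiB() + " MiB");
    }
}
```

Note that even the correct `-Xmx` only caps the heap; the process as a whole needs more than that.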
Exceeded soft memory limit of 256 MB with 263 MB after servicing 1 requests total. Consider setting a larger instance class in app.yaml.
The properties in app.yaml did not solve the problem: now every time I make a call that returns that JSON, I get this error:
2020-04-10 12:09:15.953 CEST
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application or may be using an instance with insufficient memory. Consider setting a larger instance class in app.yaml.
The first request always takes longer, I think due to some Firestore configuration, but the thing is that I cannot make it work; I always get the same error.
Do you have any idea what I could be doing wrong or what I need to fix this?

TL;DR: The problem was that I tried to use a very small instance for a simple application, but even for that I needed more memory.
OK, a friend helped me with this. I was using a very small instance, and even when I didn't get the memory-limit error, it was still a memory problem.
Upgrading my instance to an F2 (512 MB 1.2 GHz) solved the problem, and testing my app with siege showed very good performance:
Transactions: 5012 hits
Availability: 100.00 %
Elapsed time: 59.47 secs
Data transferred: 0.45 MB
Response time: 0.30 secs
Transaction rate: 84.28 trans/sec
Throughput: 0.01 MB/sec
Concurrency: 24.95
Successful transactions: 3946
Failed transactions: 0
Longest transaction: 1.08
Shortest transaction: 0.09
My sysops friend tells me that these instances are better suited to things like Python scripts than to JVM REST servers.
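One way to see why a 256 MB instance is too tight even with a 200 MB heap is to compare the heap with the JVM's other committed memory. The sketch below (hypothetical class name) sums the committed size of the non-heap memory pools, such as Metaspace and the code cache, and prints it next to the heap. It still undercounts the real footprint: thread stacks, GC bookkeeping and other native allocations are not included, so treat the numbers as a floor for the JVM's own overhead rather than a measure of RSS.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class JvmFootprint {
    static final long MIB = 1024 * 1024;

    // Sum of committed memory across non-heap pools (Metaspace, code cache, ...), in bytes.
    static long nonHeapCommittedBytes() {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (pool.getType() == MemoryType.NON_HEAP && usage != null) {
                total += usage.getCommitted();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        long heapCommitted = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getCommitted();
        long nonHeap = nonHeapCommittedBytes();
        System.out.printf("heap committed:     %d MiB (max %d MiB)%n",
                heapCommitted / MIB, Runtime.getRuntime().maxMemory() / MIB);
        System.out.printf("non-heap committed: %d MiB%n", nonHeap / MIB);
        System.out.printf("threads:            %d, each with its own native stack%n",
                ManagementFactory.getThreadMXBean().getThreadCount());
        // Not counted above: thread stacks, GC data structures, malloc'd native memory, etc.
    }
}
```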

Docker runs out of disk space even though containers are small

I have installed Docker Toolbox for Mac OS X and am running several containers in it. The first two I created were Cassandra containers, and they were running fine. After that I created 2 Debian containers and connected to bash through the Docker terminal in order to install Oracle JDK 8.
When I was about to extract Java from the tarball, I got a ton of "Cannot write: No space left on device" error messages during the execution of the "tar" command.
I've checked the space:
$ docker ps -s
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES SIZE
9d8029e21918 debian:latest "/bin/bash" 54 minutes ago Up 54 minutes deb-2 620.5 MB (virtual 744 MB)
49c7a0e37475 debian:latest "/bin/bash" 55 minutes ago Up 55 minutes deb-1 620 MB (virtual 743.5 MB)
66a17af83ca3 cassandra "/docker-entrypoint.s" 4 hours ago Up 4 hours 7000-7001/tcp, 7199/tcp, 9042/tcp, 9160/tcp node-2 40.16 MB (virtual 412.6 MB)
After seeing that output, I noticed that one of my Cassandra nodes is missing. I went to Kitematic to check and found that it is in the DOWN state and I can't start it: "Cannot write node . No space left on device" is the error message shown for this attempt.
Does Docker impose any limits on running containers?
When I remove all my Cassandra containers and leave just a couple of Debian ones, Java can be extracted from the tarball. So the issue is definitely in some Docker setting related to sizing.
What is the correct way to resolve the issue with space limits here?
UPDATE.
$ docker images
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
cassandra latest 13ea610e5c2b 11 hours ago 374.8 MB
debian jessie 23cb15b0fcec 2 weeks ago 125.1 MB
debian latest 23cb15b0fcec 2 weeks ago 125.1 MB
The output of df -hi
$ df -hi
Filesystem Inodes IUsed IFree IUse% Mounted on
none 251K 38K 214K 15% /
tmpfs 251K 18 251K 1% /dev
tmpfs 251K 12 251K 1% /sys/fs/cgroup
tmpfs 251K 38K 214K 15% /etc/hosts
shm 251K 1 251K 1% /dev/shm
$ df -h
Filesystem Size Used Avail Use% Mounted on
none 1.8G 1.8G 0 100% /
tmpfs 1002M 0 1002M 0% /dev
tmpfs 1002M 0 1002M 0% /sys/fs/cgroup
tmpfs 1.8G 1.8G 0 100% /etc/hosts
shm 64M 0 64M 0% /dev/shm
I'd appreciate any help.

I managed to resolve this issue in Docker.
By default, the memory for the Docker machine is set to 2048M.
The first step I performed was stopping my Docker machine:
$ docker-machine stop default
Then I opened the $HOME/.docker/machine/machines/default/config.json file and changed the "Memory" setting to a higher value, i.e. 4096:
{
"ConfigVersion": 3,
"Driver": {
"VBoxManager": {},
"IPAddress": "192.168.99.102",
"MachineName": "default",
"SSHUser": "docker",
"SSHPort": 59177,
"SSHKeyPath": "/Users/lenok/.docker/machine/machines/default/id_rsa",
"StorePath": "/Users/lenok/.docker/machine",
"SwarmMaster": false,
"SwarmHost": "tcp://0.0.0.0:3376",
"SwarmDiscovery": "",
"CPU": 1,
"Memory": 4096,
"DiskSize": 204800,
"Boot2DockerURL": "",
"Boot2DockerImportVM": "",
"HostDNSResolver": false,
"HostOnlyCIDR": "192.168.99.1/24",
"HostOnlyNicType": "82540EM",
"HostOnlyPromiscMode": "deny",
"NoShare": false,
"DNSProxy": false
},
"DriverName": "virtualbox",
"HostOptions": {
"Driver": "",
"Memory": 0,
"Disk": 0,
"EngineOptions": {
"ArbitraryFlags": [],
"Dns": null,
"GraphDir": "",
"Env": [],
"Ipv6": false,
"InsecureRegistry": [],
"Labels": [],
"LogLevel": "",
"StorageDriver": "",
"SelinuxEnabled": false,
"TlsVerify": true,
"RegistryMirror": [],
"InstallURL": "https://get.docker.com"
},
"SwarmOptions": {
"IsSwarm": false,
"Address": "",
"Discovery": "",
"Master": false,
"Host": "tcp://0.0.0.0:3376",
"Image": "swarm:latest",
"Strategy": "spread",
"Heartbeat": 0,
"Overcommit": 0,
"ArbitraryFlags": [],
"config.json" [noeol] 75L, 2560C
"Overcommit": 0,
"ArbitraryFlags": [],
"Env": null
},
"AuthOptions": {
"CertDir": "/Users/lenok/.docker/machine/certs",
"CaCertPath": "/Users/lenok/.docker/machine/certs/ca.pem",
"CaPrivateKeyPath": "/Users/lenok/.docker/machine/certs/ca-key.pem",
"CaCertRemotePath": "",
"ServerCertPath": "/Users/lenok/.docker/machine/machines/default/server.pem",
"ServerKeyPath": "/Users/lenok/.docker/machine/machines/default/server-key.pem",
"ClientKeyPath": "/Users/lenok/.docker/machine/certs/key.pem",
"ServerCertRemotePath": "",
"ServerKeyRemotePath": "",
"ClientCertPath": "/Users/lenok/.docker/machine/certs/cert.pem",
"ServerCertSANs": [],
"StorePath": "/Users/lenok/.docker/machine/machines/default"
}
},
"Name": "default"
}
Finally, I started my Docker machine again:
$ docker-machine start default

Issue 18869 refers to a docker-machine memory allocation problem.
This can be tested on the fly with
vboxmanage controlvm default 4096
Since drivers/virtualbox/virtualbox.go#L344-L352 reloads the settings from $HOME/.docker/machine/machines/default/config.json, it is best to record the new value in that file (as mentioned in this answer).
That "No space left on device" error was seen in docker/machine issue 2285, where the vmdk image is created as dynamically allocated (it grows at run time, which is the default). This gives a smaller on-disk footprint initially, so even creating a ~20 GiB VM with --virtualbox-disk-size 20000 requires only about 200 MiB of free disk space to start with.
And the default memory is quite low.
Make sure that you don't have:
any more exited containers that you could remove:
docker rm -v $(docker ps --filter status=exited -q 2>/dev/null) 2>/dev/null
any dangling images:
docker rmi $(docker images --filter dangling=true -q 2>/dev/null) 2>/dev/null
(Those are the result of rebuilds, which leave intermediate images unused.)
See also "How to remove old and unused Docker images"
Then make sure you don't have an inode exhaustion problem, as in issue 10613.
Check df -hi (with i for inodes)
connected to bash through docker terminal with the purpose to install Oracle JDK8.
Try instead to specify the installation in a Dockerfile and build an image with the JDK installed.
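Once a JDK is part of the image (for example, installed through a Dockerfile as suggested above), the block-usage side of `df -h` can also be checked from Java through `java.nio.file.FileStore`. This is a rough sketch with a hypothetical class name: the JDK does not expose inode counts, so `df -hi` remains the tool for the inode-exhaustion check, and the percentage here is computed slightly differently from df's `Use%` column.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.FileSystems;

public class DiskUsage {
    static final long MIB = 1024 * 1024;

    // Share of the store that is not available to this user, in percent, rounded down.
    // df computes Use% slightly differently (used / (used + avail), rounded up).
    static long usePercent(long total, long usable) {
        if (total <= 0) return 0;
        return (total - usable) * 100 / total;
    }

    public static void main(String[] args) {
        for (FileStore store : FileSystems.getDefault().getFileStores()) {
            try {
                long total = store.getTotalSpace();
                long usable = store.getUsableSpace();
                System.out.printf("%-12s %-10s %8d MiB total %8d MiB usable %3d%%%n",
                        store.name(), store.type(), total / MIB, usable / MIB,
                        usePercent(total, usable));
            } catch (IOException e) {
                // some pseudo file systems refuse size queries; report and continue
                System.out.println(store.name() + ": " + e.getMessage());
            }
        }
    }
}
```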

Growing resident memory usage (RSS) of Java Process

Recent observations on our production system show that the resident memory usage of our Java container keeps growing. We investigated why the Java process consumes much more memory than Heap + Thread Stacks + Shared Objects + Code Cache + etc., using native tools like pmap. As a result, we found some 64M memory blocks (in pairs) allocated by the native process (probably with malloc/mmap):
0000000000400000 4K r-x-- /usr/java/jdk1.7.0_17/bin/java
0000000000600000 4K rw--- /usr/java/jdk1.7.0_17/bin/java
0000000001d39000 4108K rw--- [ anon ]
0000000710000000 96000K rw--- [ anon ]
0000000715dc0000 39104K ----- [ anon ]
00000007183f0000 127040K rw--- [ anon ]
0000000720000000 3670016K rw--- [ anon ]
00007fe930000000 62876K rw--- [ anon ]
00007fe933d67000 2660K ----- [ anon ]
00007fe934000000 20232K rw--- [ anon ]
00007fe9353c2000 45304K ----- [ anon ]
00007fe938000000 65512K rw--- [ anon ]
00007fe93bffa000 24K ----- [ anon ]
00007fe940000000 65504K rw--- [ anon ]
00007fe943ff8000 32K ----- [ anon ]
00007fe948000000 61852K rw--- [ anon ]
00007fe94bc67000 3684K ----- [ anon ]
00007fe950000000 64428K rw--- [ anon ]
00007fe953eeb000 1108K ----- [ anon ]
00007fe958000000 42748K rw--- [ anon ]
00007fe95a9bf000 22788K ----- [ anon ]
00007fe960000000 8080K rw--- [ anon ]
00007fe9607e4000 57456K ----- [ anon ]
00007fe968000000 65536K rw--- [ anon ]
00007fe970000000 22388K rw--- [ anon ]
00007fe9715dd000 43148K ----- [ anon ]
00007fe978000000 60972K rw--- [ anon ]
00007fe97bb8b000 4564K ----- [ anon ]
00007fe980000000 65528K rw--- [ anon ]
00007fe983ffe000 8K ----- [ anon ]
00007fe988000000 14080K rw--- [ anon ]
00007fe988dc0000 51456K ----- [ anon ]
00007fe98c000000 12076K rw--- [ anon ]
00007fe98cbcb000 53460K ----- [ anon ]
I interpret the line with 0000000720000000 3670016K as the heap space, whose size we define with the JVM parameter "-Xmx". Right after that, the pairs begin, and each pair sums to exactly 64M.
We are using CentOS release 5.10 (Final), 64-bit, with JDK 1.7.0_17.
The question is, what are those blocks? Which subsystem does allocate these?
Update: We do not use JIT and/or JNI native code invocations.

It's also possible that there is a native memory leak. A common problem is native memory leaks caused by not closing a ZipInputStream/GZIPInputStream.
A typical way that a ZipInputStream is opened is by a call to Class.getResource/ClassLoader.getResource and calling openConnection().getInputStream() on the java.net.URL instance or by calling Class.getResourceAsStream/ClassLoader.getResourceAsStream. One must ensure that these streams always get closed.
Some commonly used open source libraries have had bugs that leak unclosed java.util.zip.Inflater or java.util.zip.Deflater instances. For example, the Nimbus JOSE JWT library fixed a related memory leak in version 6.5.1, and Java JWT (jjwt) had a similar bug that was fixed in version 0.10.7. The bug pattern in these two cases was that DeflaterOutputStream.close() and InflaterInputStream.close() do not call Deflater.end()/Inflater.end() when a Deflater/Inflater instance is provided by the caller. In those cases, it's not enough to check the code for streams being closed: every Deflater/Inflater instance created in the code must be handled so that .end() gets called.
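The pattern described above can be illustrated with a small sketch (hypothetical class name): when you pass your own `Deflater`/`Inflater` to a stream, closing the stream does not release the native zlib state, so `end()` goes in a `finally` block. On newer JDKs the native memory is also reclaimed when the object is garbage collected, but calling `end()` releases it promptly instead of waiting for the GC.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.Inflater;
import java.util.zip.InflaterInputStream;

public class ZipCleanup {
    static byte[] deflate(byte[] data) throws IOException {
        Deflater deflater = new Deflater();
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            // close() finishes the stream but does not call end() on a caller-supplied Deflater
            try (DeflaterOutputStream out = new DeflaterOutputStream(bos, deflater)) {
                out.write(data);
            }
            return bos.toByteArray();
        } finally {
            deflater.end(); // frees the native zlib state immediately
        }
    }

    static byte[] inflate(byte[] compressed) throws IOException {
        Inflater inflater = new Inflater();
        try (InflaterInputStream in =
                     new InflaterInputStream(new ByteArrayInputStream(compressed), inflater)) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            inflater.end(); // InflaterInputStream.close() won't do this for a supplied Inflater
        }
    }
}
```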
One way to check for Zip*Stream leaks is to take a heap dump and search for instances of any class with "zip", "Inflater" or "Deflater" in its name. This is possible in many heap dump analysis tools such as YourKit Java Profiler, JProfiler or Eclipse MAT. It's also worth checking objects in the finalization state, since in some cases memory is released only after finalization. Checking for classes that might use native libraries is also useful; this applies to TLS/SSL libraries too.
There is an OSS tool called leakchecker from Elastic that is a Java Agent that can be used to find the sources of java.util.zip.Inflater instances that haven't been closed (.end() not called).
For native memory leaks in general (not just zip library leaks), you can use jemalloc and enable its malloc sampling profiling through settings in the MALLOC_CONF environment variable. Detailed instructions for using jemalloc to debug a native memory leak in Java applications are available in this blog post: http://www.evanjones.ca/java-native-leak-bug.html . There's also a blog post from Elastic featuring jemalloc and mentioning leakchecker, the tool that Elastic has open-sourced to track down problems caused by unclosed zip inflater resources.
There is also a blog post about a native memory leak related to ByteBuffers. Java 8u102 added a system property, jdk.nio.maxCachedBufferSize, to limit the cache issue described in that blog post:
-Djdk.nio.maxCachedBufferSize=262144
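Besides capping the cache with that property, code you control can avoid the issue at the source. As the linked post describes, when a heap `ByteBuffer` is passed to channel I/O, the JDK copies it into a temporary direct buffer that can be cached per thread, so one huge write can leave a huge cached buffer behind. A minimal sketch, assuming you own the write path: stream the data through a caller-owned direct buffer of bounded size. The class name and the 64 KiB size used in the usage note are illustrative.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChunkedDirectWrite {
    // Writes data through a caller-owned direct buffer, one buffer-sized chunk at a time.
    // Because the channel receives a direct buffer, the JDK does not need to copy the data
    // into a temporary direct buffer sized to the whole write.
    static void write(Path file, byte[] data, ByteBuffer direct) throws IOException {
        if (!direct.isDirect() || direct.capacity() == 0) {
            throw new IllegalArgumentException("expected a non-empty direct buffer");
        }
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            for (int off = 0; off < data.length; off += direct.capacity()) {
                int len = Math.min(direct.capacity(), data.length - off);
                direct.clear();
                direct.put(data, off, len);
                direct.flip();
                while (direct.hasRemaining()) {
                    ch.write(direct); // may write fewer bytes than remaining
                }
            }
        }
    }
}
```

Allocate the direct buffer once (for example `ByteBuffer.allocateDirect(64 * 1024)`) and reuse it; allocating a new direct buffer per call would just move the native-memory pressure elsewhere.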
It's also good to check open file handles to see whether the memory leak is caused by a large number of mmap'ed files. On Linux, lsof can be used to list open files and open sockets:
lsof -Pan -p PID
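As an in-process complement to lsof, HotSpot-based JVMs on Unix expose the descriptor count through `com.sun.management.UnixOperatingSystemMXBean`. A minimal sketch (hypothetical class name) that returns -1 where that bean is not available: it counts open descriptors only, so a file that was mapped and whose descriptor was then closed still needs `pmap` to show up.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdCount {
    // Open descriptors of this JVM, or -1 if the platform bean doesn't expose the count
    // (e.g. on Windows).
    static long openFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getOpenFileDescriptorCount();
        }
        return -1;
    }

    // Per-process descriptor limit, or -1 if unavailable.
    static long maxFileDescriptors() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            return ((UnixOperatingSystemMXBean) os).getMaxFileDescriptorCount();
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println("open fds: " + openFileDescriptors() + " / max " + maxFileDescriptors());
    }
}
```

Logging this periodically and watching for a steady climb is a cheap first signal before reaching for lsof on the host.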
A report of the process's memory map can also help investigate native memory leaks:
pmap -x PID
For Java processes running in Docker, it should be possible to execute the lsof or pmap command on the "host". You can find the PID of the containerized process with this command:
docker inspect --format '{{.State.Pid}}' container_id
It's also useful to take a thread dump (or use jconsole/JMX) to check the number of threads, since each thread reserves native memory for its stack (1 MB by default on 64-bit Linux, adjustable with -Xss). A large number of threads can use a lot of memory.
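The thread-count check can be turned into a quick estimate. This sketch (hypothetical class name) reads the live and peak thread counts from `ThreadMXBean` and multiplies by an assumed 1 MiB stack size, the usual default on 64-bit Linux; substitute your own `-Xss` value. The result is reserved address space rather than resident memory, since typically only the touched pages of each stack are committed, and JVM-internal threads such as GC workers are not included in the count.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

public class ThreadStackEstimate {
    // Assumption: 1 MiB per thread, the usual -Xss default on 64-bit Linux HotSpot.
    static final long ASSUMED_STACK_BYTES = 1024L * 1024;

    // Address space reserved for thread stacks; resident memory is usually lower.
    static long estimatedStackReservation(int threadCount, long stackBytes) {
        return threadCount * stackBytes;
    }

    public static void main(String[] args) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int live = mx.getThreadCount();
        System.out.printf("live threads: %d (peak %d)%n", live, mx.getPeakThreadCount());
        System.out.printf("~%d MiB reserved for stacks at %d KiB each%n",
                estimatedStackReservation(live, ASSUMED_STACK_BYTES) / (1024 * 1024),
                ASSUMED_STACK_BYTES / 1024);
    }
}
```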
There is also Native Memory Tracking (NMT) in the JVM. That might be useful to check if it's the JVM itself that is using up the native memory.
AsyncProfiler can be used to detect the source of native memory allocations. This is explained in another answer.
The jattach tool can also be used in a containerized (Docker) environment to trigger thread dumps or heap dumps from the host. It can also run jcmd commands, which are needed for controlling NMT.

I ran into the same problem. This is a known problem with glibc >= 2.10.
The cure is to set this environment variable:
export MALLOC_ARENA_MAX=4
IBM article about setting MALLOC_ARENA_MAX
https://www.ibm.com/developerworks/community/blogs/kevgrig/entry/linux_glibc_2_10_rhel_6_malloc_may_show_excessive_virtual_memory_usage?lang=en
Google for MALLOC_ARENA_MAX or search for it on SO to find a lot of references.
You might want to tune also other malloc options to optimize for low fragmentation of allocated memory:
# tune glibc memory allocation, optimize for low fragmentation
# limit the number of arenas
export MALLOC_ARENA_MAX=2
# disable dynamic mmap threshold, see M_MMAP_THRESHOLD in "man mallopt"
export MALLOC_MMAP_THRESHOLD_=131072
export MALLOC_TRIM_THRESHOLD_=131072
export MALLOC_TOP_PAD_=131072
export MALLOC_MMAP_MAX_=65536
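To check whether a process matches this diagnosis before and after setting `MALLOC_ARENA_MAX`, the pairs from the question's pmap listing can be counted mechanically. A minimal sketch, assuming the pairs are glibc malloc arenas as this answer suggests: on 64-bit, each arena reserves a 64 MiB-aligned region, partly committed (`rw---`) and partly reserved (`-----`), so the two halves add up to exactly 64 MiB. The class name is hypothetical, the parser expects plain `pmap PID` output (not `pmap -x`), and the worst-case figure is a rough upper bound based on glibc's default limit of 8 arenas per core on 64-bit.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArenaCounter {
    static final long ARENA_KB = 64L * 1024;      // 64 MiB in the KiB units pmap prints
    static final long ARENA_BYTES = ARENA_KB * 1024;
    // "<hex address> <size>K <perms> <mapping>", e.g. "00007fe930000000 62876K rw--- [ anon ]"
    static final Pattern LINE = Pattern.compile("^([0-9a-fA-F]+)\\s+(\\d+)K\\s+(\\S+)\\s+(.*)$");

    static final class Mapping {
        final long start;
        final long sizeKb;
        final boolean anon;
        Mapping(long start, long sizeKb, boolean anon) {
            this.start = start;
            this.sizeKb = sizeKb;
            this.anon = anon;
        }
    }

    // Counts anonymous regions that start on a 64 MiB boundary and are either one 64 MiB
    // mapping or a mapping plus its contiguous neighbour that together add up to 64 MiB.
    static int countArenaLikeRegions(List<String> pmapLines) {
        List<Mapping> maps = new ArrayList<>();
        for (String raw : pmapLines) {
            Matcher m = LINE.matcher(raw.trim());
            if (!m.matches()) continue; // header and "total" lines
            maps.add(new Mapping(Long.parseUnsignedLong(m.group(1), 16),
                    Long.parseLong(m.group(2)), m.group(4).trim().equals("[ anon ]")));
        }
        int count = 0;
        for (int i = 0; i < maps.size(); i++) {
            Mapping cur = maps.get(i);
            if (!cur.anon || Long.remainderUnsigned(cur.start, ARENA_BYTES) != 0) continue;
            if (cur.sizeKb == ARENA_KB) { count++; continue; }
            if (i + 1 < maps.size()) {
                Mapping next = maps.get(i + 1);
                boolean contiguous = next.start == cur.start + cur.sizeKb * 1024;
                if (next.anon && contiguous && cur.sizeKb + next.sizeKb == ARENA_KB) {
                    count++;
                    i++; // the reserved (-----) tail belongs to this region
                }
            }
        }
        return count;
    }

    // Rough upper bound in MiB: 8 arenas per core by default on 64-bit, or MALLOC_ARENA_MAX
    // when set (pass arenaMax <= 0 for the default). Slightly high: the main arena uses brk.
    static long worstCaseArenaReservationMiB(int cores, int arenaMax) {
        long arenas = arenaMax > 0 ? arenaMax : 8L * cores;
        return arenas * 64;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        for (String l = in.readLine(); l != null; l = in.readLine()) lines.add(l);
        System.out.println("arena-like 64 MiB regions: " + countArenaLikeRegions(lines));
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("default worst case on " + cores + " cores: "
                + worstCaseArenaReservationMiB(cores, 0) + " MiB");
    }
}
```

With Java 11 or later, it can be run directly on live output, e.g. `pmap PID | java ArenaCounter.java`. For the question's listing, every `rw---`/`-----` pair (for example 62876K + 2660K) adds up to 65536K, which is what the counter looks for.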
