Diagnostic Messages for this Task:
Container [pid=3347,containerID=container_1490354262227_0013_01_000104] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.5 GB of 5 GB virtual memory used. Killing container.
Dump of the process-tree for container_1490354262227_0013_01_000104 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 3360 3347 3347 3347 (java) 7596 396 1537003520 262629 /usr/java/latest/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx864m -Djava.io.tmpdir=/mnt3/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1490354262227_0013/container_1490354262227_0013_01_000104/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.35.178.86 49938 attempt_1490354262227_0013_m_000004_3 104
|- 3347 2563 3347 3347 (bash) 0 1 115806208 698 /bin/bash -c /usr/java/latest/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx864m -Djava.io.tmpdir=/mnt3/var/lib/hadoop/tmp/nm-local-dir/usercache/hadoop/appcache/application_1490354262227_0013/container_1490354262227_0013_01_000104/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA org.apache.hadoop.mapred.YarnChild 10.35.178.86 49938 attempt_1490354262227_0013_m_000004_3 104 1>/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104/stdout 2>/mnt/var/log/hadoop/userlogs/application_1490354262227_0013/container_1490354262227_0013_01_000104/stderr
Container [pid=3347,containerID=container_1490354262227_0013_01_000104] is running beyond physical memory limits. Current usage: 1.0 GB of 1 GB physical memory used; 1.5 GB of 5 GB virtual memory used.
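It looks like your process needs more memory than the defined limit allows.
You need to increase the container size. Note that hive.tez.container.size is given in MB as a plain integer and hive.auto.convert.join.noconditionaltask.size is given in bytes (1436549120 bytes is 1370 MB):
SET hive.tez.container.size=4096;
SET hive.auto.convert.join.noconditionaltask.size=1436549120;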
If it is failing on the reducer:
Add DISTRIBUTE BY on the partition key to the query. It distributes the data between reducers by partition key, so each reducer writes fewer partitions and consumes less memory.
insert overwrite table items_s3_table PARTITION(w_id) select pk, cId,
fcsku, cType, disposition, cReferenceId, snapshotId, quantity, w_id
from items_dynamodb_table distribute by w_id;
Try to decrease bytes per reducer. Decreasing this parameter increases parallelism (the number of reducers) and may reduce memory consumption per reducer.
SET hive.exec.reducers.bytes.per.reducer=67108864;
Adjust memory settings if nothing helps.
For mappers:
mapreduce.map.memory.mb=4096;
mapreduce.map.java.opts=-Xmx3000m;
For reducers:
mapreduce.reduce.memory.mb=4096;
mapreduce.reduce.java.opts=-Xmx3000m;
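To make the relationship between these knobs a bit more concrete, here is a rough Python sketch. It assumes that the reducer count is estimated as the input size divided by bytes per reducer (capped by a maximum, assumed to be 1009 here), and that the -Xmx heap should stay below the container size so there is headroom for non-heap JVM memory. The helper names and the 0.75 heap fraction are made up for illustration; they are not values taken from the cluster above.

```python
import math


def estimated_reducers(input_bytes: int, bytes_per_reducer: int,
                       max_reducers: int = 1009) -> int:
    """Rough reducer estimate: input size / bytes per reducer, capped.

    max_reducers is an assumed cap (in the spirit of hive.exec.reducers.max).
    """
    return max(1, min(max_reducers, math.ceil(input_bytes / bytes_per_reducer)))


def java_opts_for_container(container_mb: int, heap_fraction: float = 0.75) -> str:
    """Build a -Xmx option that leaves headroom inside the container.

    heap_fraction is an assumption, not a Hadoop default.
    """
    if not 0 < heap_fraction < 1:
        raise ValueError("heap_fraction must be between 0 and 1")
    return f"-Xmx{int(container_mb * heap_fraction)}m"


def headroom_mb(container_mb: int, xmx_mb: int) -> int:
    """Memory left in the container for everything that is not Java heap."""
    return container_mb - xmx_mb


if __name__ == "__main__":
    ten_gib = 10 * 1024 ** 3
    print(estimated_reducers(ten_gib, 67108864))   # 64 MB per reducer
    print(estimated_reducers(ten_gib, 268435456))  # 256 MB per reducer
    print(java_opts_for_container(4096))
    print(headroom_mb(1024, 864))    # the killed container: 1 GB limit, -Xmx864m
    print(headroom_mb(4096, 3000))   # the suggested settings
```

With the killed container (1 GB limit, -Xmx864m) only 160 MB is left for everything outside the heap, which is why raising the container size while keeping the heap well below it can help.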
Prerequisites
The application runs in a Docker container on Java (openjdk version "13.0.1") with these options:
-Xmx6G -XX:MaxHeapFreeRatio=30 -XX:MinHeapFreeRatio=10 -XX:+AlwaysActAsServerClassMachine -XX:+UseContainerSupport -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError -XX:HeapDumpPath=/.../crush.hprof -XX:+UnlockDiagnosticVMOptions -XX:NativeMemoryTracking=summary -XX:+PrintNMTStatistics -Xlog:gc*:file=/var/log/.../log.gc.log:time::filecount=5,filesize=100000
When I run jcmd 1 VM.native_memory, I get this:
Total: reserved=9081562KB, committed=1900002KB
- Java Heap (reserved=6291456KB, committed=896000KB)
(mmap: reserved=6291456KB, committed=896000KB)
- Class (reserved=1221794KB, committed=197034KB)
(classes #34434)
( instance classes #32536, array classes #1898)
(malloc=7330KB #121979)
(mmap: reserved=1214464KB, committed=189704KB)
( Metadata: )
( reserved=165888KB, committed=165752KB)
( used=161911KB)
( free=3841KB)
( waste=0KB =0.00%)
( Class space:)
( reserved=1048576KB, committed=23952KB)
( used=21501KB)
( free=2451KB)
( waste=0KB =0.00%)
- Thread (reserved=456661KB, committed=50141KB)
(thread #442)
(stack: reserved=454236KB, committed=47716KB)
(malloc=1572KB #2654)
(arena=853KB #882)
- Code (reserved=255027KB, committed=100419KB)
(malloc=7343KB #26005)
(mmap: reserved=247684KB, committed=93076KB)
- GC (reserved=316675KB, committed=116459KB)
(malloc=47311KB #70516)
(mmap: reserved=269364KB, committed=69148KB)
- Compiler (reserved=1429KB, committed=1429KB)
(malloc=1634KB #2498)
(arena=18014398509481779KB #5)
- Internal (reserved=2998KB, committed=2998KB)
(malloc=2962KB #5480)
(mmap: reserved=36KB, committed=36KB)
- Other (reserved=446581KB, committed=446581KB)
(malloc=446581KB #368)
- Symbol (reserved=36418KB, committed=36418KB)
(malloc=34460KB #906917)
(arena=1958KB #1)
- Native Memory Tracking (reserved=18786KB, committed=18786KB)
(malloc=587KB #8291)
(tracking overhead=18199KB)
- Shared class space (reserved=11180KB, committed=11180KB)
(mmap: reserved=11180KB, committed=11180KB)
- Arena Chunk (reserved=19480KB, committed=19480KB)
(malloc=19480KB)
- Logging (reserved=7KB, committed=7KB)
(malloc=7KB #271)
- Arguments (reserved=17KB, committed=17KB)
(malloc=17KB #471)
- Module (reserved=1909KB, committed=1909KB)
(malloc=1909KB #11057)
- Safepoint (reserved=8KB, committed=8KB)
(mmap: reserved=8KB, committed=8KB)
- Synchronization (reserved=1136KB, committed=1136KB)
(malloc=1136KB #6628)
Here we can see that the 'Other' section consumes 446581 KB, whereas total committed memory is 1900002 KB.
So the 'Other' section takes about 23% of all committed memory!
Also, this memory is not freed while the application is running.
Because of this, I changed the Java flag -XX:NativeMemoryTracking=summary to -XX:NativeMemoryTracking=detail to check where the memory is allocated, and got these two strange blocks of memory:
[0x00007f8db4b32bae] Unsafe_AllocateMemory0+0x8e
[0x00007f8da416e7db]
(malloc=298470KB type=Other #286)
[0x00007f8db4b32bae] Unsafe_AllocateMemory0+0x8e
[0x00007f8d9b84bc90]
(malloc=148111KB type=Other #82)
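The share of each NMT category in the committed total can be computed from the summary output with a few lines of Python. This is only a sketch: the regular expression is my own and assumes the line layout shown in the jcmd output above, and the sample text contains just a subset of the categories.

```python
import re

# A subset of the category lines from the jcmd output above.
NMT_SUMMARY = """\
Total: reserved=9081562KB, committed=1900002KB
- Java Heap (reserved=6291456KB, committed=896000KB)
(mmap: reserved=6291456KB, committed=896000KB)
- Class (reserved=1221794KB, committed=197034KB)
- GC (reserved=316675KB, committed=116459KB)
- Other (reserved=446581KB, committed=446581KB)
(malloc=446581KB #368)
"""

# Matches "Total: reserved=..." and "- Name (reserved=..., committed=...)".
# Nested detail lines start with "(" and therefore do not match.
CATEGORY_RE = re.compile(
    r"^(?:-\s*)?(?P<name>[A-Za-z ]+?)\s*[:(]\s*"
    r"reserved=\d+KB, committed=(?P<committed>\d+)KB"
)


def committed_shares(summary: str) -> dict:
    """Return {category: percent of total committed memory}."""
    committed = {}
    for line in summary.splitlines():
        match = CATEGORY_RE.match(line.strip())
        if match:
            committed[match.group("name")] = int(match.group("committed"))
    total = committed.pop("Total")
    return {name: 100.0 * kb / total for name, kb in committed.items()}


if __name__ == "__main__":
    shares = committed_shares(NMT_SUMMARY)
    for name, pct in sorted(shares.items(), key=lambda item: -item[1]):
        print(f"{name:10s} {pct:5.1f}%")
```

Sorting by share makes it easy to see which category to look at first when comparing two snapshots taken some time apart.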
Analysis
I tried to use async-profiler to trace the Unsafe_AllocateMemory0 event.
I ran async-profiler as an agent like this:
java -agentpath:/async-profiler/build/libasyncProfiler.so=start,event=itimer,Unsafe_AllocateMemory0,file=/var/log/.../unsafe_allocate_memory.html
And got this flamegraph: https://i.stack.imgur.com/PbE5D.png
I also tried to profile the malloc, mmap and mprotect events. malloc showed the same flamegraph as the Unsafe_AllocateMemory0 event, but the flamegraphs for mmap and mprotect were empty.
I thought the problem might be related to the C2 compiler and disabled it, but after a restart nothing changed: the 'Other' section still occupied a lot of memory. Moreover, this application is long-lived, and I'm not sure that disabling C2 is a good idea.
I tried to use jeprof to check which part of the code executes os.malloc.
I ran the Java application like this:
LD_PRELOAD=/usr/local/lib/libjemalloc.so MALLOC_CONF=prof:true,lg_prof_interval:30,lg_prof_sample:17 exec java -jar /srv/app/myapp.jar
After 10+ minutes I used jeprof and got this: https://i.stack.imgur.com/45adD.gif
And again there are two blocks that occupy a lot of native memory.
Result
I cannot find the place that allocates so much memory.
Can someone recommend how to find the root cause of this problem, and what steps I need to take to avoid it?
UPDATE 1
Thanks to apangin, I have finally found the place where so much memory is allocated!
It's related to Redisson/Lettuce, which use Netty under the hood, as the flamegraph shows.
I used the experimental nativemem mode and ran Java like this:
java -agentpath:/async-profiler/build/libasyncProfiler.so=start,event=nativemem,file=/var/log/.../profile.jfr -jar /srv/app/myapp.jar
Your async-profiler arguments seem wrong.
Change event=itimer,Unsafe_AllocateMemory0 to event=Unsafe_AllocateMemory0
async-profiler also has an experimental nativemem mode specifically for finding native memory leaks. See https://github.com/jvm-profiling-tools/async-profiler/discussions/491 for the details.
The Other section in NMT typically includes off-heap memory allocated with Unsafe.allocateMemory, in particular direct ByteBuffers.
I have configured a 16 GB heap on my Elasticsearch node, and the node has 36 GB of memory. The Elasticsearch Java process is consuming 95% of it. Heap and non-heap memory together do not add up to 95%. I want to control the memory usage and don't want system memory usage to go beyond 90%.
The following is the output of ps aux | grep elasticsearch:
103 3242 55.0 95.4 208321876 36468416 ? Sl 00:16 778:35 /usr/bin/java -Xms16g -Xmx16g -XX:MaxDirectMemorySize=16g -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=9010 -Dcom.sun.management.jmxremote.local.only=false -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+DisableExplicitGC -XX:+AlwaysPreTouch -server -Xss1m -Djava.awt.headless=true -Dfile.encoding=UTF-8 -Djna.nosys=true -Djdk.io.permissionsUseCanonicalPath=true -Dio.netty.noUnsafe=true -Dio.netty.noKeySetOptimization=true -Dlog4j.shutdownHookEnabled=false -Dlog4j2.disable.jmx=true -Dlog4j.skipJansi=true -XX:+HeapDumpOnOutOfMemoryError -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-5.1.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch -d -p /var/run/elasticsearch/elasticsearch.pid -Edefault.path.logs=/var/log/elasticsearch -Edefault.path.data=/var/lib/elasticsearch -Edefault.path.conf=/etc/elasticsearch
The following is the output of jmap -heap:
root@ice-bsd-none-551475:~# jmap -heap 3242
Attaching to process ID 3242, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 25.5-b02
using parallel threads in the new generation.
using thread-local object allocation.
Concurrent Mark-Sweep GC
Heap Configuration:
MinHeapFreeRatio = 40
MaxHeapFreeRatio = 70
MaxHeapSize = 17179869184 (16384.0MB)
NewSize = 959643648 (915.1875MB)
MaxNewSize = 959643648 (915.1875MB)
OldSize = 16220225536 (15468.8125MB)
NewRatio = 2
SurvivorRatio = 8
MetaspaceSize = 21807104 (20.796875MB)
CompressedClassSpaceSize = 1073741824 (1024.0MB)
MaxMetaspaceSize = 17592186044415 MB
G1HeapRegionSize = 0 (0.0MB)
Heap Usage:
New Generation (Eden + 1 Survivor Space):
capacity = 863698944 (823.6875MB)
used = 89336768 (85.19818115234375MB)
free = 774362176 (738.4893188476562MB)
10.343507841547158% used
Eden Space:
capacity = 767754240 (732.1875MB)
used = 82442520 (78.6233139038086MB)
free = 685311720 (653.5641860961914MB)
10.738139329585467% used
From Space:
capacity = 95944704 (91.5MB)
used = 6894248 (6.574867248535156MB)
free = 89050456 (84.92513275146484MB)
7.18564726615864% used
To Space:
capacity = 95944704 (91.5MB)
used = 0 (0.0MB)
free = 95944704 (91.5MB)
0.0% used
concurrent mark-sweep generation:
capacity = 16220225536 (15468.8125MB)
used = 2346399531588267400 (2.237700969303386E12MB)
free = 15354485090581 MB
1.4465887212113964E10% used
31656 interned Strings occupying 4070216 bytes.
Although the total memory reported by jmap -heap is less than 20 GB, the total memory utilisation of the Java process is 95.4% according to the top command.
I checked the direct memory and mapped memory usage in JConsole:
Direct memory used = 10798837846 (10 GB)
Mapped memory used = 166282997194 (166 GB)
I want to reduce the total memory usage on the machine and control the total memory of this Elasticsearch process.
I have a gap in my understanding of how a JVM process allocates its memory. As far as I know,
RSS = Heap size + MetaSpace + OffHeap size
where OffHeap consists of thread stacks, direct buffers, mapped files (libraries and jars) and the JVM code itself.
At the moment I'm trying to analyze my Java application (Spring Boot + Infinispan), whose RSS is 779M (it runs in a Docker container, so pid 1 is OK):
[ root@daf5a5ae9bb7:/data ]$ ps -o rss,vsz,sz 1
RSS VSZ SZ
798324 6242160 1560540
According to jvisualvm, the committed heap size is 374M
and the Metaspace size is 89M.
In other words, I want to explain 779M - (374M + 89M) = 316M of OffHeap memory.
My app has (on average) 36 live threads.
Each of these threads consumes 1M:
[ root@fac6d0dfbbb4:/data ]$ java -XX:+PrintFlagsFinal -version |grep ThreadStackSize
intx CompilerThreadStackSize = 0
intx ThreadStackSize = 1024
intx VMThreadStackSize = 1024
So, here we can add 36M.
The only place where the app uses DirectBuffer is NIO. As far as I can see from JMX, it doesn't consume a lot of resources: only 98K.
The last step is mapped libs and jars. But according to pmap
[ root@daf5a5ae9bb7:/data ]$ pmap -x 1 | grep ".so.*" | awk '{ sum+=$3} END {print sum}'
12896K
plus
[ root@daf5a5ae9bb7:/data ]$ pmap -x 1 | grep ".jar" | awk '{ sum+=$3} END {print sum}'
9720K
we only have about 20M here.
Hence, we still have to explain 316M - (36M + 20M) = 260M :(
Does anyone have any idea what I missed?
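For reference, the accounting so far can be written out as a small Python sketch. The figures are the ones quoted in this post (798324 KB of RSS is about 779M); the dictionary keys and the helper name are just labels chosen for illustration.

```python
RSS_KB = 798324  # RSS column from `ps -o rss,vsz,sz 1`

# Memory accounted for so far, in MB, as quoted in the post.
ACCOUNTED_MB = {
    "heap (committed)": 374,
    "metaspace": 89,
    "thread stacks (36 threads x 1M)": 36,
    "mapped .so and .jar files": 20,
}


def unexplained_mb(rss_mb: int, accounted: dict) -> int:
    """RSS minus everything that has been attributed to a known area."""
    return rss_mb - sum(accounted.values())


if __name__ == "__main__":
    rss_mb = RSS_KB // 1024
    print(f"RSS:         {rss_mb}M")
    for area, mb in ACCOUNTED_MB.items():
        print(f"  {area}: {mb}M")
    print(f"unexplained: {unexplained_mb(rss_mb, ACCOUNTED_MB)}M")
```

The 98K of direct buffers is left out because it does not change the result at this precision.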
Approach:
You may want to use Java HotSpot Native Memory Tracking (NMT).
It gives you a breakdown of the memory allocated by the JVM, split up into the different areas: heap, classes, threads, code, GC, compiler, internal, symbols, memory tracking, pooled free chunks, and unknown.
Usage:
You can start your application with -XX:NativeMemoryTracking=summary.
The current native memory usage can then be inspected with jcmd <pid> VM.native_memory summary.
Where to find jcmd / pid:
On a default OpenJDK installation on Ubuntu, jcmd can be found at /usr/bin/jcmd.
Running jcmd without any parameters gives you a list of the running Java applications.
user@pc:~$ /usr/bin/jcmd
5169 Main <-- 5169 is the pid
Output:
You will then receive an overview of the JVM's memory, looking something like the following:
Total: reserved=664192KB, committed=253120KB <--- total memory tracked by Native Memory Tracking
Java Heap (reserved=516096KB, committed=204800KB) <--- Java Heap
(mmap: reserved=516096KB, committed=204800KB)
Class (reserved=6568KB, committed=4140KB) <--- class metadata
(classes #665) <--- number of loaded classes
(malloc=424KB, #1000) <--- malloc'd memory, #number of malloc
(mmap: reserved=6144KB, committed=3716KB)
Thread (reserved=6868KB, committed=6868KB)
(thread #15) <--- number of threads
(stack: reserved=6780KB, committed=6780KB) <--- memory used by thread stacks
(malloc=27KB, #66)
(arena=61KB, #30) <--- resource and handle areas
Code (reserved=102414KB, committed=6314KB)
(malloc=2574KB, #74316)
(mmap: reserved=99840KB, committed=3740KB)
GC (reserved=26154KB, committed=24938KB)
(malloc=486KB, #110)
(mmap: reserved=25668KB, committed=24452KB)
Compiler (reserved=106KB, committed=106KB)
(malloc=7KB, #90)
(arena=99KB, #3)
Internal (reserved=586KB, committed=554KB)
(malloc=554KB, #1677)
(mmap: reserved=32KB, committed=0KB)
Symbol (reserved=906KB, committed=906KB)
(malloc=514KB, #2736)
(arena=392KB, #1)
Memory Tracking (reserved=3184KB, committed=3184KB)
(malloc=3184KB, #300)
Pooled Free Chunks (reserved=1276KB, committed=1276KB)
(malloc=1276KB)
Unknown (reserved=33KB, committed=33KB)
(arena=33KB, #1)
This gives a detailed overview of the different memory areas used by the JVM, and also shows the reserved and committed memory.
I don't know of a technique that gives you a more detailed memory consumption list.
Further reading:
You can also use -XX:NativeMemoryTracking=detail in combination with further jcmd commands. A more detailed explanation can be found in the Java Platform, Standard Edition Troubleshooting Guide, section 2.6 The jcmd Utility. You can check the available commands via "jcmd <pid> help".
I'm trying to understand where the server's memory is used.
When I look at the memory usage on the system via top, I get:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
6091 oper 20 0 2721m 1.4g 9180 S 0 11.4 6:13.42 java
10854 oper 20 0 2186m 1.1g 5104 S 1 9.1 114:52.15 java
9350 oper 20 0 2293m 971m 4892 S 0 8.0 40:15.68 java
9286 oper 20 0 2082m 800m 4852 S 0 6.6 31:31.44 java
10506 oper 20 0 1936m 711m 4900 S 0 5.9 49:09.64 java
8965 oper 20 0 1918m 680m 5076 S 0 5.6 106:53.10 java
All those processes are Tomcat 6.0.20 instances running on JVM 1.6.0_26.
As we can see in the top report, one process is using 1.4 GB, another 1.1 GB... much more than expected.
When I open JConsole on the first process, I can see that the cumulative memory (heap and non-heap) is around 200 MB; on the second, 385 MB; on the third, 235 MB.
So my question is: where is the invisible memory?
Top - JConsole =
1.4G - 200M = 1.2G
1.1G - 385M = 715M
971M - 235M = 736M
800M - 173M = 627M
Does anyone have an idea?
Thanks a lot.
Can I check the heap usage of a running JVM from the command line? I mean the actual usage rather than the maximum set with -Xmx.
It needs to be the command line because I don't have access to a windowing environment, and I want to script based on the value. The application is running in the Jetty application server.
You can use jstat, like this:
jstat -gc <pid>
Full docs here:
http://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html
For Java 8 you can use the following command line to get the heap space utilization in kB:
jstat -gc <PID> | tail -n 1 | awk '{split($0,a," "); sum=a[3]+a[4]+a[6]+a[8]; print sum}'
The command basically sums up:
S0U: Survivor space 0 utilization (kB).
S1U: Survivor space 1 utilization (kB).
EU: Eden space utilization (kB).
OU: Old space utilization (kB).
You may also want to include the metaspace and the compressed class space utilization. In this case you have to add a[10] and a[12] to the awk sum.
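Because the column positions differ between Java versions (Java 7 prints PC/PU where Java 8 prints MC/MU and CCSC/CCSU), summing by column name is less fragile than summing by index. Below is a minimal Python sketch of that idea. The sample text is hypothetical jstat -gc output with made-up values, and the function names are my own.

```python
# Hypothetical `jstat -gc <pid>` output for a Java 8 JVM; the values are made up.
SAMPLE = """\
 S0C    S1C    S0U    S1U      EC       EU        OC         OU       MC     MU    CCSC   CCSU   YGC     YGCT    FGC    FGCT     GCT
1024.0 1024.0  0.0   512.0  8192.0   4096.0   20480.0    10240.0   4864.0 2560.0 512.0  256.0      5    0.050   1      0.020    0.070
"""


def parse_jstat_gc(output: str) -> dict:
    """Map each jstat column name to its value from the last data row."""
    lines = [line for line in output.splitlines() if line.strip()]
    header, values = lines[0].split(), lines[-1].split()
    return dict(zip(header, map(float, values)))


def heap_used_kb(stats: dict, include_metaspace: bool = False) -> float:
    """Sum survivor, eden and old-generation utilization (kB)."""
    used = sum(stats[col] for col in ("S0U", "S1U", "EU", "OU"))
    if include_metaspace:
        # MU/CCSU only exist on Java 8+; missing columns count as zero.
        used += stats.get("MU", 0.0) + stats.get("CCSU", 0.0)
    return used


if __name__ == "__main__":
    stats = parse_jstat_gc(SAMPLE)
    print(heap_used_kb(stats))
    print(heap_used_kb(stats, include_metaspace=True))
```

To use it against a live JVM, the output of jstat -gc for the target pid would be passed to parse_jstat_gc instead of SAMPLE.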
The whole procedure at once, based on @Till Schäfer's answer.
In KB...
jstat -gc $(ps axf | egrep -i "*/bin/java *" | egrep -v grep | awk '{print $1}') | tail -n 1 | awk '{split($0,a," "); sum=(a[3]+a[4]+a[6]+a[8]+a[10]); printf("%.2f KB\n",sum)}'
In MB...
jstat -gc $(ps axf | egrep -i "*/bin/java *" | egrep -v grep | awk '{print $1}') | tail -n 1 | awk '{split($0,a," "); sum=(a[3]+a[4]+a[6]+a[8]+a[10])/1024; printf("%.2f MB\n",sum)}'
"Awk sum" reference:
a[1] - S0C
a[2] - S1C
a[3] - S0U
a[4] - S1U
a[5] - EC
a[6] - EU
a[7] - OC
a[8] - OU
a[9] - PC
a[10] - PU
a[11] - YGC
a[12] - YGCT
a[13] - FGC
a[14] - FGCT
a[15] - GCT
Used for "Awk sum":
a[3] -- (S0U) Survivor space 0 utilization (KB).
a[4] -- (S1U) Survivor space 1 utilization (KB).
a[6] -- (EU) Eden space utilization (KB).
a[8] -- (OU) Old space utilization (KB).
a[10] - (PU) Permanent space utilization (KB).
[Ref.: https://docs.oracle.com/javase/7/docs/technotes/tools/share/jstat.html ]
Thanks!
NOTE: Works with OpenJDK!
FURTHER QUESTION: Wrong information?
If you check memory usage with the ps command, you will see that the java process consumes much more...
ps -eo size,pid,user,command --sort -size | egrep -i "*/bin/java *" | egrep -v grep | awk '{ hr=$1/1024 ; printf("%.2f MB ",hr) } { for ( x=4 ; x<=NF ; x++ ) { printf("%s ",$x) } print "" }' | cut -d "" -f2 | cut -d "-" -f1
UPDATE (2021-02-16):
According to the reference below (and @Till Schäfer's comment), "ps can show total reserved memory from OS" (adapted) and "jstat can show used space of heap and stack" (adapted). So we see a difference between what the ps command and the jstat command report.
As we understand it, the most "realistic" figure is the ps output, since it shows how much of the system's memory is actually taken up. The jstat command is better suited to a more detailed analysis of how Java uses the memory it has reserved from the OS.
[Ref.: http://www.openkb.info/2014/06/how-to-check-java-memory-usage.html ]
If you start the JVM with GC logging turned on, you get the information in the log file.
Otherwise 'jmap -heap <pid>' will give you what you want.
See the jmap doc page for more.
Please note that jmap should not be used in a production environment unless absolutely needed, as the tool halts the application to determine the actual heap usage. Usually this is not desired in a production environment.
If you are using JDK 8 or above, use jcmd:
jcmd <pid> GC.heap_info