I started the cluster last night with few issues. After 45 minutes it went to do a log roll, and then the cluster started to throw JVM wait errors. Since then the cluster won't restart: when starting, the ResourceManager doesn't come up.
The server's NameNode and DataNodes are also offline.
I had two installs of Hadoop 2.8 on the server. I removed the first one and reinstalled the second, adjusting the configuration files to get it restarted.
The error logs from when it crashed appear to show a Java stack overflow and an out-of-range error, with a growing saved memory size in the logs. My expectation is that I have misconfigured memory somewhere. I deleted and reformatted the NameNodes and I get the same segmentation fault. At this point I'm not sure what to do.
Ubuntu MATE 16.04, Hadoop 2.8, Spark for Hadoop 2.7, NFS, Scala, ...
When I start YARN now I get the following error message:
hduser@nodeserver:/opt/hadoop-2.8.0/sbin$ sudo ./start-yarn.sh
starting yarn daemons
starting resourcemanager, logging to /opt/hadoop-2.8.0/logs/yarn-root-resourcemanager-nodeserver.out
/opt/hadoop-2.8.0/sbin/yarn-daemon.sh: line 103: 5337 Segmentation fault nohup nice -n $YARN_NICENESS "$HADOOP_YARN_HOME"/bin/yarn --config $YARN_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null
node1: starting nodemanager, logging to /opt/hadoop-2.8.0/logs/yarn-root-nodemanager-node1.out
node3: starting nodemanager, logging to /opt/hadoop-2.8.0/logs/yarn-root-nodemanager-node3.out
node2: starting nodemanager, logging to /opt/hadoop-2.8.0/logs/yarn-root-nodemanager-node2.out
starting proxyserver, logging to /opt/hadoop-2.8.0/logs/yarn-root-proxyserver-nodeserver.out
/opt/hadoop-2.8.0/sbin/yarn-daemon.sh: line 103: 5424 Segmentation fault nohup nice -n $YARN_NICENESS "$HADOOP_YARN_HOME"/bin/yarn --config $YARN_CONF_DIR $command "$@" > "$log" 2>&1 < /dev/null
hduser@nodeserver:/opt/hadoop-2.8.0/sbin$
Edit: adding more error output to help.
hduser@nodeserver:/opt/hadoop-2.8.0/sbin$ jps
Segmentation fault
and
hduser@nodeserver:/opt/hadoop-2.8.0/bin$ sudo ./hdfs namenode -format
Segmentation fault
The logs appear to show that the Java stack went crazy and expanded from 512K to 5056K. So, how does one reset the stack?
Heap:
def new generation total 5056K, used 1300K [0x35c00000, 0x36170000, 0x4a950000)
eden space 4544K, 28% used [0x35c00000, 0x35d43b60, 0x36070000)
from space 512K, 1% used [0x360f0000, 0x360f1870, 0x36170000)
to space 512K, 0% used [0x36070000, 0x36070000, 0x360f0000)
tenured generation total 10944K, used 9507K [0x4a950000, 0x4b400000, 0x74400000)
the space 10944K, 86% used [0x4a950000, 0x4b298eb8, 0x4b299000, 0x4b400000)
Metaspace used 18051K, capacity 18267K, committed 18476K, reserved 18736K
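For reference, the summary above is the JVM's heap layout under the Serial collector: "def new generation" is the young generation (eden plus the "from"/"to" survivor spaces, 512K each here, so 4544K + 512K = 5056K), "tenured generation" is the old generation, and Metaspace holds class metadata. None of these figures is a thread stack, which is sized per thread with -Xss. A minimal sketch, assuming a JVM that actually starts (here java itself segfaulted), that lists the same pools at runtime through the standard java.lang.management API; the class name HeapPools is hypothetical:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class HeapPools {
    // One line per memory pool: name, HEAP or NON_HEAP, and used/committed size in KB.
    static List<String> describePools() {
        List<String> lines = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage == null) {
                continue;
            }
            lines.add(String.format("%-25s %-8s used=%dK committed=%dK",
                    pool.getName(), pool.getType().name(),
                    usage.getUsed() / 1024, usage.getCommitted() / 1024));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Heap pools (eden, survivor, tenured) and non-heap pools (Metaspace, code cache).
        // Thread stacks are not listed here: their size is set per thread with -Xss.
        describePools().forEach(System.out::println);
    }
}
```

Running it with java -XX:+UseSerialGC HeapPools should list pools corresponding to the eden, survivor and tenured areas in the summary above.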
Update, 24 hours later: I have tried complete reinstalls of Java and Hadoop and still no luck. When I run java -version I still get a segmentation fault.
It appears I have a stack overflow with no easy fix. It's easier to start over and rebuild the cluster with clean software.
This problem occurred when I used Chipyard to compile BOOM. Is this because of insufficient memory? I am running on a 1-core, 2G cloud server.
/bin/bash: line 1: 9986 Killed java -Xmx8G -Xss8M
-XX:MaxPermSize=256M -jar /home/cuiyujie/workspace/Boom/chipyard/generators/rocket-chip/sbt-launch.jar
-Dsbt.sourcemode=true -Dsbt.workspace=/home/cuiyujie/workspace/Boom/chipyard/tools ";project utilities; runMain utilities.GenerateSimFiles
-td /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig
-sim verilator"
/home/cuiyujie/workspace/Boom/chipyard/common.mk:86: recipe for target '/home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/sim_files.f' failed
make: *** [/home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/sim_files.f] Error 137
When I increased the memory to 4G, this appeared:
Done elaborating. OpenJDK 64-Bit Server VM warning: INFO: os::commit_memory(0x00000006dc3b7000, 97148928, 0) failed; error='Cannot allocate memory' (errno=12)
There is insufficient memory for the Java Runtime Environment to continue.
Native memory allocation (mmap) failed to map 97148928 bytes for committing reserved memory.
An error report file with more information is saved as:
/home/cuiyujie/workspace/Boom/chipyard/hs_err_pid2876.log
/home/cuiyujie/workspace/Boom/chipyard/common.mk:97: recipe for target 'generator_temp' failed
make: *** [generator_temp] Error 1
Should I increase the memory to 8G, or is there a command to increase the memory size that the process can use?
When I increased the memory to 16G, this appeared:
/bin/bash: line 1: 2642 Killed java -Xmx8G -Xss8M
-XX:MaxPermSize=256M -jar /home/cuiyujie/workspace/Boom/chipyard/generators/rocket-chip/sbt-launch.jar
-Dsbt.sourcemode=true -Dsbt.workspace=/home/cuiyujie/workspace/Boom/chipyard/tools ";project tapeout; runMain barstools.tapeout.transforms.GenerateTopAndHarness -o
/home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.top.v
-tho /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.harness.v
-i /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.fir
--syn-top ChipTop --harness-top TestHarness -faf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.anno.json
-tsaof /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.top.anno.json
-tdf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/firrtl_black_box_resource_files.top.f
-tsf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.top.fir
-thaof /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.harness.anno.json
-hdf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/firrtl_black_box_resource_files.harness.f
-thf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.harness.fir
--infer-rw --repl-seq-mem -c:TestHarness:-o:/home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.top.mems.conf
-thconf /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig/chipyard.TestHarness.LargeBoomConfig.harness.mems.conf
-td /home/cuiyujie/workspace/Boom/chipyard/sims/verilator/generated-src/chipyard.TestHarness.LargeBoomConfig
-ll error"
/home/cuiyujie/workspace/Boom/chipyard/common.mk:123: recipe for target 'firrtl_temp' failed
make: *** [firrtl_temp] Error 137
Short answer: yes.
Exit code 137 means the process was killed with SIGKILL; here that happens because your host runs out of memory.
"I am running on a 1 core 2G cloud server"
When you try to assign 8GB to the JVM, the OOM killer says "no-no, f... no way" and kicks in by sending a SIGKILL. The OOM killer is a kernel mechanism that steps in to save the system when free memory runs too low, by killing the most resource-abusive processes.
In this case, the abusive process (very abusive, indeed) is your Java program, which is trying to allocate more than(*) 4 times the maximum memory available on your host.
From the "Exit Codes With Special Meanings" table: exit code 137 = 128 + 9, i.e. the process was terminated by kill -9 (SIGKILL).
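Exit code 137 follows the shell convention, documented in the bash manual, that a command killed by fatal signal n reports exit status 128 + n. A small illustrative decoder, assuming Linux signal numbers and mapping only a handful of common ones; the class ExitCodes and its methods are hypothetical names:

```java
public class ExitCodes {
    // A few common Linux signal numbers; anything else is reported as "unknown".
    static String signalName(int signal) {
        switch (signal) {
            case 2: return "SIGINT";
            case 6: return "SIGABRT";
            case 9: return "SIGKILL";
            case 11: return "SIGSEGV";
            case 15: return "SIGTERM";
            default: return "unknown";
        }
    }

    // Decodes a shell exit status: values 129..192 mean "killed by signal (status - 128)".
    static String describe(int status) {
        if (status > 128 && status <= 128 + 64) {
            int signal = status - 128;
            return "killed by signal " + signal + " (" + signalName(signal) + ")";
        }
        return "exited with status " + status;
    }

    public static void main(String[] args) {
        // 137 from the sim_files.f/firrtl_temp failures, 1 from generator_temp, 139 for a segfault.
        for (int status : new int[] {137, 1, 139}) {
            System.out.println(status + ": " + describe(status));
        }
    }
}
```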
You should do one of the following:
Assign at most ~1.2GB to 1.5GB to your process (and keep your fingers crossed).
Switch to a more powerful/bigger host if you really need that much memory for your process.
Check whether you really need 8GB for that process.
Also note that the given parameters are easy to misread: -Xmx8G sets a maximum heap of 8GB, while -Xss8M sets an 8MB stack for each thread; it is not a minimum heap size. If you meant to set an initial heap size, that is -Xms, e.g. -Xmx8G -Xms4G.
(*) The free memory won't be the full 2GB either, but something between 1.6 and 1.8 GB.
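A minimal sketch of the arithmetic in this answer, assuming the ~1.6 GB of free memory mentioned above: it reads the -Xmx value from the question's command line and compares it with what the host can spare. HeapFit and its methods are hypothetical names, the size parser only handles plain byte counts and the k/m/g suffixes, and it counts the heap only, even though a real JVM also needs memory for Metaspace, thread stacks (-Xss) and native code:

```java
public class HeapFit {
    // Parses a JVM-style size such as "8G", "1600m" or "512k" into bytes.
    // Simplified: only k/m/g suffixes (case-insensitive) or a plain number of bytes.
    static long parseSize(String size) {
        String s = size.trim().toLowerCase();
        long multiplier = 1;
        char last = s.charAt(s.length() - 1);
        if (last == 'k') {
            multiplier = 1024L;
        } else if (last == 'm') {
            multiplier = 1024L * 1024;
        } else if (last == 'g') {
            multiplier = 1024L * 1024 * 1024;
        }
        if (multiplier != 1) {
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }

    // Returns the value of the first -Xmx option in a command line, or -1 if there is none.
    // Note that -Xss8M (thread stack size) is deliberately not matched.
    static long maxHeapFromArgs(String commandLine) {
        for (String token : commandLine.trim().split("\\s+")) {
            if (token.startsWith("-Xmx")) {
                return parseSize(token.substring(4));
            }
        }
        return -1;
    }

    // True if the requested maximum heap fits in the memory the host can spare.
    static boolean fits(long maxHeapBytes, long availableBytes) {
        return maxHeapBytes <= availableBytes;
    }

    public static void main(String[] args) {
        String cmd = "java -Xmx8G -Xss8M -XX:MaxPermSize=256M -jar sbt-launch.jar";
        long heap = maxHeapFromArgs(cmd);
        long available = parseSize("1600m"); // assumption: ~1.6 GB free on the 2 GB host
        System.out.println("max heap = " + heap / (1024 * 1024) + " MB, fits = " + fits(heap, available));
        System.out.println("heap / available = " + (double) heap / available);
    }
}
```

With these inputs the requested heap is more than 5 times the assumed free memory, which is why the kernel eventually kills the process rather than the JVM reporting a heap error.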
When I run a JMeter test for ActiveMQ on a Linux build agent, I get java.lang.OutOfMemoryError: Java heap space. Detailed log:
at java.util.Arrays.copyOf(Arrays.java:3332)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:124) ~
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:448)
at java.lang.StringBuilder.append(StringBuilder.java:136)
at org.apache.jmeter.protocol.jms.sampler.SubscriberSampler.extractContent(SubscriberSampler.java:282) ~[ApacheJMeter_jms.jar:5.3]
at org.apache.jmeter.protocol.jms.sampler.SubscriberSampler.sample(SubscriberSampler.java:186) ~[ApacheJMeter_jms.jar:5.3]
at org.apache.jmeter.protocol.jms.sampler.BaseJMSSampler.sample(BaseJMSSampler.java:98) ~
at org.apache.jmeter.threads.JMeterThread.doSampling(JMeterThread.java:635) ~[ApacheJMeter_core.jar:5.4]
at org.apache.jmeter.threads.JMeterThread.executeSamplePackage(JMeterThread.java:558) ~[ApacheJMeter_core.jar:5.4]
at org.apache.jmeter.threads.JMeterThread.processSampler(JMeterThread.java:489) ~[ApacheJMeter_core.jar:5.4]
at org.apache.jmeter.threads.JMeterThread.run(JMeterThread.java:256) ~[ApacheJMeter_core.jar:5.4]
I've already set the maximum heap (-Xmx8g), but it doesn't help. Yet the same test with the same configuration passes on a Windows build agent without an OutOfMemoryError.
How can this be handled? Is there some configuration needed on the Linux machine?
Are you sure your heap setting gets applied on Linux?
You can check it by creating a simple test plan with a single JSR223 Sampler using the following code:
println('Max heap size: ' + Runtime.getRuntime().maxMemory() / 1024 / 1024 + ' megabytes')
When you run JMeter in command-line non-GUI mode, you will see the current maximum JVM heap size printed.
To make the change permanent, amend the HEAP setting in the jmeter startup script according to your requirements.
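A standalone Java version of the same check, in case you want to test the JVM on the build agent outside JMeter. It prints Runtime.maxMemory() like the Groovy snippet, plus the Java version and the JVM arguments actually in effect, which helps when the Linux and Windows agents run different JVMs. The class name MaxHeap is hypothetical:

```java
import java.lang.management.ManagementFactory;

public class MaxHeap {
    // Maximum heap the JVM will try to use, in megabytes (same value as the Groovy snippet).
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / 1024 / 1024;
    }

    public static void main(String[] args) {
        System.out.println("Max heap size: " + maxHeapMegabytes() + " megabytes");
        // The Linux and Windows agents may run different JVMs with different options.
        System.out.println("Java version: " + System.getProperty("java.version"));
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

After compiling, run it with the same options JMeter receives (for example java -Xmx8g MaxHeap) and compare the printed maximum with what you expect.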
The issue was resolved after updating Java to version 11 on the Linux machines.
I want to build Android 10 from source and have followed the official instructions. To get started, I simply want to build it for the emulator. However, the build keeps failing and I get the following error:
[11177/12864] rm -rf "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/out" "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/srcjars" "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/stubsDir" && mkdir -p "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/out" "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/srcjars" "out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/stubsDir" && out/soong/host/linux-x86/bin/zipsync -d out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/srcjars -l out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/srcjars/list -f "*.java" out/soong/.intermediates/frameworks/base/framework-javastream-protos/gen/frameworks/base/core/proto/android/privacy.srcjar [...]
FAILED: out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs-stubs.srcjar out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs_api.txt out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs_removed.txt out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/private.txt out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs_annotations.zip out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-versions.xml out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs_api.xml out/soong/.intermediates/frameworks/base/api-stubs-docs/android_common/api-stubs-docs_last_released_api.xml
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:68)
at java.base/java.nio.CharBuffer.allocate(CharBuffer.java:341)
at java.base/java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:794)
at java.base/java.nio.charset.Charset.decode(Charset.java:818)
at com.intellij.openapi.fileEditor.impl.LoadTextUtil.convertBytes(LoadTextUtil.java:640)
at com.intellij.openapi.fileEditor.impl.LoadTextUtil.getTextByBinaryPresentation(LoadTextUtil.java:555)
at com.intellij.openapi.fileEditor.impl.LoadTextUtil.getTextByBinaryPresentation(LoadTextUtil.java:545)
at com.intellij.openapi.fileEditor.impl.LoadTextUtil.loadText(LoadTextUtil.java:531)
at com.intellij.openapi.fileEditor.impl.LoadTextUtil.loadText(LoadTextUtil.java:503)
at com.intellij.mock.MockFileDocumentManagerImpl.getDocument(MockFileDocumentManagerImpl.java:53)
at com.intellij.psi.AbstractFileViewProvider.getDocument(AbstractFileViewProvider.java:194)
at com.intellij.psi.AbstractFileViewProvider$VirtualFileContent.getText(AbstractFileViewProvider.java:484)
at com.intellij.psi.AbstractFileViewProvider.getContents(AbstractFileViewProvider.java:174)
at com.intellij.psi.impl.source.PsiFileImpl.loadTreeElement(PsiFileImpl.java:204)
at com.intellij.psi.impl.source.PsiFileImpl.calcTreeElement(PsiFileImpl.java:709)
at com.intellij.psi.impl.source.PsiJavaFileBaseImpl.getClasses(PsiJavaFileBaseImpl.java:66)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$Companion.findClassInPsiFile(KotlinCliJavaFileManagerImpl.kt:250)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$Companion.access$findClassInPsiFile(KotlinCliJavaFileManagerImpl.kt:246)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findPsiClassInVirtualFile(KotlinCliJavaFileManagerImpl.kt:216)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.access$findPsiClassInVirtualFile(KotlinCliJavaFileManagerImpl.kt:47)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClasses$1$$special$$inlined$forEachClassId$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:155)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClasses$1$$special$$inlined$forEachClassId$lambda$1.invoke(KotlinCliJavaFileManagerImpl.kt:47)
at org.jetbrains.kotlin.cli.jvm.index.JvmDependenciesIndexImpl$traverseDirectoriesInPackage$1.invoke(JvmDependenciesIndexImpl.kt:77)
at org.jetbrains.kotlin.cli.jvm.index.JvmDependenciesIndexImpl$traverseDirectoriesInPackage$1.invoke(JvmDependenciesIndexImpl.kt:32)
at org.jetbrains.kotlin.cli.jvm.index.JvmDependenciesIndexImpl.search(JvmDependenciesIndexImpl.kt:131)
at org.jetbrains.kotlin.cli.jvm.index.JvmDependenciesIndexImpl.traverseDirectoriesInPackage(JvmDependenciesIndexImpl.kt:76)
at org.jetbrains.kotlin.cli.jvm.index.JvmDependenciesIndex$DefaultImpls.traverseDirectoriesInPackage$default(JvmDependenciesIndex.kt:35)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClasses$1.invoke(KotlinCliJavaFileManagerImpl.kt:151)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl$findClasses$1.invoke(KotlinCliJavaFileManagerImpl.kt:47)
at org.jetbrains.kotlin.util.PerformanceCounter.time(PerformanceCounter.kt:91)
at org.jetbrains.kotlin.cli.jvm.compiler.KotlinCliJavaFileManagerImpl.findClasses(KotlinCliJavaFileManagerImpl.kt:147)
at com.intellij.psi.impl.PsiElementFinderImpl.findClasses(PsiElementFinderImpl.java:45)
When searching for a solution, I only find problems related to jack-server. As I understand it, Jack is no longer used in recent builds. I also tried reducing the number of build threads with m -j1, without success.
Here is some info about my setup: 4-core CPU, 8GB RAM
============================================
PLATFORM_VERSION_CODENAME=REL
PLATFORM_VERSION=10
TARGET_PRODUCT=aosp_arm
TARGET_BUILD_VARIANT=eng
TARGET_BUILD_TYPE=release
TARGET_ARCH=arm
TARGET_ARCH_VARIANT=armv7-a-neon
TARGET_CPU_VARIANT=generic
HOST_ARCH=x86_64
HOST_2ND_ARCH=x86
HOST_OS=linux
HOST_OS_EXTRA=Linux-4.4.0-142-generic-x86_64-Ubuntu-14.04.6-LTS
HOST_CROSS_OS=windows
HOST_CROSS_ARCH=x86
HOST_CROSS_2ND_ARCH=x86_64
HOST_BUILD_TYPE=release
BUILD_ID=QQ1D.200205.002
OUT_DIR=out
============================================
After some research I found a solution. During the build, /prebuilts/jdk/jdk9/linux-x86/bin/java is called without the -Xmx option. By running
$ /prebuilts/jdk/jdk9/linux-x86/bin/java -XX:+PrintFlagsFinal -version | grep -iE 'HeapSize'
on the command line, I found that only about 2 GB of heap was allowed at most.
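The 2 GB figure matches HotSpot's default sizing, which caps the heap at one quarter of physical memory when no -Xmx is given (MaxRAMFraction=4 on JDK 9, MaxRAMPercentage=25 on later JDKs); on an 8 GB machine that is about 2 GB. A simplified sketch of that rule, assuming the 8 GB build machine and ignoring the special cases for small hosts and containers; DefaultHeap and its methods are hypothetical names, not part of the JDK:

```java
public class DefaultHeap {
    // Simplified model of HotSpot's default maximum heap: 1/4 of physical RAM.
    // Ignores small-memory hosts, container limits and other ergonomic adjustments.
    static long defaultMaxHeapBytes(long physicalBytes) {
        return physicalBytes / 4;
    }

    // An explicit -Xmx (for example one supplied via _JAVA_OPTIONS) replaces the default.
    static long effectiveMaxHeapBytes(long physicalBytes, long xmxBytesOrMinusOne) {
        return xmxBytesOrMinusOne > 0 ? xmxBytesOrMinusOne : defaultMaxHeapBytes(physicalBytes);
    }

    public static void main(String[] args) {
        long gib = 1024L * 1024 * 1024;
        long ram = 8 * gib; // the build machine from the question: 8GB RAM
        System.out.println("default max heap: " + defaultMaxHeapBytes(ram) / gib + " GB");
        System.out.println("with -Xmx4g:      " + effectiveMaxHeapBytes(ram, 4 * gib) / gib + " GB");
    }
}
```

In practice the OS reports slightly less than the nominal RAM, which is why PrintFlagsFinal shows "about" 2 GB rather than exactly 2 GB.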
Solution
Because I don't know in which file java is being called, I simply set the heap size to 4 GB using an environment variable:
$ export _JAVA_OPTIONS="-Xmx4g"
Java will automatically pick this option up.
(Optional) I also increased the swap size from 8 GB to 20 GB.
I'm new to Solr and am mostly following tutorials, using Solr 8.3.1 (the most recent release as of posting).
I've got Java version 1.8.0_181 installed on my Windows 10 machine and I've added solr/bin to the PATH variable.
When I run solr start I get:
/d/Program Files and Documents/solr-8.3.1/bin/solr: line 1525: ulimit: -m: invalid option
ulimit: usage: ulimit [-SHabcdefiklmnpqrstuvxPT] [limit]
*** [WARN] *** Your open file limit is currently 256.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] *** Your Max Processes Limit is currently 256.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
/d/Program Files and Documents/solr-8.3.1/bin/solr: line 1542: [: !=: unary operator expected
NOTE: Please install lsof as this script needs it to determine if Solr is listening on port 8983.
Started Solr server on port 8983 (pid=). Happy searching!
This tells me that it started, but when I run solr status I get:
Found 1 Solr nodes:
Solr process 56136 from /d/Program Files and Documents/solr-8.3.1/bin/solr-8983.pid not found.
That doesn't seem right. I continued with the tutorial anyway, ran solr start -e cloud, and got:
$ solr start -e cloud
/d/Program Files and Documents/solr-8.3.1/bin/solr: line 1525: ulimit: -m: invalid option
ulimit: usage: ulimit [-SHabcdefiklmnpqrstuvxPT] [limit]
*** [WARN] *** Your open file limit is currently 256.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
*** [WARN] *** Your Max Processes Limit is currently 256.
It should be set to 65000 to avoid operational disruption.
If you no longer wish to see this warning, set SOLR_ULIMIT_CHECKS to false in your profile or solr.in.sh
/d/Program Files and Documents/solr-8.3.1/bin/solr: line 1542: [: !=: unary operator expected
Error: Could not find or load main class org.apache.solr.util.SolrCLI
I tried solr start -e techproducts with the same result.
I've seen answers for this with earlier versions of Solr, and usually the answers involved updating Solr, but I'm already on the most recent version. I'm sure I'm missing something dumb, but any pointers in the right direction would be much appreciated! Thanks so much!
EDIT 1: Solved! I was using Git Bash, which is Cygwin-based. It works like a charm with the Windows Command Prompt and with Windows PowerShell.
Output for solr start:
Java HotSpot(TM) 64-Bit Server VM warning: JVM cannot use large page memory because it does not have enough privilege to lock pages in memory.
Waiting up to 30 to see Solr running on port 8983
Started Solr server on port 8983. Happy searching!
Output for solr status:
Found Solr process 50764 running on port 8983
{
"solr_home":"D:\\Program Files and Documents\\solr-8.3.1\\server\\solr",
"version":"8.3.1 a3d456fba2cd1b9892defbcf46a0eb4d4bb4d01f - ishan - 2019-11-29 11:51:37",
"startTime":"2019-12-19T22:20:25.127Z",
"uptime":"0 days, 0 hours, 0 minutes, 14 seconds",
"memory":"176.3 MB (%34.4) of 512 MB"}
Output for solr start -e cloud:
Welcome to the SolrCloud example!
This interactive session will help you launch a SolrCloud cluster on your local workstation.
To begin, how many Solr nodes would you like to run in your local cluster? (specify 1-4 nodes) [2]:
It seems you're running the *nix version of bin/solr under Cygwin from the /d/... path. Under Windows you should use the bin/solr.cmd command from the standard command-line console (cmd.exe).
It seems the solr.in script contains options not recognized by the version of bash you have inside Cygwin. I'm guessing that could also affect the classpath setup, which means the classes aren't found where they'd be expected.
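The platform check behind this advice can be sketched as follows, assuming the standard Solr layout with both bin/solr and bin/solr.cmd. One detail worth noting: a JVM started from Git Bash is still a Windows JVM, so os.name reports Windows even though the shell is Unix-like. SolrLauncher and startScript are hypothetical names:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class SolrLauncher {
    // Picks the start script for a platform: bin\solr.cmd on Windows (run from cmd.exe
    // or PowerShell), bin/solr everywhere else.
    static Path startScript(Path solrHome, String osName) {
        boolean windows = osName.toLowerCase().startsWith("windows");
        return solrHome.resolve("bin").resolve(windows ? "solr.cmd" : "solr");
    }

    public static void main(String[] args) {
        Path home = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(startScript(home, System.getProperty("os.name")));
    }
}
```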
I'm having repeated crashes of the HDFS DataNodes in my Cloudera cluster due to an OutOfMemoryError:
java.lang.OutOfMemoryError: Java heap space
Dumping heap to /tmp/hdfs_hdfs-DATANODE-e26e098f77ad7085a5dbf0d369107220_pid18551.hprof ...
Heap dump file created [2487730300 bytes in 16.574 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="/usr/lib64/cmf/service/common/killparent.sh"
# Executing /bin/sh -c "/usr/lib64/cmf/service/common/killparent.sh"...
18551 TS 19 ? 00:25:37 java
Wed Aug 7 11:44:54 UTC 2019
JAVA_HOME=/usr/lib/jvm/java-openjdk
using /usr/lib/jvm/java-openjdk as JAVA_HOME
using 5 as CDH_VERSION
using /run/cloudera-scm-agent/process/3087-hdfs-DATANODE as CONF_DIR
using as SECURE_USER
using as SECURE_GROUP
CONF_DIR=/run/cloudera-scm-agent/process/3087-hdfs-DATANODE
CMF_CONF_DIR=/etc/cloudera-scm-agent
4194304
When analyzing the heap dump, the biggest apparent suspects are millions of ScanInfo instances, apparently queued in the ExecutorService of the class org.apache.hadoop.hdfs.server.datanode.DirectoryScanner.
When I inspect the contents of each ScanInfo runnable object, I don't see anything unusual.
Apart from this and a somewhat high block count in HDFS, I have no other information beyond different DataNodes crashing randomly in my cluster.
Any idea why these objects keep queueing up in the DirectoryScanner thread pool?
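One generic way objects pile up like this is an executor whose work queue is unbounded: if tasks are submitted faster than the pool's threads finish them, every pending task stays on the heap. The sketch below does not reproduce DirectoryScanner's actual code; it only demonstrates the mechanism with a single blocked worker, and contrasts it with a bounded queue that rejects excess work. QueueBacklogDemo and its methods are hypothetical:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class QueueBacklogDemo {
    // Submits `tasks` jobs to a one-thread pool whose only worker is stuck on `gate`,
    // then returns {jobs waiting in the queue, jobs rejected}.
    static int[] submit(int tasks, BlockingQueue<Runnable> queue) {
        CountDownLatch gate = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS, queue);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            gate.await(); // simulate a worker that cannot keep up
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // default AbortPolicy: queue full and no spare threads
                }
            }
            return new int[] {queue.size(), rejected};
        } finally {
            gate.countDown(); // release the worker so the pool can drain
            pool.shutdown();
            try {
                pool.awaitTermination(10, TimeUnit.SECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        int[] unbounded = submit(100_000, new LinkedBlockingQueue<>());
        int[] bounded = submit(100_000, new ArrayBlockingQueue<>(1_000));
        System.out.println("unbounded: queued=" + unbounded[0] + " rejected=" + unbounded[1]);
        System.out.println("bounded:   queued=" + bounded[0] + " rejected=" + bounded[1]);
    }
}
```

The first task goes straight to the worker thread, so the unbounded queue holds every other submission, while the bounded queue caps memory at its capacity and pushes back on the submitter instead.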
You can try running the command below once:
$ hadoop dfsadmin -finalizeUpgrade
The -finalizeUpgrade command removes the previous version of the NameNode’s and DataNodes’ storage directories.