I run Spring Boot applications inside Docker containers on DC/OS. I noticed that containers are sometimes killed, but there are no errors in the container logs, only:
Killed
W1114 19:27:59.663599 119266 logging.cpp:91] RAW: Received signal SIGTERM
from process 6484 of user 0; exiting
Health checks are enabled only for the SQL connection and disk space. The disk is fine on all nodes, and SQL problems would show up as errors in the logs. Memory could be another cause, but it also looks fine.
From marathon.production.json:
"cpus": 0.1,
"mem": 1024,
"disk": 0
And docker-entrypoint.sh:
java -Xmx1024m -server -XX:MaxJavaStackTraceDepth=10 -XX:+UseNUMA \
  -XX:+UseCondCardMark -XX:-UseBiasedLocking -Xms1024M -Xss1M \
  -XX:MaxPermSize=128m -XX:+UseParallelGC -jar app.jar
What could be the reason for container killing and are there any logs on DCOS regarding it?
Solved with java -XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap, which lets the JVM size its heap from the cgroup memory limit.
Or just use a container-aware image such as openjdk:11.0-jre-slim.
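For the openjdk:11 route, a minimal docker-entrypoint.sh sketch of the same idea; MaxRAMPercentage is the non-experimental successor to UseCGroupMemoryLimitForHeap on Java 10+, and the 75.0 value is an assumption to leave headroom for non-heap memory, not a figure from the original setup:
#!/bin/sh
# Let the JVM size its heap from the container's cgroup memory limit
# instead of hard-coding -Xmx equal to the Marathon "mem" value.
# 75.0 is an assumed percentage, leaving room for metaspace, thread
# stacks and other off-heap memory inside the 1024 MB container.
exec java \
  -server \
  -XX:MaxRAMPercentage=75.0 \
  -XX:+UseParallelGC \
  -Xss1m \
  -jar app.jar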
Related
I am trying to execute a kill command via -XX:OnOutOfMemoryError for a Spring Boot application.
Below is the .conf file containing the command.
JAVA_OPTS="-Xmx512M -XX:OnOutOfMemoryError=\"kill $(lsof -t -i:8080)\""
If I run the Spring Boot application with "java -jar" and these Java HotSpot VM options, it works fine, but when running it as a Linux systemd service, the application is not getting killed.
Exception : "Handler dispatch failed; nested exception is java.lang.OutOfMemoryError: Java heap space"
In my scenario, I run an instance of GeoServer on Tomcat in a container in a Docker swarm.
I expect the swarm to recreate the container after any problem with this instance, but after an OutOfMemoryError the container is never restarted, because the JVM is still running even though the application no longer responds.
For this case, I use -XX:OnOutOfMemoryError so that the container exits after this type of error and the swarm can recreate it.
The JVM parameters are set in the Tomcat environment:
CATALINA_OPTS="-XX:OnOutOfMemoryError=\"kill -9 %p\"
-Djava.awt.headless=true \
-Dfile.encoding=UTF-8 -server \
-Xms1024m -Xmx3072m -Xss1024k -XX:NewSize=768m \
-XX:+UseParallelGC -XX:MaxGCPauseMillis=500"
You can try to adapt it to your use; a sketch for the Spring Boot .conf case follows the software list below.
Importantly, I use the following software:
Tomcat 9
OpenJDK-11
GeoServer 2.16.x
Debian GNU/Linux 10 (buster)
Docs to consult for OpenJDK 11 JVM configuration options: https://manpages.debian.org/testing/openjdk-11-jre-headless/java.1.en.html
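For the Spring Boot .conf from the question, a minimal sketch along the same lines, assuming a fully executable jar whose launch script sources a .conf file next to it (myapp.conf is an assumed name matching myapp.jar); %p is expanded by the JVM to its own PID when the hook fires, unlike the $(lsof ...) substitution, which is evaluated only once when the file is sourced:
# myapp.conf (assumed name) next to the executable jar, sourced by the Spring Boot launch script.
# The JVM replaces %p with its own PID at the moment OnOutOfMemoryError runs.
JAVA_OPTS="-Xmx512M -XX:OnOutOfMemoryError=\"kill -9 %p\""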
I'm running a local installation of SAP Hybris 1811. I'm trying to increase its memory since I've been getting OutOfMemoryError exceptions during Solr index jobs.
However, I'm not able to reliably increase the memory via any method I've tried. Sometimes after struggling a lot (building the app multiple times, restarting, etc.) Hybris is able to see and use the set memory (I check this using backoffice), but most of the time it defaults to 2 GB and runs out of memory quickly.
What I've tried:
set JAVA_OPTS=-Xms10G -Xmx10G; in catalina.bat
tomcat.javaoptions=-Xmx10G -Xms10G in local.properties
What is the correct way to reliably set a higher memory for local Hybris server?
Please try the following in your local.properties:
tomcat.generaloptions=-Xmx10G -ea -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dorg.tanukisoftware.wrapper.WrapperManager.mbean=true -Djava.endorsed.dirs="%CATALINA_HOME%/lib/endorsed" -Dcatalina.base=%CATALINA_BASE% -Dcatalina.home=%CATALINA_HOME% -Dfile.encoding=UTF-8 -Djava.util.logging.config.file=jdk_logging.properties -Djava.io.tmpdir="${HYBRIS_TEMP_DIR}"
Please make sure to execute ant after making this change; as a general rule, you need to run ant whenever you change anything Tomcat-related.
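A quick sketch of that rebuild step, assuming the standard platform layout (adjust paths to your installation):
cd hybris/bin/platform
# setantenv.sh puts the Ant distribution bundled with the platform on the PATH for this shell
. ./setantenv.sh
ant clean all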
For a production environment, you can set these properties as follows:
java.mem=10G
tomcat.generaloptions=-Xmx${java.mem} -Xms${java.mem} -Xss256K -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+CMSClassUnloadingEnabled -XX:+UseCMSInitiatingOccupancyOnly -XX:+CMSParallelRemarkEnabled -XX:+ParallelRefProcEnabled -XX:+CMSScavengeBeforeRemark -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -Xloggc:"${HYBRIS_LOG_DIR}/tomcat/java_gc.log" -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Dorg.tanukisoftware.wrapper.WrapperManager.mbean=true -Djava.endorsed.dirs=../lib/endorsed -Dcatalina.base=%CATALINA_BASE% -Dcatalina.home=%CATALINA_HOME% -Dfile.encoding=UTF-8 -Djava.util.logging.config.file=jdk_logging.properties -Djava.io.tmpdir="${HYBRIS_TEMP_DIR}" -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
After a bit of digging, I found that the memory limit was ignored only when I tried to run the Hybris server with the debug parameter. The properties I had set via tomcat.javaoptions were not in the wrapper-debug.conf file, which is used when starting the server in debug mode.
Long story short:
tomcat.javaoptions only gets applied to the default wrapper.conf file and is ignored when launching the server with any parameter such as debug.
For the changes to be applied to wrapper-debug.conf, I needed to use the tomcat.debugjavaoptions property.
In the end, my config file with working memory limit looks like this:
...
tomcat.javaoptions=-Xmx10G -Xms5G
tomcat.debugjavaoptions=-Xdebug -Xnoagent -Xrunjdwp:transport=dt_socket,server=y,address=8000,suspend=n -Xmx10G -Xms5G
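To make the distinction concrete, a small sketch of how the two wrapper configs get picked up, assuming the standard hybrisserver.sh launcher:
# After rebuilding with ant:
./hybrisserver.sh start   # uses wrapper.conf       -> tomcat.javaoptions applies
./hybrisserver.sh debug   # uses wrapper-debug.conf -> tomcat.debugjavaoptions applies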
I'm tuning our product for G1GC, and as part of that testing, I'm experiencing regular segfaults on my Spark Workers, which of course causes the JVM to crash. When this happens, the Spark Worker/Executor JVM automagically restarts itself, which then overwrites the GC logs that were written for the previous Executor JVM.
To be honest, I'm not quite sure of the mechanism by which the Executor JVM restarts itself, but I launch the Spark Driver service via init.d, which in turn calls a bash script. In that script I append a timestamp to the GC log filename:
today=$(date +%Y%m%dT%H%M%S%3N)
SPARK_HEAP_DUMP="-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=${SPARK_LOG_HOME}/heapdump_$$_${today}.hprof"
SPARK_GC_LOGS="-Xloggc:${SPARK_LOG_HOME}/gc_${today}.log -XX:LogFile=${SPARK_LOG_HOME}/safepoint_${today}.log"
GC_OPTS="-XX:+UnlockDiagnosticVMOptions -XX:+LogVMOutput -XX:+PrintFlagsFinal -XX:+PrintJNIGCStalls -XX:+PrintTLAB -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=15 -XX:GCLogFileSize=48M -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationConcurrentTime -XX:+PrintGCApplicationStoppedTime -XX:+PrintAdaptiveSizePolicy -XX:+PrintHeapAtGC -XX:+PrintGCCause -XX:+PrintReferenceGC -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1"
I think the problem is that this script passes these options along to the Spark Driver, which then hands them off to the Spark Executors (via the -Dspark.executor.extraJavaOptions argument), and the Executors all run on separate servers. When an Executor JVM crashes, it simply restarts with the command it was originally given, which means the timestamp portion of the GC log filename stays static:
SPARK_STANDALONE_OPTS=`property ${SPARK_APP_CONFIG}/spark.properties "spark-standalone.extra.args"`
SPARK_STANDALONE_OPTS="$SPARK_STANDALONE_OPTS $GC_OPTS $SPARK_GC_LOGS $SPARK_HEAP_DUMP"
exec java ${SPARK_APP_HEAP_DUMP} ${GC_OPTS} ${SPARK_APP_GC_LOGS} \
${DRIVER_JAVA_OPTIONS} \
-Dspark.executor.memory=${EXECUTOR_MEMORY} \
-Dspark.executor.extraJavaOptions="${SPARK_STANDALONE_OPTS}" \
-classpath ${CLASSPATH} \
com.company.spark.Main >> ${SPARK_APP_LOGDIR}/${SPARK_APP_LOGFILE} 2>&1 &
This is making it difficult for me to debug the cause of the segfaults, since I'm losing the activity and state of the Workers that led up to the JVM crash. Any ideas for how I can handle this situation and keep the GC logs on the Workers, even after a JVM crash/segfault?
If you are using Java 8 or above, you can work around this by adding %p to the log file name to include the PID, which will be more or less unique per crash.
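For example, a sketch of the GC log line from the launch script with %p added; the JVM itself substitutes %p with its PID, so a restarted Executor writes to a fresh file (%t, a startup timestamp, is another HotSpot 8 substitution you could use):
# ${today} is still fixed when the driver script runs, but %p differs per JVM.
# %p is only applied to the GC log here; -XX:LogFile is left as in the original.
SPARK_GC_LOGS="-Xloggc:${SPARK_LOG_HOME}/gc_%p_${today}.log -XX:LogFile=${SPARK_LOG_HOME}/safepoint_${today}.log"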
My Spark driver runs out of memory after running for about 10 hours with the error Exception in thread "dispatcher-event-loop-17" java.lang.OutOfMemoryError: GC overhead limit exceeded. To further debug, I enabled G1GC mode and also the GC logs option using spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties -XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp
but it looks like it is not taking effect on the driver.
The job got stuck again on the driver after 10 hours, and I don't see any GC logs under stdout on the driver node under /var/log/hadoop-yarn/userlogs/[application-id]/[container-id]/stdout, so I'm not sure where else to look. According to the Spark GC tuning docs, it looks like these settings only take effect on worker nodes (which I can see in this case as well, since the workers have GC logs in stdout after I used the same configs under spark.executor.extraJavaOptions). Is there any way to enable/acquire GC logs from the driver? Under Spark UI -> Environment, I see these options listed under spark.driver.extraJavaOptions, which is why I assumed they would be working.
Environment:
The cluster is running on Google Dataproc and I use /usr/bin/spark-submit --master yarn --deploy-mode cluster ... from the master to submit jobs.
EDIT
Setting the same options for the driver on the spark-submit command line works, and I am able to see the GC logs on stdout for the driver. Setting the options programmatically via SparkConf just does not seem to take effect, for some reason.
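Concretely, what worked was passing the options at submit time rather than in code, roughly like this (a sketch; the class and jar names are placeholders):
# In cluster mode the driver JVM is created before user code runs, so these
# options must be supplied at submit time, not via SparkConf in the job.
/usr/bin/spark-submit --master yarn --deploy-mode cluster \
  --conf spark.driver.extraJavaOptions="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" \
  --class com.example.MyJob \
  myjob.jar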
I believe spark.driver.extraJavaOptions is handled by SparkSubmit.scala, and needs to be passed at invocation. To do that with Dataproc you can add that to the properties field (--properties in gcloud dataproc jobs submit spark).
Also instead of -Dlog4j.configuration=log4j.properties you can use this guide to configure detailed logging.
I could see GC driver logs with:
gcloud dataproc jobs submit spark --cluster CLUSTER_NAME --class org.apache.spark.examples.SparkPi --jars file:///usr/lib/spark/examples/jars/spark-examples.jar --driver-log-levels ROOT=DEBUG --properties=spark.driver.extraJavaOptions="-XX:+PrintFlagsFinal -XX:+PrintReferenceGC -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeConcMark -XX:+UseG1GC -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp" --
You probably don't need --driver-log-levels ROOT=DEBUG, but you can copy your logging config in from log4j.properties. If you really want to use log4j.properties, you can probably pass it with --files log4j.properties.
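A sketch of that --files variant, assuming log4j.properties sits in the directory you submit from and the rest of the command stays as above (the JVM flag set is trimmed for brevity):
gcloud dataproc jobs submit spark --cluster CLUSTER_NAME \
  --class org.apache.spark.examples.SparkPi \
  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar \
  --files log4j.properties \
  --properties=spark.driver.extraJavaOptions="-Dlog4j.configuration=log4j.properties -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UseG1GC" \
  --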
On starting the JBoss AS 5.1 server on Linux:
26204 jboss 20 0 4874m 1.3g 12m S 144.0 11.4 1:45.50 java
This is before any class-loading.
It starts with a minimum of 1 GB resident (RES) memory. How can I reduce this?
Is there any way we can reduce the memory usage?
Inside your JBoss bin directory ($JBOSS_HOME/bin on Linux), check the run.conf file for:
if [ "x$JAVA_OPTS" = "x" ]; then
JAVA_OPTS="-Xms128m -Xmx512m -XX:MaxPermSize=256m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
fi
I am running 5.1.0.GA on a very old PC, and after having lots of memory errors during startup, I removed this specification from JAVA_OPTS. I did this on Windows, so the batch syntax is different, but essentially I just removed this option completely. It stopped the server from complaining about memory, but I don't know whether you can use these options to restrict memory usage further.
Not really an answer, but you might find it helps.
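If you do want to cap it rather than remove the setting entirely, here is a minimal run.conf sketch with smaller values; the numbers are assumptions for illustration, not tested recommendations for AS 5.1:
# $JBOSS_HOME/bin/run.conf - only set JAVA_OPTS if the caller hasn't already
if [ "x$JAVA_OPTS" = "x" ]; then
   # Assumed smaller heap and PermGen than the 5.1.0.GA defaults;
   # raise them again if deployments fail with OutOfMemoryError.
   JAVA_OPTS="-Xms64m -Xmx256m -XX:MaxPermSize=128m -Dorg.jboss.resolver.warning=true -Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000"
fi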