Performance issues : Kafka + Storm + Trident + OpaqueTridentKafkaSpout


We are seeing some performance issues with Kafka + Storm + Trident + OpaqueTridentKafkaSpout.
Our setup details are below.
Storm topology:
Broker broker = Broker.fromString("localhost:9092")
GlobalPartitionInformation info = new GlobalPartitionInformation()
if (args[4]) {
    int partitionCount = args[4].toInteger()
    for (int i = 0; i < partitionCount; i++) {
        info.addPartition(i, broker)
    }
}
StaticHosts hosts = new StaticHosts(info)
TridentKafkaConfig tridentKafkaConfig = new TridentKafkaConfig(hosts, "test")
tridentKafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme())
OpaqueTridentKafkaSpout kafkaSpout = new OpaqueTridentKafkaSpout(tridentKafkaConfig)
TridentTopology topology = new TridentTopology()
Stream st = topology.newStream("spout1", kafkaSpout).parallelismHint(args[2].toInteger())
        .each(kafkaSpout.getOutputFields(), new NEO4JTridentFunction(), new Fields("status"))
        .parallelismHint(args[1].toInteger())
Map conf = new HashMap()
conf.put(Config.TOPOLOGY_WORKERS, args[3].toInteger())
conf.put(Config.TOPOLOGY_DEBUG, false)
if (args[0] == "local") {
    LocalCluster cluster = new LocalCluster()
    cluster.submitTopology("mytopology", conf, topology.build())
} else {
    StormSubmitter.submitTopology("mytopology", conf, topology.build())
    NEO4JTridentFunction.getGraphDatabaseService().shutdown()
}
The storm.yaml we are using for Storm is below:
########### These MUST be filled in for a storm configuration
storm.zookeeper.servers:
- "localhost"
# - "server2"
#
storm.zookeeper.port : 2999
storm.local.dir: "/opt/mphrx/neo4j/stormdatadir"
nimbus.childopts: "-Xms2048m"
ui.childopts: "-Xms1024m"
logviewer.childopts: "-Xmx512m"
supervisor.childopts: "-Xms1024m"
worker.childopts: "-Xms2600m -Xss256k -XX:MaxPermSize=128m -XX:PermSize=96m
-XX:NewSize=1000m -XX:MaxNewSize=1000m -XX:MaxTenuringThreshold=1 -XX:SurvivorRatio=6
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
-server -XX:+AggressiveOpts -XX:+UseCompressedOops -Djava.awt.headless=true -Djava.net.preferIPv4Stack=true
-Xloggc:logs/gc-worker-%ID%.log -verbose:gc
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10 -XX:GCLogFileSize=1m
-XX:+PrintGCDetails -XX:+PrintHeapAtGC -XX:+PrintGCTimeStamps -XX:+PrintClassHistogram
-XX:+PrintTenuringDistribution -XX:-PrintGCApplicationStoppedTime -XX:-PrintGCApplicationConcurrentTime
-XX:+PrintCommandLineFlags -XX:+PrintFlagsFinal"
java.library.path: "/usr/lib/jvm/jdk1.7.0_25"
supervisor.slots.ports:
- 6700
- 6701
- 6702
- 6703
topology.trident.batch.emit.interval.millis: 100
topology.message.timeout.secs: 300
#topology.max.spout.pending: 10000
Size of each message produced in Kafka: 11 KB
Execution time of the bolt (NEO4JTridentFunction) to process each message: 500 ms
Number of Storm workers: 1
Parallelism hint for the spout (OpaqueTridentKafkaSpout): 1
Parallelism hint for the bolt/function (NEO4JTridentFunction): 50
We are seeing a throughput of around 12 msgs/sec from the spout.
Rate of messages produced in Kafka: 150 msgs/sec
Both Storm and Kafka are single-node deployments.
We have read about much higher throughput from Storm but are unable to reproduce it. Please suggest how to tune the Storm + Kafka + OpaqueTridentKafkaSpout configuration to achieve higher throughput. Any help in this regard would be greatly appreciated.
Thanks,

Set the spout parallelism equal to the number of partitions of the topic you consume.
By default, Trident processes only one batch at a time; you should increase this by changing the topology.max.spout.pending property. Since Trident enforces ordered transaction management, your function (NEO4JTridentFunction) must be fast to reach the desired throughput.
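A back-of-the-envelope model of why more batches in flight can help: by Little's law, throughput is roughly (batches in flight × tuples per batch) / time per batch. The Java sketch below is not Storm code; the batch size and batch latency are hypothetical inputs. Because Trident still commits batches strictly in order, treat the result as an upper bound that only holds while the function has spare capacity.

```java
/**
 * Back-of-the-envelope model of Trident pipelining (a sketch, not Storm code).
 * Little's law: throughput = items in flight / time each item spends in the system.
 */
class TridentPipelineEstimate {

    /**
     * @param maxSpoutPending batches allowed in flight (topology.max.spout.pending)
     * @param tuplesPerBatch  average tuples per batch (hypothetical input)
     * @param batchLatencyMs  time to process and commit one batch (hypothetical input)
     * @return estimated upper bound in tuples per second
     */
    static double tuplesPerSecond(int maxSpoutPending, int tuplesPerBatch, double batchLatencyMs) {
        if (maxSpoutPending <= 0 || tuplesPerBatch < 0 || batchLatencyMs <= 0) {
            throw new IllegalArgumentException("pending and latency must be positive");
        }
        return maxSpoutPending * (double) tuplesPerBatch * 1000.0 / batchLatencyMs;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: 10 tuples per batch, 1000 ms per batch.
        for (int pending : new int[] {1, 2, 4, 8}) {
            System.out.printf("pending=%d -> %.1f tuples/sec%n",
                    pending, tuplesPerSecond(pending, 10, 1000.0));
        }
    }
}
```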
In addition, you can experiment with tridentKafkaConfig.fetchSizeBytes; increasing it lets the spout ingest more data on each emit call.
Also check your garbage collection log; it will give you a clue about the real bottleneck.
You can enable the garbage collection log by adding "-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -verbose:gc -Xloggc:{path}/gc-storm-worker-%ID%.log" to worker.childopts in your worker config (the storm.yaml in the question already sets similar flags).
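Besides the GC log, the JVM exposes cumulative GC counts and times through the standard java.lang.management API. The sketch below prints them for whatever JVM runs it. Calling something similar periodically from inside a worker (for example, from your own logging code) is an assumption about how you might use it, not a Storm feature.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/**
 * Prints cumulative GC counts and times for the current JVM,
 * as a quick in-process complement to the GC log (a sketch).
 */
class GcSnapshot {

    /** Total time (ms) spent in all collectors so far; undefined (-1) values are skipped. */
    static long totalGcTimeMs() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        long uptimeMs = ManagementFactory.getRuntimeMXBean().getUptime();
        System.out.printf("GC time: %d ms of %d ms uptime%n", totalGcTimeMs(), uptimeMs);
    }
}
```

A GC time that is a large share of uptime would support the GC angle; a small share suggests looking elsewhere.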
Last but not least, you can use G1GC if your young generation ratio is higher than normal.

Set your worker.childopts based on your system configuration. Use fetchSizeBytes (on SpoutConfig, or on TridentKafkaConfig for Trident) to increase the number of bytes pulled into the topology per fetch. Increase your parallelism hint.

My calculation: with 8 cores and 500 ms per tuple in the bolt, you get at most ~16 messages/sec.
If you optimize the bolt, you will see improvements.
Also, for CPU-bound bolts, try setting the parallelism hint to the total number of cores,
and increase topology.trident.batch.emit.interval.millis to half the time it takes to process an entire batch.
Set topology.max.spout.pending to 1.
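The calculation above can be written out as a small Java sketch. The 500 ms per tuple and the parallelism of 50 come from the question; 8 cores is the answerer's assumption. In this idealized model each executor thread handles one tuple at a time, so a CPU-bound function is capped by min(parallelism, cores), while a function that mostly waits on I/O is capped by the number of executor threads instead.

```java
/**
 * Idealized upper bound on bolt throughput (a sketch of the arithmetic, not Storm code).
 * Each executor thread processes one tuple at a time.
 */
class BoltThroughputBound {

    static double maxTuplesPerSecond(int parallelism, int cores, double msPerTuple, boolean cpuBound) {
        // CPU-bound work cannot run on more threads at once than there are cores.
        int concurrent = cpuBound ? Math.min(parallelism, cores) : parallelism;
        return concurrent * 1000.0 / msPerTuple;
    }

    public static void main(String[] args) {
        // Parallelism 50 and 500 ms/tuple from the question; 8 cores is an assumption.
        System.out.println("CPU-bound:     " + maxTuplesPerSecond(50, 8, 500.0, true) + " msgs/sec");
        System.out.println("Not CPU-bound: " + maxTuplesPerSecond(50, 8, 500.0, false) + " msgs/sec");
    }
}
```

With those inputs the CPU-bound cap is 16 msgs/sec, close to the observed 12 msgs/sec and well below the 150 msgs/sec produced into Kafka, which is consistent with the advice to optimize the bolt first.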

Related

Readiness and liveness failed with smallrye metrics in kubernetes

I'm deploying a pod written in Quarkus to Kubernetes, and startup seems to go fine. But the readiness and liveness probes fail, so the pod is reported as unhealthy.
For metrics I'm using SmallRye Metrics, configured on port 8080 with this path:
quarkus.smallrye-metrics.path=/metrics
If I exec into the pod and run
curl localhost:8080/metrics
the response is
# HELP base_classloader_loadedClasses_count Displays the number of classes that are currently loaded in the Java virtual machine.
# TYPE base_classloader_loadedClasses_count gauge
base_classloader_loadedClasses_count 7399.0
# HELP base_classloader_loadedClasses_total Displays the total number of classes that have been loaded since the Java virtual machine has started execution.
# TYPE base_classloader_loadedClasses_total counter
base_classloader_loadedClasses_total 7403.0
# HELP base_classloader_unloadedClasses_total Displays the total number of classes unloaded since the Java virtual machine has started execution.
# TYPE base_classloader_unloadedClasses_total counter
base_classloader_unloadedClasses_total 4.0
# HELP base_cpu_availableProcessors Displays the number of processors available to the Java virtual machine. This value may change during a particular invocation of the virtual machine.
# TYPE base_cpu_availableProcessors gauge
base_cpu_availableProcessors 1.0
# HELP base_cpu_processCpuLoad_percent Displays the "recent cpu usage" for the Java Virtual Machine process. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that none of the CPUs were running threads from the JVM process during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running threads from the JVM 100% of the time during the recent period being observed. Threads from the JVM include the application threads as well as the JVM internal threads. All values between 0.0 and 1.0 are possible depending of the activities going on in the JVM process and the whole system. If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value.
# TYPE base_cpu_processCpuLoad_percent gauge
base_cpu_processCpuLoad_percent 2.3218608761411404E-7
# HELP base_cpu_systemLoadAverage Displays the system load average for the last minute. The system load average is the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on the available processors averaged over a period of time. The way in which the load average is calculated is operating system specific but is typically a damped time-dependent average. If the load average is not available, a negative value is displayed. This attribute is designed to provide a hint about the system load and may be queried frequently. The load average may be unavailable on some platforms where it is expensive to implement this method.
# TYPE base_cpu_systemLoadAverage gauge
base_cpu_systemLoadAverage 0.15
# HELP base_gc_time_total Displays the approximate accumulated collection elapsed time in milliseconds. This attribute displays -1 if the collection elapsed time is undefined for this collector. The Java virtual machine implementation may use a high resolution timer to measure the elapsed time. This attribute may display the same value even if the collection count has been incremented if the collection elapsed time is very short.
# TYPE base_gc_time_total counter
base_gc_time_total_seconds{name="Copy"} 0.032
base_gc_time_total_seconds{name="MarkSweepCompact"} 0.071
# HELP base_gc_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
# TYPE base_gc_total counter
base_gc_total{name="Copy"} 4.0
base_gc_total{name="MarkSweepCompact"} 2.0
# HELP base_jvm_uptime_seconds Displays the time from the start of the Java virtual machine in milliseconds.
# TYPE base_jvm_uptime_seconds gauge
base_jvm_uptime_seconds 624.763
# HELP base_memory_committedHeap_bytes Displays the amount of memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.
# TYPE base_memory_committedHeap_bytes gauge
base_memory_committedHeap_bytes 8.5262336E7
# HELP base_memory_maxHeap_bytes Displays the maximum amount of heap memory in bytes that can be used for memory management. This attribute displays -1 if the maximum heap memory size is undefined. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size.
# TYPE base_memory_maxHeap_bytes gauge
base_memory_maxHeap_bytes 1.348141056E9
# HELP base_memory_usedHeap_bytes Displays the amount of used heap memory in bytes.
# TYPE base_memory_usedHeap_bytes gauge
base_memory_usedHeap_bytes 1.2666888E7
# HELP base_thread_count Displays the current number of live threads including both daemon and non-daemon threads
# TYPE base_thread_count gauge
base_thread_count 11.0
# HELP base_thread_daemon_count Displays the current number of live daemon threads.
# TYPE base_thread_daemon_count gauge
base_thread_daemon_count 7.0
# HELP base_thread_max_count Displays the peak live thread count since the Java virtual machine started or peak was reset. This includes daemon and non-daemon threads.
# TYPE base_thread_max_count gauge
base_thread_max_count 11.0
# HELP vendor_cpu_processCpuTime_seconds Displays the CPU time used by the process on which the Java virtual machine is running in nanoseconds. The returned value is of nanoseconds precision but not necessarily nanoseconds accuracy. This method returns -1 if the the platform does not support this operation.
# TYPE vendor_cpu_processCpuTime_seconds gauge
vendor_cpu_processCpuTime_seconds 4.36
# HELP vendor_cpu_systemCpuLoad_percent Displays the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values betweens 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
# TYPE vendor_cpu_systemCpuLoad_percent gauge
vendor_cpu_systemCpuLoad_percent 2.3565253563367224E-7
# HELP vendor_memory_committedNonHeap_bytes Displays the amount of non heap memory in bytes that is committed for the Java virtual machine to use.
# TYPE vendor_memory_committedNonHeap_bytes gauge
vendor_memory_committedNonHeap_bytes 5.1757056E7
# HELP vendor_memory_freePhysicalSize_bytes Displays the amount of free physical memory in bytes.
# TYPE vendor_memory_freePhysicalSize_bytes gauge
vendor_memory_freePhysicalSize_bytes 5.44448512E9
# HELP vendor_memory_freeSwapSize_bytes Displays the amount of free swap space in bytes.
# TYPE vendor_memory_freeSwapSize_bytes gauge
vendor_memory_freeSwapSize_bytes 0.0
# HELP vendor_memory_maxNonHeap_bytes Displays the maximum amount of used non-heap memory in bytes.
# TYPE vendor_memory_maxNonHeap_bytes gauge
vendor_memory_maxNonHeap_bytes -1.0
# HELP vendor_memory_usedNonHeap_bytes Displays the amount of used non-heap memory in bytes.
# TYPE vendor_memory_usedNonHeap_bytes gauge
vendor_memory_usedNonHeap_bytes 4.7445384E7
# HELP vendor_memoryPool_usage_bytes Current usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_bytes gauge
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-nmethods'"} 1357184.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'profiled nmethods'"} 4787200.0
vendor_memoryPool_usage_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_bytes{name="Eden Space"} 0.0
vendor_memoryPool_usage_bytes{name="Metaspace"} 3.5767632E7
vendor_memoryPool_usage_bytes{name="Survivor Space"} 0.0
vendor_memoryPool_usage_bytes{name="Tenured Gen"} 9872160.0
# HELP vendor_memoryPool_usage_max_bytes Peak usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_max_bytes gauge
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-nmethods'"} 1369600.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'profiled nmethods'"} 4793088.0
vendor_memoryPool_usage_max_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_max_bytes{name="Eden Space"} 2.3658496E7
vendor_memoryPool_usage_max_bytes{name="Metaspace"} 3.5769312E7
vendor_memoryPool_usage_max_bytes{name="Survivor Space"} 2883584.0
vendor_memoryPool_usage_max_bytes{name="Tenured Gen"} 9872160.0
So it seems metrics are working fine, but Kubernetes reports this error:
Warning Unhealthy 24m (x9 over 28m) kubelet Liveness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Warning Unhealthy 4m2s (x70 over 28m) kubelet Readiness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Any help?
Thanks

First, I needed to fix Dockerfile.jvm:
FROM openjdk:11
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en'
# We make four distinct layers so if there are application changes the library layers can be re-used
# RUN ls -la target
COPY --chown=185 target/quarkus-app/lib/ /deployments/lib/
COPY --chown=185 target/quarkus-app/*.jar /deployments/
COPY --chown=185 target/quarkus-app/app/ /deployments/app/
COPY --chown=185 target/quarkus-app/quarkus/ /deployments/quarkus/
RUN java -version
EXPOSE 8080
USER root
ENV AB_JOLOKIA_OFF=""
ENV JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager"
ENV JAVA_DEBUG="true"
ENV JAVA_APP_JAR="/deployments/quarkus-run.jar"
CMD java ${JAVA_OPTS} -jar ${JAVA_APP_JAR}
With this change the JAR started running; without that CMD, the openjdk image just starts jshell. After that, I saw the log below:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
2022-09-21 19:56:00,450 INFO [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"<default>":"Unable to execute the validation check for the default DataSource: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}}]}
The DB connection in Kubernetes was not working.
Deploy command: mvn clean package -DskipTests -Dquarkus.kubernetes.deploy=true
The minikube dashboard looked like the screenshot below.
I used the endpoints below:
quarkus.smallrye-health.root-path=/health
quarkus.smallrye-health.liveness-path=/health/live
quarkus.smallrye-metrics.path=/metrics
The liveness URL looked like the screenshot below in Firefox.
I needed to change some dependencies in the pom because I use minikube locally, and I had to delete some Java code because of DB connection problems. You can find a working example at https://github.com/ozkanpakdil/quarkus-examples/tree/master/liveness-readiness-kubernetes
The deployment definition YAML is below.
mintozzy@mintozzy-MACH-WX9:~$ kubectl get deployments.apps app-version-checker -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
    app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
    deployment.kubernetes.io/revision: "3"
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
    prometheus.io/scheme: http
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-09-21T20:13:21Z"
  generation: 3
  labels:
    app.kubernetes.io/name: app-version-checker
    app.kubernetes.io/version: 1.0.0-SNAPSHOT
  name: app-version-checker
  namespace: default
  resourceVersion: "117584"
  uid: 758d420b-ed22-48f8-9d6f-150422a6b38e
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: app-version-checker
      app.kubernetes.io/version: 1.0.0-SNAPSHOT
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
        app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scheme: http
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: app-version-checker
        app.kubernetes.io/version: 1.0.0-SNAPSHOT
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: mintozzy/app-version-checker:1.0.0-SNAPSHOT
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/live
            port: 8080
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 10
        name: app-version-checker
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/ready
            port: 8080
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 10
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-09-21T20:13:21Z"
    lastUpdateTime: "2022-09-21T20:30:03Z"
    message: ReplicaSet "app-version-checker-5cb974f465" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-09-22T16:09:48Z"
    lastUpdateTime: "2022-09-22T16:09:48Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
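The strconv.Atoi error in the question is consistent with a probe whose port was given as the string "metrics" while the container declares no port with that name. As far as I know, the kubelet resolves a string probe port by looking it up among the container's named ports first, and only if that fails tries to parse it as an integer (which is where strconv.Atoi comes from). The Java sketch below re-creates that lookup order for illustration only; it is not kubelet code, and the class and method names are hypothetical. The deployment above avoids the problem by using the numeric port 8080.

```java
import java.util.Map;

/**
 * Illustrative re-creation (not kubelet code) of how an httpGet probe port given as a
 * string is resolved: named container port first, then a last-resort integer parse.
 */
class ProbePortResolver {

    /** @param namedPorts container port name -> containerPort, e.g. {"http": 8080} */
    static int resolve(String probePort, Map<String, Integer> namedPorts) {
        Integer byName = namedPorts.get(probePort);
        if (byName != null) {
            return byName;
        }
        try {
            return Integer.parseInt(probePort); // plays the role of Go's strconv.Atoi
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("parsing \"" + probePort + "\": invalid syntax", e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> ports = Map.of("http", 8080); // as declared in the deployment above
        System.out.println(resolve("8080", ports)); // numeric string
        System.out.println(resolve("http", ports)); // named port
        try {
            resolve("metrics", ports); // no container port named "metrics"
        } catch (IllegalArgumentException e) {
            System.out.println("probe error: " + e.getMessage());
        }
    }
}
```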

Dataproc Hive Job - Tez Java heap OOM

I have a problem with my cluster.
The cluster has:
2 primary workers
2 secondary workers
30 GB of RAM
The cluster runs correctly and launches the Hive jobs for about 10 hours.
After about 10 hours, I get a Java heap space error:
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_292]
Caused by: java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:3236) ~[?:1.8.0_292]
at java.io.ByteArrayOutputStream.toByteArray(ByteArrayOutputStream.java:191) ~[?:1.8.0_292]
at org.apache.hadoop.ipc.ResponseBuffer.toByteArray(ResponseBuffer.java:53) ~[hadoop-common-3.2.2.jar:?]
at org.apache.hadoop.ipc.Client$Connection$3.run(Client.java:1159) ~[hadoop-common-3.2.2.jar:?]
... 5 more
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask
INFO : Completed executing command(queryId=hive_20210923102707_66b4cd11-7cfb-4910-87bc-7f062ce1b00e); Time taken: 75.101 seconds
INFO : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.tez.TezTask (state=08S01,code=1)
I tried setting this configuration, but it didn't help:
SET hive.execution.engine = tez;
SET hive.exec.dynamic.partition = true;
SET hive.exec.dynamic.partition.mode = nonstrict;
SET mapreduce.job.reduces=1;
SET hive.auto.convert.join=false;
set hive.stats.column.autogather=false;
set hive.optimize.sort.dynamic.partition=true;
Is there any way to clean up the Java heap space, or do I have some configuration wrong?
The problem is solved by restarting the cluster.

It seems that the default Tez container and heap sizes set by Dataproc are too small for your job. You can update the following Hive properties to increase them:
hive.tez.container.size: The YARN container size in MB for Tez. If set to "-1" (default value), it picks the value of mapreduce.map.memory.mb. Consider increasing the value if the query / Tez app fails with something like "Container is running beyond physical memory limits. Current usage: 4.1 GB of 4 GB physical memory used; 6.0 GB of 20 GB virtual memory used. Killing container.". Example: SET hive.tez.container.size=8192 in Hive, or --properties hive:hive.tez.container.size=8192 when creating the cluster.
hive.tez.java.opts: The JVM options for the Tez YARN application. If not set, it picks the value of mapreduce.map.java.opts. The heap size set here should be less than or equal to the container size. Consider increasing the JVM heap size if the query / Tez app fails with an OOM exception. Example: SET hive.tez.java.opts=-Xmx8g or --properties hive:hive.tez.java.opts=-Xmx8g when creating the cluster.
You can check /etc/hadoop/conf/mapred-site.xml to get the value of mapreduce.map.java.opts, and /etc/hive/conf/hive-site.xml for the two Hive properties mentioned above.
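One caveat on the -Xmx8g example: a heap equal to the container size leaves no room for the JVM's non-heap memory (metaspace, thread stacks, direct buffers), so YARN may still kill the container for exceeding its physical memory limit. A common approach is to give the heap only a fraction of the container; Hive's hive.tez.container.max.java.heap.fraction property defaults to 0.8 in the versions I'm aware of. The Java sketch below only does that arithmetic; the 0.8 fraction is a starting point, not a rule, and the class name is hypothetical.

```java
/**
 * Sketch: derive a -Xmx value for hive.tez.java.opts from hive.tez.container.size,
 * leaving headroom for non-heap JVM memory.
 */
class TezHeapSizing {

    static long heapMbForContainer(long containerMb, double heapFraction) {
        if (containerMb <= 0 || heapFraction <= 0 || heapFraction > 1) {
            throw new IllegalArgumentException("bad container size or fraction");
        }
        return (long) Math.floor(containerMb * heapFraction);
    }

    public static void main(String[] args) {
        long containerMb = 8192; // the container size used in the example above
        long heapMb = heapMbForContainer(containerMb, 0.8);
        System.out.println("SET hive.tez.container.size=" + containerMb + ";");
        System.out.println("SET hive.tez.java.opts=-Xmx" + heapMb + "m;");
    }
}
```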

Flink rocksdb compaction filter not working

I have a Flink cluster. I enabled the compaction filter and I'm using state TTL, but the RocksDB compaction filter does not free state from memory.
I have about 300 records/s in my Flink pipeline.
My state TTL config:
@Override
public void open(Configuration parameters) throws Exception {
    ListStateDescriptor<ObjectNode> descriptor = new ListStateDescriptor<ObjectNode>(
            "my-state",
            TypeInformation.of(new TypeHint<ObjectNode>() {})
    );
    StateTtlConfig ttlConfig = StateTtlConfig
            .newBuilder(Time.seconds(600))
            .cleanupInRocksdbCompactFilter(2)
            .build();
    descriptor.enableTimeToLive(ttlConfig);
    myState = getRuntimeContext().getListState(descriptor);
}
flink-conf.yaml:
state.backend: rocksdb
state.backend.rocksdb.ttl.compaction.filter.enabled: true
state.backend.rocksdb.block.blocksize: 16kb
state.backend.rocksdb.compaction.level.use-dynamic-size: true
state.backend.rocksdb.thread.num: 4
state.checkpoints.dir: file:///opt/flink/checkpoint
state.backend.rocksdb.timer-service.factory: rocksdb
state.backend.rocksdb.checkpoint.transfer.thread.num: 2
state.backend.local-recovery: true
state.backend.rocksdb.localdir: /opt/flink/rocksdb
jobmanager.execution.failover-strategy: region
rest.port: 8081
state.backend.rocksdb.memory.managed: true
# state.backend.rocksdb.memory.fixed-per-slot: 20mb
state.backend.rocksdb.memory.write-buffer-ratio: 0.9
state.backend.rocksdb.memory.high-prio-pool-ratio: 0.1
taskmanager.memory.managed.fraction: 0.6
taskmanager.memory.network.fraction: 0.1
taskmanager.memory.network.min: 500mb
taskmanager.memory.network.max: 700mb
taskmanager.memory.process.size: 5500mb
taskmanager.memory.task.off-heap.size: 800mb
metrics.reporter.influxdb.class: org.apache.flink.metrics.influxdb.InfluxdbReporter
metrics.reporter.influxdb.host: ####
metrics.reporter.influxdb.port: 8086
metrics.reporter.influxdb.db: ####
metrics.reporter.influxdb.username: ####
metrics.reporter.influxdb.password: ####
metrics.reporter.influxdb.consistency: ANY
metrics.reporter.influxdb.connectTimeout: 60000
metrics.reporter.influxdb.writeTimeout: 60000
state.backend.rocksdb.metrics.estimate-num-keys: true
state.backend.rocksdb.metrics.num-running-compactions: true
state.backend.rocksdb.metrics.background-errors: true
state.backend.rocksdb.metrics.block-cache-capacity: true
state.backend.rocksdb.metrics.block-cache-pinned-usage: true
state.backend.rocksdb.metrics.block-cache-usage: true
state.backend.rocksdb.metrics.compaction-pending: true
Monitoring with InfluxDB and Grafana:

As the name of this TTL cleanup implies (cleanupInRocksdbCompactFilter), it relies on a custom RocksDB compaction filter, which runs only during compactions. More details are in the docs.
The metrics in the screenshot show that there have been no running compactions all the time. I suppose that the size of data is just not big enough to start any compaction at this point of time.
Compaction Filter does not free states from memory.
I assume that the main RAM memory is meant by saying 'from memory'. If so, the compaction is not running there at all. The size of data, kept by RocksDB in main memory, is always limited. It is basically a cache and the expired untouched state should just get evicted from it eventually. The rest is periodically spilled to disk and gets compacted over time. This is when this TTL cleanup is supposed to remove the expired state from the system.
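To make the distinction concrete, here is a toy model in Java (not Flink or RocksDB code): expired entries are hidden from reads immediately, matching Flink's default NeverReturnExpired visibility, but they are only physically removed when a compaction pass runs. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Toy model of TTL cleanup via a compaction filter: expired values stay stored
 * until compact() visits them, but get() already hides them.
 */
class CompactionFilterTtlModel {

    private static final class Entry {
        final String value;
        final long lastWriteMs;

        Entry(String value, long lastWriteMs) {
            this.value = value;
            this.lastWriteMs = lastWriteMs;
        }
    }

    private final Map<String, Entry> storage = new HashMap<>();
    private final long ttlMs;

    CompactionFilterTtlModel(long ttlMs) {
        this.ttlMs = ttlMs;
    }

    void put(String key, String value, long nowMs) {
        storage.put(key, new Entry(value, nowMs));
    }

    /** Returns null for expired entries, even though they are still stored. */
    String get(String key, long nowMs) {
        Entry e = storage.get(key);
        return (e == null || nowMs - e.lastWriteMs >= ttlMs) ? null : e.value;
    }

    /** Only this step physically drops expired entries, like the compaction filter. */
    void compact(long nowMs) {
        storage.values().removeIf(e -> nowMs - e.lastWriteMs >= ttlMs);
    }

    int storedEntries() {
        return storage.size();
    }

    public static void main(String[] args) {
        CompactionFilterTtlModel state = new CompactionFilterTtlModel(600_000); // 600 s TTL, as in the question
        state.put("k", "v", 0);
        long later = 700_000;
        System.out.println(state.get("k", later) + " / stored=" + state.storedEntries()); // null / stored=1
        state.compact(later);
        System.out.println("after compaction stored=" + state.storedEntries()); // after compaction stored=0
    }
}
```

If no compaction ever runs, as the metrics suggest, the expired entries in this model simply stay in storage.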

KStreams app - excessive memory usage

I'm running a (relatively) simple KStreams app:
stream->aggregate by key->filter->foreach
It processes ~200K records/minute on AWS EC2 with 32 GB / 8 CPUs.
Within 10 minutes of starting it the memory usage exceeds 40%. Not long after (typically less than 15min) the OS will OOM-kill it.
Configuration:
config.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "450000");
config.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, 250);
config.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
config.put(StreamsConfig.TIMESTAMP_EXTRACTOR_CLASS_CONFIG, EventTimeExtractor.class.getName());
config.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
config.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, "2");
Aggregation step:
KTable<Windowed<String>, String> ktAgg = sourceStream.groupByKey().aggregate(
        String::new,
        new Aggregate(),
        TimeWindows.of(20 * 60 * 1000L).advanceBy(5 * 60 * 1000L).until(40 * 60 * 1000L),
        stringSerde, "table_stream");
Using Kafka 0.10.1.1
Suggestions on where to look for the culprit?
Side note:
I tried instrumenting this app with the New Relic Java agent. When I ran it with -XX:+UseG1GC, it did the standard "use lots of memory and then get killed", but when I removed the G1GC parameter, the process drove the system load above 21. I had to kill that one myself.
What output there was from New Relic didn't show anything outrageous with regard to memory management.
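One thing worth checking is how many windows each record lands in. With hopping windows of size 20 minutes advancing every 5 minutes, every record is aggregated into 20 / 5 = 4 windows rather than one, and each window's state is kept for the 40-minute retention. The Java sketch below re-implements the window assignment for illustration; to my knowledge it matches the arithmetic of TimeWindows.windowsFor in recent Kafka Streams versions, but it does not call Kafka, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Sketch of hopping-window assignment: which windows [start, start + size)
 * contain a given record timestamp.
 */
class HoppingWindows {

    /** Start times (ms) of all windows [start, start + size) that contain the timestamp. */
    static List<Long> windowStartsFor(long timestampMs, long sizeMs, long advanceMs) {
        List<Long> starts = new ArrayList<>();
        // Earliest window start that still covers the timestamp, aligned to the advance.
        long start = (Math.max(0, timestampMs - sizeMs + advanceMs) / advanceMs) * advanceMs;
        for (; start <= timestampMs; start += advanceMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long min = 60_000L;
        long ts = 60 * min; // a record at minute 60
        List<Long> starts = windowStartsFor(ts, 20 * min, 5 * min); // the question's window settings
        System.out.println("windows per record: " + starts.size());
        for (long s : starts) {
            System.out.println("[" + s / min + ", " + (s + 20 * min) / min + ") min");
        }
    }
}
```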

Java Memory error: unable to create new native thread

I get this error on my Unix server when running my Java server:
Exception in thread "Thread-0" java.lang.OutOfMemoryError: unable to create new native thread
at java.lang.Thread.start0(Native Method)
at java.lang.Thread.start(Thread.java:640)
at [... where ever I launch a new Thread ...]
It happens every time I have about 600 threads running.
I have set this limit on the server:
$> ulimit -s 128
What looks strange to me is the result of this command, which I ran the last time the bug occurred:
$> free -m
             total       used       free     shared    buffers     cached
Mem:          2048        338       1709          0          0          0
-/+ buffers/cache:        338       1709
Swap:            0          0          0
I launch my java server like this:
$> /usr/bin/java -server -Xss128k -Xmx500m -jar /path/to/myJar.jar
My debian version:
$> cat /etc/debian_version
5.0.8
My java version:
$> java -version
java version "1.6.0_26"
Java(TM) SE Runtime Environment (build 1.6.0_26-b03)
Java HotSpot(TM) 64-Bit Server VM (build 20.1-b02, mixed mode)
My question: I have read on the Internet that my program should be able to handle something like 5000 threads. So what is going on, and how can I fix it?
Edit: this is the output of ulimit -a when I open a shell:
core file size (blocks, -c) unlimited
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 794624
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 100000
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 794624
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
I run the script as a daemon from init.d, and this is what I run:
DAEMON=/usr/bin/java
DAEMON_ARGS="-server -Xss128k -Xmx1024m -jar /path/to/myJar.jar"
ulimit -s 128 && ulimit -n 10240 && start-stop-daemon -b --start --quiet --chuid $USER -m -p $PIDFILE --exec $DAEMON -- $DAEMON_ARGS \
|| return 2
Edit 2: I have come across this Stack Overflow question with a Java test for threads: how-many-threads-can-a-java-vm-support
public class DieLikeADog {
    private static Object s = new Object();
    private static int count = 0;
    public static void main(String[] argv){
        for(;;){
            new Thread(new Runnable(){
                public void run(){
                    synchronized(s){
                        count += 1;
                        System.err.println("New thread #"+count);
                    }
                    for(;;){
                        try {
                            Thread.sleep(100);
                        } catch (Exception e){
                            System.err.println(e);
                        }
                    }
                }
            }).start();
        }
    }
}
On my server, the program crashes after 613 threads. Now I'm certain this is not normal and is related only to my server configuration. Can anyone help, please?
Edit 3:
I have come across this article, and many others, explaining that Linux can't create 1000 threads, but you are telling me that you can do it on your systems. I don't understand.
I have also run this script on my server: threads_limits.c, and the limit is around 620 threads.
My website is now offline and this is the worst thing that could have happened to my project.
I don't know how to recompile glibc and this stuff. It's too much work imo.
I guess I should switch to Windows Server, because none of the settings proposed on this page made any difference: the limit on my system is between 600 and 620 threads, no matter which program is involved.

Just got the following information: This is a limitation imposed by my host provider. This has nothing to do with programming, or linux.

The underlying operating system (Debian Linux in this case) does not allow the process to create any more threads. See here how to raise the maximum amount: Maximum number of threads per process in Linux?
"I have read on Internet that my program should handle something like 5000 threads or so."
This depends on the limits set in the OS, the number of running processes, etc. With the right settings you can easily reach that many threads. I'm running Ubuntu on my own computer, and I can create around 32000 threads in a single Java program before hitting the limit, with all my "normal stuff" running in the background (this was done with a test program that just created threads that went to sleep immediately in an infinite loop). Naturally, that many threads actually doing something would bring consumer hardware to a screeching halt pretty fast.

Can you try the same command with a smaller stack size "-Xss64k" and pass on the results ?

Your JVM fails to allocate stack or some other per-thread memory. Lowering the stack size with -Xss will help increase the number of threads you can create before OOM occurs (but JVM will not let you set arbitrarily small stack size).
You can confirm this is the problem by seeing how the number of threads created change as you tweak -Xss or by running strace on your JVM (you'll almost certainly see an mmap() returning ENOMEM right before an exception is thrown).
Also check your virtual memory limit, i.e. ulimit -v. Increasing this limit should let you create more threads with the same stack size. Note that the resident set size limit (ulimit -m) has no effect in current Linux kernels.
Also, lowering -Xmx can help by leaving more memory for thread stacks.
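A variant of the thread test that stops cleanly instead of dying: it starts up to a given number of parked threads, stops at the first OutOfMemoryError, and reports how many started. It passes the stack size through the four-argument Thread(ThreadGroup, Runnable, String, long) constructor, so you can compare stack sizes without changing -Xss; per the Javadoc, that stackSize value is only a hint and may be ignored on some platforms. The class name and the default limit of 1000 are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;

/**
 * Sketch: start up to `limit` parked threads with an explicit per-thread stack size,
 * stop at the first OutOfMemoryError, release them, and report how many started.
 */
class ThreadLimitProbe {

    static int startParkedThreads(int limit, long stackSizeBytes) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        List<Thread> started = new ArrayList<>();
        try {
            for (int i = 0; i < limit; i++) {
                Thread t = new Thread(null, () -> {
                    try {
                        release.await(); // park until the probe finishes
                    } catch (InterruptedException ignored) {
                        Thread.currentThread().interrupt();
                    }
                }, "probe-" + i, stackSizeBytes);
                t.setDaemon(true);
                t.start(); // throws OutOfMemoryError when no native thread can be created
                started.add(t);
            }
        } catch (OutOfMemoryError e) {
            System.err.println("stopped after " + started.size() + " threads: " + e.getMessage());
        } finally {
            release.countDown(); // let every parked thread finish
            for (Thread t : started) {
                t.join();
            }
        }
        return started.size();
    }

    public static void main(String[] args) throws InterruptedException {
        int limit = args.length > 0 ? Integer.parseInt(args[0]) : 1000;
        System.out.println("started " + startParkedThreads(limit, 128 * 1024) + " threads");
    }
}
```

Running it with a few different stack sizes (and different -Xmx values) shows whether the ceiling moves, which helps separate a memory problem from a hard per-process thread limit like the hosting restriction described in the OP's own answer.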

I am starting to suspect that the Native POSIX Thread Library (NPTL) is missing.
$ getconf GNU_LIBPTHREAD_VERSION
Should output something like:
NPTL 2.13
If not, the Debian installation is messed up. I am not sure how to fix that, but installing Ubuntu Server seems like a good move...
With ulimit -n 100000 (open file descriptors), the following program should be able to handle 32,000 threads or so.
Try it:
package test;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.ArrayList;
import java.util.concurrent.Semaphore;
public class Test {
    final static Semaphore ss = new Semaphore(0);
    static class TT implements Runnable {
        @Override
        public void run() {
            try {
                Socket t = new Socket("localhost", 47111);
                InputStream is = t.getInputStream();
                for (;;) {
                    is.read();
                }
            } catch (Throwable t) {
                System.err.println(Thread.currentThread().getName() + " : abort");
                t.printStackTrace();
                System.exit(2);
            }
        }
    }
    /**
     * @param args
     */
    public static void main(String[] args) {
        try {
            Thread t = new Thread() {
                public void run() {
                    try {
                        ArrayList<Socket> sockets = new ArrayList<Socket>(50000);
                        ServerSocket s = new ServerSocket(47111, 1500);
                        ss.release();
                        for (;;) {
                            Socket t = s.accept();
                            sockets.add(t);
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                        System.exit(1);
                    }
                }
            };
            t.start();
            ss.acquire();
            for (int i = 0; i < 30000; i++) {
                Thread tt = new Thread(new TT(), "T" + i);
                tt.setDaemon(true);
                tt.start();
                System.out.println(tt.getName());
                try {
                    Thread.sleep(1);
                } catch (InterruptedException e) {
                    return;
                }
            }
            for (;;) {
                System.out.println();
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    return;
                }
            }
        } catch (Throwable t) {
            t.printStackTrace();
        }
    }
}

This is related to the OP's self-answer, but I do not yet have the reputation to comment.
I had the identical issue when hosting Tomcat on a V-Server.
All standard means of system checks (process amount/limit, available RAM, etc) indicated a healthy system, while Tomcat crashed with variants of "out of memory / resources / GCThread exceptions".
It turns out some V-Servers have an extra configuration file that limits the number of threads allowed per process.
In my case (Ubuntu V-Server with Strato, Germany) this was even documented by the hoster, and the restriction can be lifted manually.
Original documentation by Strato (German) here: https://www.strato.de/faq/server/prozesse-vs-threads-bei-linux-v-servern/
tl;dr: How to fix:
- Inspect the thread limit per process:
systemctl show --property=DefaultTasksMax
- In my case the default was 60, which was insufficient for Tomcat. I changed it to 256:
vim /etc/systemd/system.conf
Change the value for:
DefaultTasksMax=60
to something higher, e.g. 256. (Tomcat's HTTPS connector has a default thread pool of 200, so it should be at least 200.)
Then reboot to make the change take effect.

It's running out of memory.
You also need to change ulimit. If your OS does not give your app enough memory, I suppose -Xmx will not make any difference.
I guess the -Xmx500m is having no effect.
Try
ulimit -m 524288
(ulimit -m takes kilobytes, so this is 512 MB) with -Xmx512m
