Using DL4J in a web application - RuntimeException: Op [matmul] execution failed - java

I have a MultiLayerNetwork that is loaded by an application deployed on a Payara server.
The model loads fine, but requesting an output for an INDArray raises a RuntimeException.
Initialization appears to work correctly:
org.nd4j.linalg.factory.Nd4jBackend - Loaded [CpuBackend] backend
org.nd4j.nativeblas.NativeOpsHolder - Number of threads used for linear algebra: 4
org.nd4j.linalg.cpu.nativecpu.CpuNDArrayFactory - Binary level Generic x86 optimization level AVX512
org.nd4j.nativeblas.Nd4jBlas - Number of threads used for OpenMP BLAS: 8
org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CPU]; OS: [Linux]
org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [16]; Memory: [7,1GB]
org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [OPENBLAS]
org.nd4j.linalg.cpu.nativecpu.CpuBackend - Backend build information:
GCC: "7.5.0"
STD version: 201103L
DEFAULT_ENGINE: samediff::ENGINE_CPU
HAVE_FLATBUFFERS
HAVE_OPENBLAS
org.deeplearning4j.nn.multilayer.MultiLayerNetwork - Starting MultiLayerNetwork with WorkspaceModes set to [training: ENABLED; inference: ENABLED], cacheMode set to [NONE]
Requesting a prediction results in:
ERROR org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner - Failed to execute op matmul. Attempted to execute with 2 inputs, 1 outputs, 2 targs,0 bargs and 3 iargs. Inputs: [(FLOAT,[1,9],c), (FLOAT,[9,400],f)]. Outputs: [(FLOAT,[1,400],f)]. tArgs: [1.0, 0.0]. iArgs: [0, 0, 0]. bArgs: -. Op own name: "81f000d9-6011-4d1f-adc4-af57fb7d11e6" - Please see above message (printed out from c++) for a possible cause of error.
with the following stack trace:
java.lang.RuntimeException: Op [matmul] execution failed
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1561) ~[nd4j-native-1.0.0-M2.jar:?]
at org.nd4j.linalg.factory.Nd4j.exec(Nd4j.java:6522) ~[nd4j-api-1.0.0-M2.jar:?]
at org.nd4j.linalg.api.blas.impl.BaseLevel3.gemm(BaseLevel3.java:62) ~[nd4j-api-1.0.0-M2.jar:?]
at org.nd4j.linalg.api.ndarray.BaseNDArray.mmuli(BaseNDArray.java:3194) ~[nd4j-api-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.layers.BaseLayer.preOutputWithPreNorm(BaseLayer.java:322) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.layers.BaseLayer.preOutput(BaseLayer.java:295) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.layers.BaseLayer.activate(BaseLayer.java:343) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.layers.AbstractLayer.activate(AbstractLayer.java:262) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.outputOfLayerDetached(MultiLayerNetwork.java:1341) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2453) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2416) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2407) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2394) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at org.deeplearning4j.nn.multilayer.MultiLayerNetwork.output(MultiLayerNetwork.java:2490) ~[deeplearning4j-nn-1.0.0-M2.jar:?]
at com.[****].lambda$nnRegression$1(myCustomJavaClass.java:264)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.getCustomOperations(NativeOpExecutioner.java:1365) ~[nd4j-native-1.0.0-M2.jar:?]
at org.nd4j.linalg.api.ops.DynamicCustomOp.opHash(DynamicCustomOp.java:392) ~[nd4j-api-1.0.0-M2.jar:?]
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1900) ~[nd4j-native-1.0.0-M2.jar:?]
at org.nd4j.linalg.cpu.nativecpu.ops.NativeOpExecutioner.exec(NativeOpExecutioner.java:1540) ~[nd4j-native-1.0.0-M2.jar:?]
... 139 more
I have built an MVP that runs the same DL4J calls in a standalone application, and there both loading the model and running inference work fine. I would expect the same behavior on the application server.
import java.util.ArrayList;
import java.util.List;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

// Load the trained model; the updater state is not needed for inference.
MultiLayerNetwork net = MultiLayerNetwork.load(nn, false);

// Build the 9-element observation and wrap it as a [1, 9] matrix.
List<Integer> obsList = new ArrayList<>();
obsList.add(1); obsList.add(18);
obsList.add(1); obsList.add(24);
obsList.add(1); obsList.add(15);
obsList.add(1); obsList.add(13);
obsList.add(2);
int[] obsArray = obsList.stream().mapToInt(Integer::intValue).toArray();
int[][] flat = new int[][] { obsArray };
INDArray test = Nd4j.create(flat);
INDArray y = net.output(test); // <---- THIS IS WHERE THE ERROR OCCURS
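Just for reference, the same 1x9 input can also be written directly as a float matrix; this is only an equivalent sketch of the input above (not code from the application), matching the (FLOAT,[1,9]) shape reported by the failing matmul:

// Equivalent way to build the same [1, 9] input row (same values as obsList above)
float[][] obs = { { 1f, 18f, 1f, 24f, 1f, 15f, 1f, 13f, 2f } };
INDArray sameInput = Nd4j.create(obs);        // shape [1, 9], dtype FLOAT
INDArray sameOutput = net.output(sameInput);  // same output call as above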
The application is packaged as an EAR file; the DL4J dependencies are declared with provided scope in the POM.
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-M2</version>
    <scope>provided</scope>
</dependency>
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native-platform</artifactId>
    <version>1.0.0-M2</version>
    <scope>provided</scope>
</dependency>
The server running this application is CentOS 8 with Payara Server 5.193.1 #badassfish (build 275).
Using the same code on the application server produces the stack trace above. What might cause this error?

Related

Readiness and liveness failed with smallrye metrics in kubernetes

I'm deploying a Quarkus pod in Kubernetes and the startup seems to go fine, but the readiness and liveness probes report the pod as unhealthy.
For metrics I'm using smallrye metrics configured on port 8080 and on path:
quarkus.smallrye-metrics.path=/metrics
If I enter the pod and execute
curl localhost:8080/metrics
the response is
# HELP base_classloader_loadedClasses_count Displays the number of classes that are currently loaded in the Java virtual machine.
# TYPE base_classloader_loadedClasses_count gauge
base_classloader_loadedClasses_count 7399.0
# HELP base_classloader_loadedClasses_total Displays the total number of classes that have been loaded since the Java virtual machine has started execution.
# TYPE base_classloader_loadedClasses_total counter
base_classloader_loadedClasses_total 7403.0
# HELP base_classloader_unloadedClasses_total Displays the total number of classes unloaded since the Java virtual machine has started execution.
# TYPE base_classloader_unloadedClasses_total counter
base_classloader_unloadedClasses_total 4.0
# HELP base_cpu_availableProcessors Displays the number of processors available to the Java virtual machine. This value may change during a particular invocation of the virtual machine.
# TYPE base_cpu_availableProcessors gauge
base_cpu_availableProcessors 1.0
# HELP base_cpu_processCpuLoad_percent Displays the "recent cpu usage" for the Java Virtual Machine process. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that none of the CPUs were running threads from the JVM process during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running threads from the JVM 100% of the time during the recent period being observed. Threads from the JVM include the application threads as well as the JVM internal threads. All values between 0.0 and 1.0 are possible depending of the activities going on in the JVM process and the whole system. If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value.
# TYPE base_cpu_processCpuLoad_percent gauge
base_cpu_processCpuLoad_percent 2.3218608761411404E-7
# HELP base_cpu_systemLoadAverage Displays the system load average for the last minute. The system load average is the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on the available processors averaged over a period of time. The way in which the load average is calculated is operating system specific but is typically a damped time-dependent average. If the load average is not available, a negative value is displayed. This attribute is designed to provide a hint about the system load and may be queried frequently. The load average may be unavailable on some platforms where it is expensive to implement this method.
# TYPE base_cpu_systemLoadAverage gauge
base_cpu_systemLoadAverage 0.15
# HELP base_gc_time_total Displays the approximate accumulated collection elapsed time in milliseconds. This attribute displays -1 if the collection elapsed time is undefined for this collector. The Java virtual machine implementation may use a high resolution timer to measure the elapsed time. This attribute may display the same value even if the collection count has been incremented if the collection elapsed time is very short.
# TYPE base_gc_time_total counter
base_gc_time_total_seconds{name="Copy"} 0.032
base_gc_time_total_seconds{name="MarkSweepCompact"} 0.071
# HELP base_gc_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
# TYPE base_gc_total counter
base_gc_total{name="Copy"} 4.0
base_gc_total{name="MarkSweepCompact"} 2.0
# HELP base_jvm_uptime_seconds Displays the time from the start of the Java virtual machine in milliseconds.
# TYPE base_jvm_uptime_seconds gauge
base_jvm_uptime_seconds 624.763
# HELP base_memory_committedHeap_bytes Displays the amount of memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.
# TYPE base_memory_committedHeap_bytes gauge
base_memory_committedHeap_bytes 8.5262336E7
# HELP base_memory_maxHeap_bytes Displays the maximum amount of heap memory in bytes that can be used for memory management. This attribute displays -1 if the maximum heap memory size is undefined. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size.
# TYPE base_memory_maxHeap_bytes gauge
base_memory_maxHeap_bytes 1.348141056E9
# HELP base_memory_usedHeap_bytes Displays the amount of used heap memory in bytes.
# TYPE base_memory_usedHeap_bytes gauge
base_memory_usedHeap_bytes 1.2666888E7
# HELP base_thread_count Displays the current number of live threads including both daemon and non-daemon threads
# TYPE base_thread_count gauge
base_thread_count 11.0
# HELP base_thread_daemon_count Displays the current number of live daemon threads.
# TYPE base_thread_daemon_count gauge
base_thread_daemon_count 7.0
# HELP base_thread_max_count Displays the peak live thread count since the Java virtual machine started or peak was reset. This includes daemon and non-daemon threads.
# TYPE base_thread_max_count gauge
base_thread_max_count 11.0
# HELP vendor_cpu_processCpuTime_seconds Displays the CPU time used by the process on which the Java virtual machine is running in nanoseconds. The returned value is of nanoseconds precision but not necessarily nanoseconds accuracy. This method returns -1 if the the platform does not support this operation.
# TYPE vendor_cpu_processCpuTime_seconds gauge
vendor_cpu_processCpuTime_seconds 4.36
# HELP vendor_cpu_systemCpuLoad_percent Displays the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values betweens 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
# TYPE vendor_cpu_systemCpuLoad_percent gauge
vendor_cpu_systemCpuLoad_percent 2.3565253563367224E-7
# HELP vendor_memory_committedNonHeap_bytes Displays the amount of non heap memory in bytes that is committed for the Java virtual machine to use.
# TYPE vendor_memory_committedNonHeap_bytes gauge
vendor_memory_committedNonHeap_bytes 5.1757056E7
# HELP vendor_memory_freePhysicalSize_bytes Displays the amount of free physical memory in bytes.
# TYPE vendor_memory_freePhysicalSize_bytes gauge
vendor_memory_freePhysicalSize_bytes 5.44448512E9
# HELP vendor_memory_freeSwapSize_bytes Displays the amount of free swap space in bytes.
# TYPE vendor_memory_freeSwapSize_bytes gauge
vendor_memory_freeSwapSize_bytes 0.0
# HELP vendor_memory_maxNonHeap_bytes Displays the maximum amount of used non-heap memory in bytes.
# TYPE vendor_memory_maxNonHeap_bytes gauge
vendor_memory_maxNonHeap_bytes -1.0
# HELP vendor_memory_usedNonHeap_bytes Displays the amount of used non-heap memory in bytes.
# TYPE vendor_memory_usedNonHeap_bytes gauge
vendor_memory_usedNonHeap_bytes 4.7445384E7
# HELP vendor_memoryPool_usage_bytes Current usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_bytes gauge
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-nmethods'"} 1357184.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'profiled nmethods'"} 4787200.0
vendor_memoryPool_usage_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_bytes{name="Eden Space"} 0.0
vendor_memoryPool_usage_bytes{name="Metaspace"} 3.5767632E7
vendor_memoryPool_usage_bytes{name="Survivor Space"} 0.0
vendor_memoryPool_usage_bytes{name="Tenured Gen"} 9872160.0
# HELP vendor_memoryPool_usage_max_bytes Peak usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_max_bytes gauge
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-nmethods'"} 1369600.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'profiled nmethods'"} 4793088.0
vendor_memoryPool_usage_max_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_max_bytes{name="Eden Space"} 2.3658496E7
vendor_memoryPool_usage_max_bytes{name="Metaspace"} 3.5769312E7
vendor_memoryPool_usage_max_bytes{name="Survivor Space"} 2883584.0
vendor_memoryPool_usage_max_bytes{name="Tenured Gen"} 9872160.0
So it seems metrics are working fine, but kubernetes returns this error:
Warning Unhealthy 24m (x9 over 28m) kubelet Liveness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Warning Unhealthy 4m2s (x70 over 28m) kubelet Readiness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Any help?
Thanks
First I needed to fix Dockerfile.jvm:
FROM openjdk:11
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en'
# We make four distinct layers so if there are application changes the library layers can be re-used
# RUN ls -la target
COPY --chown=185 target/quarkus-app/lib/ /deployments/lib/
COPY --chown=185 target/quarkus-app/*.jar /deployments/
COPY --chown=185 target/quarkus-app/app/ /deployments/app/
COPY --chown=185 target/quarkus-app/quarkus/ /deployments/quarkus/
RUN java -version
EXPOSE 8080
USER root
ENV AB_JOLOKIA_OFF=""
ENV JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager"
ENV JAVA_DEBUG="true"
ENV JAVA_APP_JAR="/deployments/quarkus-run.jar"
CMD java ${JAVA_OPTS} -jar ${JAVA_APP_JAR}
This way the jar started working; without that CMD the openjdk image just starts jshell. After that I saw the log below:
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
2022-09-21 19:56:00,450 INFO [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"<default>":"Unable to execute the validation check for the default DataSource: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}}]}
The DB connection in Kubernetes is not working.
deploy command: mvn clean package -DskipTests -Dquarkus.kubernetes.deploy=true
"minikube dashboard" shows the deployment (screenshot omitted).
I used the endpoints below:
quarkus.smallrye-health.root-path=/health
quarkus.smallrye-health.liveness-path=/health/live
quarkus.smallrye-metrics.path=/metrics
The liveness URL looks like this in Firefox (screenshot omitted).
I needed to change some dependencies in the POM because I use minikube locally, and I had to delete some Java code because of the DB connection problems. You can find a working example at https://github.com/ozkanpakdil/quarkus-examples/tree/master/liveness-readiness-kubernetes
You can see the deployment's definition YAML below.
mintozzy@mintozzy-MACH-WX9:~$ kubectl get deployments.apps app-version-checker -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
    app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
    deployment.kubernetes.io/revision: "3"
    prometheus.io/path: /metrics
    prometheus.io/port: "8080"
    prometheus.io/scheme: http
    prometheus.io/scrape: "true"
  creationTimestamp: "2022-09-21T20:13:21Z"
  generation: 3
  labels:
    app.kubernetes.io/name: app-version-checker
    app.kubernetes.io/version: 1.0.0-SNAPSHOT
  name: app-version-checker
  namespace: default
  resourceVersion: "117584"
  uid: 758d420b-ed22-48f8-9d6f-150422a6b38e
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/name: app-version-checker
      app.kubernetes.io/version: 1.0.0-SNAPSHOT
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
        app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scheme: http
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app.kubernetes.io/name: app-version-checker
        app.kubernetes.io/version: 1.0.0-SNAPSHOT
    spec:
      containers:
      - env:
        - name: KUBERNETES_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        image: mintozzy/app-version-checker:1.0.0-SNAPSHOT
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/live
            port: 8080
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 10
        name: app-version-checker
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health/ready
            port: 8080
            scheme: HTTP
          periodSeconds: 30
          successThreshold: 1
          timeoutSeconds: 10
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2022-09-21T20:13:21Z"
    lastUpdateTime: "2022-09-21T20:30:03Z"
    message: ReplicaSet "app-version-checker-5cb974f465" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  - lastTransitionTime: "2022-09-22T16:09:48Z"
    lastUpdateTime: "2022-09-22T16:09:48Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1

Need insight with Logstash and ELK

I am trying to send data from Logstash to Elasticsearch. Below is the config file and the result on the terminal.
input {
  file {
    path => "F:/business/data/clinical_trials/ctp.csv"
    start_position => "beginning"
    sincedb_path => "C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/data/plugins/inputs/file/.sincedb_88142d557695dc3df93b28d02940763d"
  }
}
filter {
  csv {
    separator => ","
    columns => ["web-scraper-order", "web-scraper-start-url", "First Submitted Date"]
  }
}
output {
  stdout { codec => "rubydebug" }
  elasticsearch {
    hosts => ["https://idm.es.eastus2.azure.elastic-cloud.com:9243"]
    index => "DB"
    user => "elastic"
    password => "******"
  }
}
When I run it, I get the following output. It gets right down to the pipelines, and the next step should be establishing the connection with Elasticsearch, but it hangs at the pipeline stage forever.
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/vendor/bundle/jruby/2.5.0/gems/bundler-1.17.3/lib/bundler/rubygems_integration.rb:200: warning: constant Gem::ConfigMap is deprecated
Sending Logstash logs to C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/logs which is now configured via log4j2.properties
[2021-09-22T22:40:20,327][INFO ][logstash.runner ] Log4j configuration path used is: C:\Users\matth\Downloads\logstash-7.14.2-windows-x86_641\logstash-7.14.2\config\log4j2.properties
[2021-09-22T22:40:20,333][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"7.14.2", "jruby.version"=>"jruby 9.2.19.0 (2.5.8) 2021-06-15 55810c552b OpenJDK 64-Bit Server VM 11.0.12+7 on 11.0.12+7 +indy +jit [mswin32-x86_64]"}
[2021-09-22T22:40:20,383][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2021-09-22T22:40:21,625][INFO ][logstash.agent ] Successfully started Logstash API endpoint {:port=>9600}
[2021-09-22T22:40:22,007][INFO ][org.reflections.Reflections] Reflections took 46 ms to scan 1 urls, producing 120 keys and 417 values
[2021-09-22T22:40:22,865][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["https://idm.es.eastus2.azure.elastic-cloud.com:9243"]}
[2021-09-22T22:40:23,032][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[https://elastic:xxxxxx@idm.es.eastus2.azure.elastic-cloud.com:9243/]}}
[2021-09-22T22:40:23,707][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"https://elastic:xxxxxx@idm.es.eastus2.azure.elastic-cloud.com:9243/"}
[2021-09-22T22:40:24,094][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch version determined (7.14.1) {:es_version=>7}
[2021-09-22T22:40:24,096][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2021-09-22T22:40:24,277][INFO ][logstash.javapipeline ][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>12, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1500, "pipeline.sources"=>["C:/Users/matth/Downloads/logstash-7.14.2-windows-x86_641/logstash-7.14.2/config/logstash-sample.conf"], :thread=>"#<Thread:0x52ff5a7a run>"}
[2021-09-22T22:40:24,346][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2021-09-22T22:40:24,793][INFO ][logstash.javapipeline ][main] Pipeline Java execution initialization time {"seconds"=>0.51}
[2021-09-22T22:40:24,835][INFO ][logstash.javapipeline ][main] Pipeline started {"pipeline.id"=>"main"}
[2021-09-22T22:40:24,856][INFO ][filewatch.observingtail ][main][c4df97cc309d579dc89a5115b7fcf43a440ef876f640039f520f6b116b913f49] START, creating Discoverer, Watch with file and sincedb collections
[2021-09-22T22:40:24,876][INFO ][logstash.agent ] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
Any help as to why my data isn't getting to Elasticsearch? Please help.

Sunspot Solr on Ruby: "Exception writing document id to the index, java.lang.OutOfMemoryError"

We have a Ruby on Rails application that has been running for some time now, but for the last couple of days we have been getting an HTTP 500 error whenever we try to update any documents. The system runs Ruby 1.9.3 with Unicorn 4.8.2 and nginx 1.2.1, and uses the sunspot_solr 2.1.1 gem for the search index (among others).
The production log tells me that every time someone tries to create or update a document, the Solr server throws a bunch of errors. The most prominent, I think, is this one:
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)\n\t... 30 more\nCaused by: java.lang.OutOfMemoryError: Java heap space\n\
Unfortunately nothing changes if I edit config/sunspot.yml to include "max_memory: 1G":
production:
  solr:
    hostname: localhost
    port: 8080
    log_level: WARNING
    min_memory: 32M
    max_memory: 1G
    path: /solr-4.10.2/default #production #ollection1 #production
Here is the complete production.log entry:
[a086bc878daa0abd367379daade96682] Completed 500 Internal Server Error in 195ms
[a086bc878daa0abd367379daade96682]
RSolr::Error::Http (RSolr::Error::Http - 500 Internal Server Error
Error: {"responseHeader":{"status":500,"QTime":2},"error":{"msg":"Exception writing document id Neuigkeit 393 to the index; possible analysis error.","trace":"org.apache.solr.common.SolrException: Exception writing document id Neuigkeit 393 to the index; possible analysis error.\n\
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:168)\n\
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)\n\tat
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:926)\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1080)\n\tat
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:692)\n\tat
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)\n\
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)\n\tat
org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)\n\
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)\n\tat
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)\n\
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)\n\
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)\n\
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)\n\
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)\n\
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)\n\
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)\n\
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)\n\
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)\n\
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)\n\
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1003)\n\
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)\n\
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)\n\
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\
java.lang.Thread.run(Thread.java:745)\nCaused by: org.apache.lucene.store.AlreadyClosedException: this IndexWriter is closed\n\
org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:698)\n\
org.apache.lucene.index.IndexWriter.ensureOpen(IndexWriter.java:712)\n\
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1507)\n\
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)\n\
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)\n\t... 30 more\nCaused by: java.lang.OutOfMemoryError: Java heap space\n\
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.<init>(FreqProxTermsWriterPerField.java:207)\n\
org.apache.lucene.index.FreqProxTermsWriterPerField$FreqProxPostingsArray.newInstance(FreqProxTermsWriterPerField.java:230)\n\
org.apache.lucene.index.ParallelPostingsArray.grow(ParallelPostingsArray.java:48)\n\
org.apache.lucene.index.TermsHashPerField$PostingsBytesStartArray.grow(TermsHashPerField.java:252)\n\
org.apache.lucene.util.BytesRefHash.add(BytesRefHash.java:292)\n\
org.apache.lucene.index.TermsHashPerField.add(TermsHashPerField.java:151)\n\
org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:659)\n\
org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:359)\n\
org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:318)\n\
org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:239)\n\
org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:454)\n\
org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1511)\n\
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:240)\n\
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:164)\n\
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)\n\
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)\n\
org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:926)\n\
org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1080)\n\
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:692)\n\
org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)\n\
org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:247)\n\
org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:174)\n\
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:99)\n\
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)\n\
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)\n\
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967)\n\
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)\n\
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)\n\
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)\n\
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)\n\
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)\n\
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)\n","code":500}}
URI: http://localhost:8080/solr-4.10.2/default/update?wt=json
EDIT: I'm an idiot. The Solr server isn't contained in the unicorn service. Restarting it via 'service tomcat7 restart' loaded the updated config/sunspot.yml and solved the problem.

Neo4j: "Failed to rotate logs exception" at shutdown

I have a Neo4j 3.2.1 multi-label, multi-property graph database with 4M nodes, 15M edges, and 4.8M distinct labels, taking ~6GB on disk.
I've imported the dataset with the "neo4j-import" tool on a Linux machine.
I can open the database and traverse the nodes, edges, and their descriptions using the Java API without problems. However, when I shut it down, it takes a long time and finally gives me the following error in the log file:
2017-08-04 07:07:38.189+0000 INFO [o.n.k.i.f.GraphDatabaseFacadeFactory] Shutdown started
2017-08-04 07:07:38.190+0000 INFO [o.n.k.i.f.GraphDatabaseFacadeFactory] Database is now unavailable
2017-08-04 07:07:38.198+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [5399]: Starting check pointing...
2017-08-04 07:07:38.198+0000 INFO [o.n.k.i.t.l.c.CheckPointerImpl] Check Pointing triggered by database shutdown [5399]: Starting store flush...
2017-08-04 07:23:35.022+0000 ERROR [o.n.k.i.t.l.c.CheckPointerImpl] Error performing check point Failed to rotate logs. Expected version: 5399, actual version: 5274, wait timeout (ms): 956815
org.neo4j.kernel.impl.store.kvstore.RotationTimeoutException: Failed to rotate logs. Expected version: 5399, actual version: 5274, wait timeout (ms): 956815
at org.neo4j.kernel.impl.store.kvstore.RotationState$Rotation.rotate(RotationState.java:79)
at org.neo4j.kernel.impl.store.kvstore.RotationState$Rotation.rotate(RotationState.java:52)
at org.neo4j.kernel.impl.store.kvstore.AbstractKeyValueStore$RotationTask.rotate(AbstractKeyValueStore.java:311)
at org.neo4j.kernel.impl.store.kvstore.AbstractKeyValueStore$RotationTask.rotate(AbstractKeyValueStore.java:288)
at org.neo4j.kernel.impl.store.counts.CountsTracker.rotate(CountsTracker.java:154)
at org.neo4j.kernel.impl.store.NeoStores.flush(NeoStores.java:242)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.RecordStorageEngine.flushAndForce(RecordStorageEngine.java:480)
at org.neo4j.kernel.impl.transaction.log.checkpoint.CheckPointerImpl.doCheckPoint(CheckPointerImpl.java:160)
at org.neo4j.kernel.impl.transaction.log.checkpoint.CheckPointerImpl.forceCheckPoint(CheckPointerImpl.java:88)
at org.neo4j.kernel.NeoStoreDataSource$3.shutdown(NeoStoreDataSource.java:794)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.shutdown(LifeSupport.java:489)
at org.neo4j.kernel.lifecycle.LifeSupport.shutdown(LifeSupport.java:206)
at org.neo4j.kernel.NeoStoreDataSource.stop(NeoStoreDataSource.java:766)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.stop(LifeSupport.java:458)
at org.neo4j.kernel.lifecycle.LifeSupport.stopInstances(LifeSupport.java:161)
at org.neo4j.kernel.lifecycle.LifeSupport.stop(LifeSupport.java:143)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.stop(DataSourceManager.java:120)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.stop(LifeSupport.java:458)
at org.neo4j.kernel.lifecycle.LifeSupport.stopInstances(LifeSupport.java:161)
at org.neo4j.kernel.lifecycle.LifeSupport.stop(LifeSupport.java:143)
at org.neo4j.kernel.lifecycle.LifeSupport.shutdown(LifeSupport.java:191)
at org.neo4j.kernel.impl.factory.ClassicCoreSPI.shutdown(ClassicCoreSPI.java:159)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.shutdown(GraphDatabaseFacade.java:366)
at experiment.caseStudy.TestDatasetHealth.run(TestDatasetHealth.java:70)
at experiment.caseStudy.TestDatasetHealth.main(TestDatasetHealth.java:29)
2017-08-04 07:23:35.665+0000 INFO [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics START ---
2017-08-04 07:23:35.666+0000 INFO [o.n.k.i.DiagnosticsManager] --- STOPPING diagnostics END ---
In the Java program itself, I get the following exception:
Exception in thread "main" org.neo4j.kernel.lifecycle.LifecycleException: Component 'org.neo4j.kernel.NeoStoreDataSource$3@3101ffd3' failed to transition from stopped to shutting_down. Please see the attached cause exception "Failed to rotate logs. Expected version: 5399, actual version: 5274, wait timeout (ms): 956815".
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.shutdown(LifeSupport.java:497)
at org.neo4j.kernel.lifecycle.LifeSupport.shutdown(LifeSupport.java:206)
at org.neo4j.kernel.NeoStoreDataSource.stop(NeoStoreDataSource.java:766)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.stop(LifeSupport.java:458)
at org.neo4j.kernel.lifecycle.LifeSupport.stopInstances(LifeSupport.java:161)
at org.neo4j.kernel.lifecycle.LifeSupport.stop(LifeSupport.java:143)
at org.neo4j.kernel.impl.transaction.state.DataSourceManager.stop(DataSourceManager.java:120)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.stop(LifeSupport.java:458)
at org.neo4j.kernel.lifecycle.LifeSupport.stopInstances(LifeSupport.java:161)
at org.neo4j.kernel.lifecycle.LifeSupport.stop(LifeSupport.java:143)
at org.neo4j.kernel.lifecycle.LifeSupport.shutdown(LifeSupport.java:191)
at org.neo4j.kernel.impl.factory.ClassicCoreSPI.shutdown(ClassicCoreSPI.java:159)
at org.neo4j.kernel.impl.factory.GraphDatabaseFacade.shutdown(GraphDatabaseFacade.java:366)
at experiment.caseStudy.TestDatasetHealth.run(TestDatasetHealth.java:70)
at experiment.caseStudy.TestDatasetHealth.main(TestDatasetHealth.java:29)
Caused by: org.neo4j.kernel.impl.store.kvstore.RotationTimeoutException: Failed to rotate logs. Expected version: 5399, actual version: 5274, wait timeout (ms): 956815
at org.neo4j.kernel.impl.store.kvstore.RotationState$Rotation.rotate(RotationState.java:79)
at org.neo4j.kernel.impl.store.kvstore.RotationState$Rotation.rotate(RotationState.java:52)
at org.neo4j.kernel.impl.store.kvstore.AbstractKeyValueStore$RotationTask.rotate(AbstractKeyValueStore.java:311)
at org.neo4j.kernel.impl.store.kvstore.AbstractKeyValueStore$RotationTask.rotate(AbstractKeyValueStore.java:288)
at org.neo4j.kernel.impl.store.counts.CountsTracker.rotate(CountsTracker.java:154)
at org.neo4j.kernel.impl.store.NeoStores.flush(NeoStores.java:242)
at org.neo4j.kernel.impl.storageengine.impl.recordstorage.RecordStorageEngine.flushAndForce(RecordStorageEngine.java:480)
at org.neo4j.kernel.impl.transaction.log.checkpoint.CheckPointerImpl.doCheckPoint(CheckPointerImpl.java:160)
at org.neo4j.kernel.impl.transaction.log.checkpoint.CheckPointerImpl.forceCheckPoint(CheckPointerImpl.java:88)
at org.neo4j.kernel.NeoStoreDataSource$3.shutdown(NeoStoreDataSource.java:794)
at org.neo4j.kernel.lifecycle.LifeSupport$LifecycleInstance.shutdown(LifeSupport.java:489)
In fact, the Java program only reads the data and does not write anything to the dataset.
Furthermore, opening the database with the following line of code takes 80 seconds on a 3.1GHz Core i7 MacBook with 16GB of RAM and a 10GB heap set via JVM arguments.
Is it normal for a dataset of the mentioned size to take this much time?
GraphDatabaseService dataGraph = new GraphDatabaseFactory().newEmbeddedDatabase(storeDir);
Could you please guide me on how I can repair the dataset so that it shuts down cleanly?
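For context, the program follows the standard embedded open / read-only traverse / shutdown pattern. This is only a simplified sketch of what TestDatasetHealth does (the store path and the traversal body are placeholders, not the exact code):

import java.io.File;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

File storeDir = new File("/path/to/graph.db"); // placeholder path
GraphDatabaseService dataGraph = new GraphDatabaseFactory().newEmbeddedDatabase(storeDir); // ~80 seconds

try (Transaction tx = dataGraph.beginTx()) {
    // read-only traversal: inspect nodes, relationships and their properties
    for (Node n : dataGraph.getAllNodes()) {
        n.getAllProperties();
        n.getRelationships();
    }
    tx.success();
}

dataGraph.shutdown(); // <---- this is where the rotation timeout occurs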

Building a storm topology with shell bolts

I'm currently trying to implement a Storm topology that integrates with the R language.
As a starting point, I took the following project (https://github.com/allenday/R-Storm), which works by extending the ShellBolt class to implement the R integration, together with an R library that handles communication between the Java and R sides.
My problem is that if I create a topology based on regular (Java-only) bolts, I can chain them together without issue. However, when one of the bolts in the middle of the chain is an R shell bolt, the whole thing falls apart with:
5661 [Thread-18] ERROR backtype.storm.util - Async loop died!
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
Shell Process Exception:
at backtype.storm.utils.DisruptorQueue.consumeBatchToCursor(DisruptorQueue.java:87) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.utils.DisruptorQueue.consumeBatchWhenAvailable(DisruptorQueue.java:58) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.disruptor$consume_batch_when_available.invoke(disruptor.clj:62) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.daemon.executor$fn__3557$fn__3569$fn__3616.invoke(executor.clj:715) ~[storm-0.9.0-wip16.jar:na]
at backtype.storm.util$async_loop$fn__436.invoke(util.clj:377) ~[storm-0.9.0-wip16.jar:na]
at clojure.lang.AFn.run(AFn.java:24) ~[clojure-1.4.0.jar:na]
at java.lang.Thread.run(Unknown Source) ~[na:1.7.0_25]
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: Pipe to subprocess seems to be broken! No output read.
More concretely, the following topology works as expected:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
Where PermuteBolt is an R Shell Bolt. The logs for this example show the expected output:
6246 [Thread-18] INFO backtype.storm.daemon.task - Emitting: spout default [four score and seven years ago]
6246 [Thread-16] INFO backtype.storm.daemon.executor - Processing received message source: spout:3, stream: default, id: {}, [four score and seven years ago]
6261 [Thread-23] INFO backtype.storm.daemon.task - Emitting: permutebolt default ["PERMUTE seven years ago and four score"]
If, however, I add another bolt that gets its data from the first one, such as:
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
builder.setBolt("identity", new IdentityBolt(new Fields("identity")), 1).fieldsGrouping("permutebolt", new Fields("permutation"));
It fails with the trace printed above. Also, what's weird is that this second, failing example ships with the project itself.
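For completeness, the full wiring and local-mode submission look roughly like this (the submission boilerplate is a simplified sketch, not the exact code; the spout and bolt classes come from the R-Storm example):

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.tuple.Fields;

TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("spout", new RandomSentenceSpout(), 1);
// R shell bolt in the middle of the chain
builder.setBolt("permutebolt", new PermuteBolt(), 1).shuffleGrouping("spout");
// plain Java bolt consuming the shell bolt's output
builder.setBolt("identity", new IdentityBolt(new Fields("identity")), 1)
       .fieldsGrouping("permutebolt", new Fields("permutation"));

Config conf = new Config();
conf.setDebug(true);
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("r-storm-test", conf, builder.createTopology());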
Is this an issue anyone has faced before?
UPDATE: I noticed this only occurs when using R shell bolts; I have since tried bolts that use Python scripts and have been able to chain them normally.
@andrei, this is fixed in 1.01, uploaded to GitHub today:
https://github.com/allenday/R-Storm/releases/tag/v1.01
It has been submitted to CRAN and will be available soon.
Thanks for reporting.
-Allen
