faile in using jcstress due to tests can't be found - java

I try to use jcstress to do some test in IDEA
and here is the configuration in IDEA
the code is not the point, but i will post it anyway
#JCStressTest
#Outcome(id={"1","4"},expect = Expect.ACCEPTABLE,desc="ok")
#Outcome(id="0",expect = Expect.ACCEPTABLE_INTERESTING,desc = "danger")
#State
public class OrderingTest01 {
int num = 0;
boolean ready = false;
#Actor
public void actor1(IJ_Result r){
if(ready){
r.r1 = num + num;
}else{
r.r1 =1;
}
}
#Actor
public void actor2(IJ_Result r){
num =2;
ready = true;
}
}
I got following result when i run it
Java Concurrency Stress Tests
---------------------------------------------------------------------------------
Rev: a0cc333318f0a3e5, built by shade with 11.0.8-testing at 2020-10-14T09:49:41Z
Burning up to figure out the exact CPU count....... done!
Probing the target OS:
(all failures are non-fatal, but may affect testing accuracy)
----- [FAILED] Trying to set affinity with taskset
Cannot run program "taskset": error=2, No such file or directory
Initializing and probing the target VM:
(all failures are non-fatal, but may affect testing accuracy)
----- [OK] Unlocking diagnostic VM options
----- [OK] Trimming down the default VM heap size to 1/16-th of max RAM
----- [OK] Trimming down the number of compiler threads
----- [OK] Trimming down the number of parallel GC threads
----- [OK] Trimming down the number of concurrent GC threads
----- [OK] Trimming down the number of G1 concurrent refinement GC threads
----- [OK] Testing #Contended works on all results and infra objects
----- [OK] Unlocking Whitebox API for online de-optimization: all methods
----- [OK] Unlocking Whitebox API for online de-optimization: selected methods
----- [FAILED] Unlocking C2 local code motion randomizer
Error: VM option 'StressLCM' is develop and is available only in debug version of VM.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
----- [FAILED] Unlocking C2 global code motion randomizer
Error: VM option 'StressGCM' is develop and is available only in debug version of VM.
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
----- [FAILED] Unlocking C2 iterative global value numbering randomizer
Unrecognized VM option 'StressIGVN'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
----- [OK] Testing allocation profiling
----- [FAILED] Trying Thread.onSpinWait
Exception in thread "main" java.lang.NoSuchMethodError: java.lang.Thread.onSpinWait()V
at org.openjdk.jcstress.vm.ThreadSpinWaitTestMain.main(ThreadSpinWaitTestMain.java:30)
Probing what VM modes are available:
(failures are non-fatal, but may miss some interesting cases)
----- [OK] [-Xint]
----- [OK] [-XX:TieredStopAtLevel=1]
----- [OK] [-XX:-TieredCompilation]
----- [OK] [-XX:-TieredCompilation]
Hardware CPUs in use: 16, using plain hard busywait.
Test preset mode: "default"
Writing the test results to "jcstress-results-2022-09-25-19-43-58.bin.gz"
Parsing results to "results/"
Running each test matching "OrderingTest01" for 1 forks, 5 iterations, 1000 ms each
Each JVM would execute at most 5 tests in the row.
Solo stride size will be autobalanced within [10, 10000] elements, but taking no more than 100 Mb.
FATAL: No matching tests.
Shouldn't it be like giving some Run Results and a file named index.html?
What's the mean of FATAL: No matching tests.???
Or is there something wrong with the configuration???

Related

Readiness and liveness failed with smallrye metrics in kubernetes

I'm deploying a pod written in quarkus in kubernetes and the startup seems to go fine. But there's a problem with readiness and liveness that result unhealthy.
For metrics I'm using smallrye metrics configured on port 8080 and on path:
quarkus.smallrye-metrics.path=/metrics
If i enter in the pod and i execute
curl localhost:8080/metrics
the response is
# HELP base_classloader_loadedClasses_count Displays the number of classes that are currently loaded in the Java virtual machine.
# TYPE base_classloader_loadedClasses_count gauge
base_classloader_loadedClasses_count 7399.0
# HELP base_classloader_loadedClasses_total Displays the total number of classes that have been loaded since the Java virtual machine has started execution.
# TYPE base_classloader_loadedClasses_total counter
base_classloader_loadedClasses_total 7403.0
# HELP base_classloader_unloadedClasses_total Displays the total number of classes unloaded since the Java virtual machine has started execution.
# TYPE base_classloader_unloadedClasses_total counter
base_classloader_unloadedClasses_total 4.0
# HELP base_cpu_availableProcessors Displays the number of processors available to the Java virtual machine. This value may change during a particular invocation of the virtual machine.
# TYPE base_cpu_availableProcessors gauge
base_cpu_availableProcessors 1.0
# HELP base_cpu_processCpuLoad_percent Displays the "recent cpu usage" for the Java Virtual Machine process. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that none of the CPUs were running threads from the JVM process during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running threads from the JVM 100% of the time during the recent period being observed. Threads from the JVM include the application threads as well as the JVM internal threads. All values between 0.0 and 1.0 are possible depending of the activities going on in the JVM process and the whole system. If the Java Virtual Machine recent CPU usage is not available, the method returns a negative value.
# TYPE base_cpu_processCpuLoad_percent gauge
base_cpu_processCpuLoad_percent 2.3218608761411404E-7
# HELP base_cpu_systemLoadAverage Displays the system load average for the last minute. The system load average is the sum of the number of runnable entities queued to the available processors and the number of runnable entities running on the available processors averaged over a period of time. The way in which the load average is calculated is operating system specific but is typically a damped time-dependent average. If the load average is not available, a negative value is displayed. This attribute is designed to provide a hint about the system load and may be queried frequently. The load average may be unavailable on some platforms where it is expensive to implement this method.
# TYPE base_cpu_systemLoadAverage gauge
base_cpu_systemLoadAverage 0.15
# HELP base_gc_time_total Displays the approximate accumulated collection elapsed time in milliseconds. This attribute displays -1 if the collection elapsed time is undefined for this collector. The Java virtual machine implementation may use a high resolution timer to measure the elapsed time. This attribute may display the same value even if the collection count has been incremented if the collection elapsed time is very short.
# TYPE base_gc_time_total counter
base_gc_time_total_seconds{name="Copy"} 0.032
base_gc_time_total_seconds{name="MarkSweepCompact"} 0.071
# HELP base_gc_total Displays the total number of collections that have occurred. This attribute lists -1 if the collection count is undefined for this collector.
# TYPE base_gc_total counter
base_gc_total{name="Copy"} 4.0
base_gc_total{name="MarkSweepCompact"} 2.0
# HELP base_jvm_uptime_seconds Displays the time from the start of the Java virtual machine in milliseconds.
# TYPE base_jvm_uptime_seconds gauge
base_jvm_uptime_seconds 624.763
# HELP base_memory_committedHeap_bytes Displays the amount of memory in bytes that is committed for the Java virtual machine to use. This amount of memory is guaranteed for the Java virtual machine to use.
# TYPE base_memory_committedHeap_bytes gauge
base_memory_committedHeap_bytes 8.5262336E7
# HELP base_memory_maxHeap_bytes Displays the maximum amount of heap memory in bytes that can be used for memory management. This attribute displays -1 if the maximum heap memory size is undefined. This amount of memory is not guaranteed to be available for memory management if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size.
# TYPE base_memory_maxHeap_bytes gauge
base_memory_maxHeap_bytes 1.348141056E9
# HELP base_memory_usedHeap_bytes Displays the amount of used heap memory in bytes.
# TYPE base_memory_usedHeap_bytes gauge
base_memory_usedHeap_bytes 1.2666888E7
# HELP base_thread_count Displays the current number of live threads including both daemon and non-daemon threads
# TYPE base_thread_count gauge
base_thread_count 11.0
# HELP base_thread_daemon_count Displays the current number of live daemon threads.
# TYPE base_thread_daemon_count gauge
base_thread_daemon_count 7.0
# HELP base_thread_max_count Displays the peak live thread count since the Java virtual machine started or peak was reset. This includes daemon and non-daemon threads.
# TYPE base_thread_max_count gauge
base_thread_max_count 11.0
# HELP vendor_cpu_processCpuTime_seconds Displays the CPU time used by the process on which the Java virtual machine is running in nanoseconds. The returned value is of nanoseconds precision but not necessarily nanoseconds accuracy. This method returns -1 if the the platform does not support this operation.
# TYPE vendor_cpu_processCpuTime_seconds gauge
vendor_cpu_processCpuTime_seconds 4.36
# HELP vendor_cpu_systemCpuLoad_percent Displays the "recent cpu usage" for the whole system. This value is a double in the [0.0,1.0] interval. A value of 0.0 means that all CPUs were idle during the recent period of time observed, while a value of 1.0 means that all CPUs were actively running 100% of the time during the recent period being observed. All values betweens 0.0 and 1.0 are possible depending of the activities going on in the system. If the system recent cpu usage is not available, the method returns a negative value.
# TYPE vendor_cpu_systemCpuLoad_percent gauge
vendor_cpu_systemCpuLoad_percent 2.3565253563367224E-7
# HELP vendor_memory_committedNonHeap_bytes Displays the amount of non heap memory in bytes that is committed for the Java virtual machine to use.
# TYPE vendor_memory_committedNonHeap_bytes gauge
vendor_memory_committedNonHeap_bytes 5.1757056E7
# HELP vendor_memory_freePhysicalSize_bytes Displays the amount of free physical memory in bytes.
# TYPE vendor_memory_freePhysicalSize_bytes gauge
vendor_memory_freePhysicalSize_bytes 5.44448512E9
# HELP vendor_memory_freeSwapSize_bytes Displays the amount of free swap space in bytes.
# TYPE vendor_memory_freeSwapSize_bytes gauge
vendor_memory_freeSwapSize_bytes 0.0
# HELP vendor_memory_maxNonHeap_bytes Displays the maximum amount of used non-heap memory in bytes.
# TYPE vendor_memory_maxNonHeap_bytes gauge
vendor_memory_maxNonHeap_bytes -1.0
# HELP vendor_memory_usedNonHeap_bytes Displays the amount of used non-heap memory in bytes.
# TYPE vendor_memory_usedNonHeap_bytes gauge
vendor_memory_usedNonHeap_bytes 4.7445384E7
# HELP vendor_memoryPool_usage_bytes Current usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_bytes gauge
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-nmethods'"} 1357184.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_bytes{name="CodeHeap 'profiled nmethods'"} 4787200.0
vendor_memoryPool_usage_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_bytes{name="Eden Space"} 0.0
vendor_memoryPool_usage_bytes{name="Metaspace"} 3.5767632E7
vendor_memoryPool_usage_bytes{name="Survivor Space"} 0.0
vendor_memoryPool_usage_bytes{name="Tenured Gen"} 9872160.0
# HELP vendor_memoryPool_usage_max_bytes Peak usage of the memory pool denoted by the 'name' tag
# TYPE vendor_memoryPool_usage_max_bytes gauge
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-nmethods'"} 1369600.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'non-profiled nmethods'"} 976128.0
vendor_memoryPool_usage_max_bytes{name="CodeHeap 'profiled nmethods'"} 4793088.0
vendor_memoryPool_usage_max_bytes{name="Compressed Class Space"} 4562592.0
vendor_memoryPool_usage_max_bytes{name="Eden Space"} 2.3658496E7
vendor_memoryPool_usage_max_bytes{name="Metaspace"} 3.5769312E7
vendor_memoryPool_usage_max_bytes{name="Survivor Space"} 2883584.0
vendor_memoryPool_usage_max_bytes{name="Tenured Gen"} 9872160.0
So it seems metrics are working fine, but kubernetes returns this error:
Warning Unhealthy 24m (x9 over 28m) kubelet Liveness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Warning Unhealthy 4m2s (x70 over 28m) kubelet Readiness probe errored: strconv.Atoi: parsing "metrics": invalid syntax
Any help?
Thanks
First I needed to fix dockerfile.jvm
FROM openjdk:11
ENV LANG='en_US.UTF-8' LANGUAGE='en_US:en'
# We make four distinct layers so if there are application changes the library layers can be re-used
# RUN ls -la target
COPY --chown=185 target/quarkus-app/lib/ /deployments/lib/
COPY --chown=185 target/quarkus-app/*.jar /deployments/
COPY --chown=185 target/quarkus-app/app/ /deployments/app/
COPY --chown=185 target/quarkus-app/quarkus/ /deployments/quarkus/
RUN java -version
EXPOSE 8080
USER root
ENV AB_JOLOKIA_OFF=""
ENV JAVA_OPTS="-Dquarkus.http.host=0.0.0.0 -Djava.util.logging.manager=org.jboss.logmanager.LogManager"
ENV JAVA_DEBUG="true"
ENV JAVA_APP_JAR="/deployments/quarkus-run.jar"
CMD java ${JAVA_OPTS} -jar ${JAVA_APP_JAR}
this way jar started working. without that CMD openjdk image is just starting jshell. After that I saw the log below
The last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server.
2022-09-21 19:56:00,450 INFO [io.sma.health] (executor-thread-1) SRHCK01001: Reporting health down status: {"status":"DOWN","checks":[{"name":"Database connections health check","status":"DOWN","data":{"<default>":"Unable to execute the validation check for the default DataSource: Communications link failure\n\nThe last packet sent successfully to the server was 0 milliseconds ago. The driver has not received any packets from the server."}}]}
DB connection in kubernetes is not working.
deploy command: mvn clean package -DskipTests -Dquarkus.kubernetes.deploy=true
"minikube dashboard" looks like below
used the endpoints below
quarkus.smallrye-health.root-path=/health
quarkus.smallrye-health.liveness-path=/health/live
quarkus.smallrye-metrics.path=/metrics
and liveness url looks like below in the firefox
I needed to change some dependencies in pom because I use minikube in my local and needed to delete some java code because of db connection problems, you can find working example at https://github.com/ozkanpakdil/quarkus-examples/tree/master/liveness-readiness-kubernetes
you can see the definition yaml of the deployment below.
mintozzy#mintozzy-MACH-WX9:~$ kubectl get deployments.apps app-version-checker -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
deployment.kubernetes.io/revision: "3"
prometheus.io/path: /metrics
prometheus.io/port: "8080"
prometheus.io/scheme: http
prometheus.io/scrape: "true"
creationTimestamp: "2022-09-21T20:13:21Z"
generation: 3
labels:
app.kubernetes.io/name: app-version-checker
app.kubernetes.io/version: 1.0.0-SNAPSHOT
name: app-version-checker
namespace: default
resourceVersion: "117584"
uid: 758d420b-ed22-48f8-9d6f-150422a6b38e
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app.kubernetes.io/name: app-version-checker
app.kubernetes.io/version: 1.0.0-SNAPSHOT
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
annotations:
app.quarkus.io/build-timestamp: 2022-09-21 - 20:29:23 +0000
app.quarkus.io/commit-id: 7d709651868d810cd9a906609c8edad3f9d796c0
prometheus.io/path: /metrics
prometheus.io/port: "8080"
prometheus.io/scheme: http
prometheus.io/scrape: "true"
creationTimestamp: null
labels:
app.kubernetes.io/name: app-version-checker
app.kubernetes.io/version: 1.0.0-SNAPSHOT
spec:
containers:
- env:
- name: KUBERNETES_NAMESPACE
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: metadata.namespace
image: mintozzy/app-version-checker:1.0.0-SNAPSHOT
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 3
httpGet:
path: /health/live
port: 8080
scheme: HTTP
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
name: app-version-checker
ports:
- containerPort: 8080
name: http
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /health/ready
port: 8080
scheme: HTTP
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 10
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
status:
availableReplicas: 1
conditions:
- lastTransitionTime: "2022-09-21T20:13:21Z"
lastUpdateTime: "2022-09-21T20:30:03Z"
message: ReplicaSet "app-version-checker-5cb974f465" has successfully progressed.
reason: NewReplicaSetAvailable
status: "True"
type: Progressing
- lastTransitionTime: "2022-09-22T16:09:48Z"
lastUpdateTime: "2022-09-22T16:09:48Z"
message: Deployment has minimum availability.
reason: MinimumReplicasAvailable
status: "True"
type: Available
observedGeneration: 3
readyReplicas: 1
replicas: 1
updatedReplicas: 1

Is there any difference between running multithreaded program with and without debugger? Apart from obvious ones

I'm doing a little project that tries to compare (simulate) Bluetooth Legacy Advertisement with Extended Advertisement. So I have two threads (one per device) and they are doing their tasks. I run the program normally and it's not working (kinda expected that) but when I run it with debugger it works. I have only one breakpoint and it's after any meaningful operations. So that is when my question arises. Is there any difference between running it with or without debugger.
I'm using IntelliJ IDEA (newest version) and Java 1.8
Output when running normally (stopped by me):
343
Process finished with exit code -1
Output with debugger:
Connected to the target VM, address: '127.0.0.1:58359', transport: 'socket'
OpenJDK 64-Bit Server VM warning: Sharing is only supported for boot loader classes because bootstrap classpath has been appended
343
21 .. -39
[21, ... -39]
DONE
Disconnected from the target VM, address: '127.0.0.1:58359', transport: 'socket'
Process finished with exit code 0
Code fragment with breakpoint
void secondaryListen() {
if(!Simulation.World.channels[this.receivedAdvertisement.channel].empty) {
SecondaryMessage currentMessage = (SecondaryMessage) Simulation.World.channels[this.receivedAdvertisement.channel].getPayload();
for (byte b : currentMessage.content) {
System.out.print(b + " ");
this.receivedData.add(b);
}
System.out.println();
this.mode = Mode.SCAN;
if(currentMessage.lastMessage) {
this.mode = Mode.FINISHED; //BREAKPOINT ON THAT LINE
System.out.println(this.receivedData.toString());
}
}
}
Thanks for all the answers in advance!

Slow application, frequent JVM hangs with single-CPU setups and Java 12+

We have a client application (with 10+ years of development). Its JDK was upgraded from OpenJDK 11 to OpenJDK 14 recently. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU) the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue with setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...).
Some measurement with starting the application and doing a query with minimal manual interaction in my VirtualBox machine:
OpenJDK 11.0.2: 36 seconds
OpenJDK 13.0.2: ~1.5 minutes
OpenJDK 13.0.2 with -XX:-UseBiasedLocking: 46 seconds
OpenJDK 13.0.2 with -XX:-ThreadLocalHandshakes: 40 seconds
OpenJDK 14: 5-6 minutes
OpenJDK 14 with -XX:-UseBiasedLocking: 3-3,5 minutes
OpenJDK 15 EA Build 20: ~4,5 minutes
Only the used JDK (and the mentioned options) has been changed. (-XX:-ThreadLocalHandshakes is not available in Java 14.)
We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50.
Counting the log lines for every second seems quite smooth with OpenJDK 11.0.2:
$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
30710 0
44012 1
55461 2
55974 3
27182 4
41292 5
43796 6
51889 7
54170 8
58850 9
51422 10
44378 11
41405 12
53589 13
41696 14
29526 15
2350 16
50228 17
62623 18
42684 19
45045 20
On the other hand, OpenJDK 14 seems to have interesting quiet periods:
$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
7726 0
1715 5
10744 6
4341 11
42792 12
45979 13
38783 14
17253 21
34747 22
1025 28
2079 33
2398 39
3016 44
So, what's happening between seconds 1-4, 7-10 and 14-20?
...
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117
[0.350s][7248][info ][biasedlocking ] Aligned thread 0x000000001727e010 to 0x000000001727e800
[0.350s][7248][info ][os,thread ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS)
[0.350s][6884][info ][os,thread ] Thread is alive (tid: 6884).
[0.350s][6884][debug][os,thread ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k).
[0.350s][6884][debug][os,thread ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000.
[0.350s][7248][debug][thread,smr ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500
[0.350s][7248][debug][thread,smr ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed.
[0.350s][2944][info ][os,thread ] Thread is alive (tid: 2944).
[0.350s][2944][debug][os,thread ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k).
[0.350s][2944][debug][os,thread ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000.
[0.351s][2944][debug][class,resolve ] java.lang.Thread java.lang.Runnable Thread.java:832
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138
[0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78
[0.351s][2944][debug][vmthread ] Adding VM operation: HandshakeOneThread
[0.351s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeOneThread
[0.351s][6708][debug][vmoperation ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800
# no log until 5.723s
[5.723s][7248][info ][biasedlocking ] Revoked bias of currently-unlocked object
[5.723s][7248][debug][handshake,task ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715
[5.723s][7248][debug][class,resolve ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716
...
The second pause a little bit later:
...
[6.246s][7248][info ][class,load ] java.awt.Graphics source: jrt:/java.desktop
[6.246s][7248][debug][class,load ] klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f
[6.246s][7248][debug][class,resolve ] java.awt.Graphics java.lang.Object (super)
[6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object
[6.246s][7248][debug][jit,compilation ] 19 4 java.lang.Object::<init> (1 bytes) made not entrant
[6.246s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns
# no log until 11.783s
[11.783s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[11.783s][6708][debug][vmoperation ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[11.783s][7248][debug][protectiondomain ] Checking package access
[11.783s][7248][debug][protectiondomain ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics'
[11.783s][7248][debug][protectiondomain ] granted
[11.783s][7248][debug][class,resolve ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection)
[11.783s][7248][debug][class,resolve ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300
[11.783s][7248][debug][class,preorder ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
...
Then the third one:
...
[14.578s][7248][debug][class,preorder ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[14.578s][7248][info ][class,load ] java.lang.InheritableThreadLocal source: jrt:/java.base
[14.578s][7248][debug][class,load ] klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55
[14.578s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super)
[14.578s][7248][debug][jit,compilation ] 699 3 java.lang.ThreadLocal::get (38 bytes) made not entrant
[14.578s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns
[14.579s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns
# no log until 21.455s
[21.455s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
[21.455s][6708][debug][vmoperation ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[21.455s][7248][debug][class,resolve ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39
[21.455s][7248][info ][class,init ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740)
[21.455s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57
[21.456s][7248][debug][class,preorder ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[21.456s][7248][info ][class,load ] sun.security.jca.ProviderList source: jrt:/java.base
[21.456s][7248][debug][class,load ] klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2
[21.456s][7248][debug][class,resolve ] sun.security.jca.ProviderList java.lang.Object (super)
...
The following two lines seems interesting:
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
Is that normal that these handshakes took 5.5 and 6.8 seconds?
I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:
Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \
-jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml
What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?
Is that a JDK bug, should I report it?
update 2020-04-25:
As far as I see the log files also contains GC logs. These are the first GC logs:
$ cat app.txt.00 | grep "\[gc"
[0.016s][7248][debug][gc,heap ] Minimum heap 8388608 Initial heap 60817408 Maximum heap 1073741824
[0.017s][7248][info ][gc,heap,coops ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
[0.018s][7248][info ][gc ] Using Serial
[22.863s][6708][info ][gc,start ] GC(0) Pause Young (Allocation Failure)
[22.863s][6708][debug][gc,heap ] GC(0) Heap before GC invocations=0 (full 0): def new generation total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000)
...
Unfortunately it does not seem related since it starts after the third pause.
update 2020-04-26:
With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on a i7-6600U CPU). The virtual machine has 3,5 GB RAM. According to Task Manager 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (and enabling hyper-threading for physical machines) make the application fast enough again. I just wondering, was it an intentional trade-off during JDK development to loss performance on (rare) single-CPU machines to make the JVM faster on multicore/hyper-threading CPUs?
TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).
I did not except that but I could reproduce the issue with a simple hello world:
public class Main {
public static void main(String[] args) {
System.out.println("Hello world");
}
}
I have used these two batch files:
main-1cpu.bat, which limits the java process to only one CPU:
c:\windows\system32\cmd.exe /C start /affinity 1 \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \
Main
main-full.bat, the java process can use both CPUs:
c:\windows\system32\cmd.exe /C start /affinity FF \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \
Main
(The differences are the affinity value and name of the log file. I've wrapped it for easier reading but wrapping with \ probably doesn't work on Windows.)
A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 7.0203455
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5751352
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5585384
...
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 23.6482685
...
The produced tracelogs contain similar pauses that you can see in the question.
Running Main without tracelogs is faster but the difference still can be seen between the single-CPU and two-CPU version: ~4-7 seconds vs. ~400 ms.
I've sent this findings to the hotspot-dev#openjdk mail list and devs there confirmed that this is something that the JDK could handle better. You can find supposed fixes in the thread too.
Another thread about the regression on hotspot-runtime-dev#. JIRA issue for the fix: JDK-8244340
From my experience performance problems with JDKs are related mostly to one of the following:
JIT Compilation
VM configuration (heap sizes)
GC algorithm
Changes in the JVM/JDK which break known good running applications
(Oh, and I forgot to mention class loading...)
If you just use the default JVM configuration since OpenJDK11, maybe you should set some of the more prominent options to fixed values, like GC, Heap size, etc.
Maybe some graphical analyse tool could help track your issue down. Like Retrace, AppDynamics or FlightRecorder and the like. These give more overview on the overall state of heap, GC cycles, RAM, threads, CPU load and so on at a given time than log files could provide.
Do I understand correctly that your application writes about 30710 lines to the log within the first second of running (under OpenJDK11)? Why is it "only" writing about 7k lines under OpenJDK14 in the first second? This seems like a huge difference for an application that is just started on different JVMs to me... Are you sure there are not for example high amounts of Exception stacktraces dumped into the log?
The other numbers are even higher sometimes, so maybe the slowdowns are related to exception logging? Or even swapping, if RAM gets low?
Actually I am thinking, if an application does not write anything into the log, this is a sign of smooth running without problems (unless it is frozen entirely in this time). What is happening from seconds 12-22 (in the OpenJDK14 case here) is what would concern me more... the logged lines go through the roof... why?
And afterwards the logging goes down to all time low values of about 1-2k lines... what is the reason for that?? (Well, maybe it is the GC kicking in at second 22 and does a tabula rasa which resolves some things...?)
Another thing may be your statement about "single CPU" machines. Does this imply "single core" also (Idk, maybe your software is tailored on legacy hardware or something)?
And the "single CPU" VMs are running on those machines?
But I assume, I am wrong about these assumptions, since almost all CPUs are multicore nowadays... but I would investigate on a multithreading issue (deadlock) problem maybe.
Since it's using 100% CPU "most of the time", and it takes 10 times longer (!) with Java 14, it means that you're wasting 90% of your CPU in Java 14.
Running out of heap space can do that, as you spend a whole lot of time in GC, but you seem to have ruled that out.
I notice that you're tweaking the biased locking option, and that it makes a significant difference. That tells me that maybe your program does a lot of concurrent work in multiple threads. It's possible that your program has a concurrency bug that shows up in Java 14, but not in Java 10. That could also explain why adding another CPU makes it more than twice as fast.
Concurrency bugs often only show up when you're unlucky, and the trigger could really have been anything, like a change to hashmap organization, etc.
First, if it's feasible, check for any loops that might be busy-waiting instead of sleeping.
Then, run a profiler in sampling mode (jvisualvm will do) and look for methods that are taking a much larger % of total time than they should. Since your performance is off by a factor of 10, any problems in there should really jump out.
This is an interesting issue and it would require indeterminate amount of effort to narrow it down since there are many permutations and combinations that need to be tried out and data collected and collated.
Seems as of there has been no resolution to this for some time. Perhaps this might need to be escalated.
EDIT 2: Since "ThreadLocalHandshakes" is deprecated and we can assume that locking is contended, suggest trying without "UseBiasedLocking" to hopefully speed up this scenario.
However there are some suggestions to collect more data and attempt to isolate the issue.
Allocate more than one core [I see that you have tried it and the issue goes away. Seems to be an issue with a thread/s execution precluding others. See no 7 below)
Allocate more heap (perhaps the demands of v14 is more than that of earlier jdks)
Allocate more memory to the Win 10 VB.
Check the OS system messages (Win 10 in your case)
Run it in an non-virtualized Win 10.
Try a different build of jdk 14
Do a thread dump every (or profile)few intervals of time. Analyze what thread is running exclusively. Perhaps there is a setting for equitable time sharing. Perhaps there is a higher priority thread running. What is that thread and what is it doing? In linux you could stat the lightweight processes (threads) associated with a process and its state in realtime. Something similar on Win 10?
CPU usage? 100% or less? Constrained by CPU or mem? 100% CPU in service threads? Which service thread?
Have you explicitly set a GC algo?
I have personally witnessed issues within versions that have to do with GC, heap resizing, issues with virtualized containers and so on.
There is no easy answer to that, I think, especially since this question has been around for some time. But we can try, all the best and let us know what is the result of some of these isolation steps.
EDIT 1: from the updated question, it seems to be related to a GC or another service thread taking over the single core non-equitably (Thread-Local Handshakes)?
Be careful with logging to slow disks, it will slow down your application:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
But it doesn't seem likely to be the cause of the issue as the CPU is still busy and you don't have to wait for all threads to come to a safe point thanks to thread-local handshake: https://openjdk.java.net/jeps/312
Also not directly related to the problem you have but more generally if you want to try to squeeze more performance out of your hardware for startup time, take a look at AppCDS (class data sharing):
https://blog.codefx.org/java/application-class-data-sharing/

Android app cpu usage jump to 50-100% and trace show only JDWP

My application started to slowdown the phone after it ran in the background for a while.
After I recorded the cpu usage, the trace file shows me that JDWP take 99% of the cpu usage on the JDWP thread which is the only thread I can see in the trace file.
Is it possible that the JDWP (that Google says is Java Debug Wire Protocol) takes so much CPU or do I have a problem with the trace file? The phone start to work slower and get warm after the cpu usage jump so I'm pretty sure the usage data is OK.
Here are 2 trace files that show only JDWP:
https://drive.google.com/file/d/0B9MtungcpihwZnl0RFIwanktMEk/view?usp=sharing
Here is an album of before, in the middle and after the recording of the trace:
http://imgur.com/a/rOZ4g
The output of the command adb shell top -m 5 is here:
http://pastebin.com/2FJNVvZA
The JDWP thread shouldn't run very much; it's the thread that handles debugger commands, including some of the stuff that your IDE does automatically.
I can't remember ever seeing it taking a lot of CPU time, but I guess it could happen if you (or, more likely, some tool) is running a lot of commands.
Otherwise, you may have found a bug in the virtual machine (ART) or the IDE/tools.
If you feel like sharing the trace file, I wouldn't mind taking a look.
UPDATE 1: I've taken a look at the files. Indeed, there are no other threads than the JDWP thread. However, it's not using a lot of CPU -- hover your cursor over the top "JDWP." block and check the Wallclock time (4.344s) vs the CPU time (0.006s). You could also switch to the "Thread time" view instead of "Wall Clock time" in the dropdown box at the top.
Here's the human-readable part of one of the trace files:
*version
3
data-file-overflow=false
clock=dual
elapsed-time-usec=4345915
num-method-calls=410
clock-call-overhead-nsec=3808
vm=dalvik
*threads
1 main
14 Thread-7700
13 Thread-7699
12 Thread-7698
11 FileObserver
10 Binder_2
9 Binder_1
8 FinalizerWatchdogDaemon
7 FinalizerDaemon
6 ReferenceQueueDaemon
5 Compiler
4 JDWP
3 Signal Catcher
2 GC
*methods
0x6da675c0 android/ddm/DdmHandleProfiling handleChunk (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleProfiling.java -1
0x6da673c0 android/ddm/DdmHandleProfiling handleMPRQ (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleProfiling.java -1
0x6da67430 android/ddm/DdmHandleProfiling handleMPSE (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleProfiling.java -1
0x6da67468 android/ddm/DdmHandleProfiling handleMPSS (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleProfiling.java -1
0x6d8b4ca0 java/nio/ByteBuffer order (Ljava/nio/ByteOrder;)Ljava/nio/ByteBuffer; ByteBuffer.java 635
0x6d8b4410 java/nio/ByteBuffer <init> (ILjava/nio/MemoryBlock;)V ByteBuffer.java 116
0x6d8b44f0 java/nio/ByteBuffer wrap ([BII)Ljava/nio/ByteBuffer; ByteBuffer.java 108
0x6d8dd658 java/util/HashMap get (Ljava/lang/Object;)Ljava/lang/Object; HashMap.java -1
0x6d918008 java/util/Arrays checkOffsetAndCount (III)V Arrays.java 1731
0x6d9913a8 org/apache/harmony/dalvik/ddmc/ChunkHandler wrapChunk (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Ljava/nio/ByteBuffer; ChunkHandler.java 80
0x6d9807d0 android/os/Debug getMethodTracingMode ()I Debug.java -1
0x6d981100 android/os/Debug startMethodTracingDdms (IIZI)V Debug.java -1
0x6d9811a8 android/os/Debug stopMethodTracing ()V Debug.java -1
0x6da67040 android/ddm/DdmHandleHeap handleChunk (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleHeap.java -1
0x6da66e78 android/ddm/DdmHandleHeap handleHPIF (Lorg/apache/harmony/dalvik/ddmc/Chunk;)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmHandleHeap.java -1
0x6d8be8d0 java/lang/Integer equals (Ljava/lang/Object;)Z Integer.java 208
0x6d8be940 java/lang/Integer hashCode ()I Integer.java 302
0x6d8be190 java/lang/Integer <init> (I)V Integer.java 88
0x6d8be740 java/lang/Integer valueOf (I)Ljava/lang/Integer; Integer.java 706
0x6d8b5240 java/nio/Buffer <init> (IILjava/nio/MemoryBlock;)V Buffer.java 97
0x6dc2d778 org/apache/harmony/dalvik/ddmc/DdmVmInternal heapInfoNotify (I)Z DdmVmInternal.java -2
0x6d8b5ec8 org/apache/harmony/dalvik/ddmc/Chunk <init> (I[BII)V Chunk.java 45
0x6d9d22a8 java/nio/ByteArrayBuffer get ()B ByteArrayBuffer.java 151
0x6d9d2000 java/nio/ByteArrayBuffer <init> (I[BIZ)V ByteArrayBuffer.java 41
0x6d9d2038 java/nio/ByteArrayBuffer <init> ([B)V ByteArrayBuffer.java -1
0x6d8b5fe0 org/apache/harmony/dalvik/ddmc/DdmServer dispatch (I[BII)Lorg/apache/harmony/dalvik/ddmc/Chunk; DdmServer.java 143
0x6d8b7128 dalvik/system/VMDebug getMethodTracingMode ()I VMDebug.java -2
0x6d8b7550 dalvik/system/VMDebug startMethodTracingDdms (IIZI)V VMDebug.java 182
0x6d8b76d8 dalvik/system/VMDebug stopMethodTracing ()V VMDebug.java -2
0x6d8ba1b8 java/lang/Number <init> ()V Number.java 33
*end
As you can see, the other threads are at least present in the header.
The header contains no references to non-system methods.
I won't say it's impossible that you've found some weird bug -- but I think it's more likely that your app was completely idle at this point, except for the minor work of answering the trace commands over JDWP, and the JDWP-only trace managed to confuse you a bit.
Your title mentions "50-100% cpu usage", though, which suggests that something is running. It's unclear to me from where you got those numbers -- maybe they include everything, not just your app? Try running adb shell top -m 5 to find out the top CPU consumers in your system.

JVM crashes with no frame specified, only "timer expired, abort"

I am running a Java job under Hadoop which is crashing the JVM. I suspect this is due to some JNI code (it uses JBLAS with a multithreaded native BLAS implementation). However, while I expect the crash log to supply the "problematic frame" for debugging, instead the log looks like:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x00007f204dd6fb27, pid=19570, tid=139776470402816
#
# JRE version: 6.0_38-b05
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.13-b02 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# # [ timer expired, abort... ]
Does the JVM have some timer for how long it will wait when producing this crash dump output? If so, is there a way to increase the time so I can get more helpful information? I don't think the timer referred to is coming from Hadoop, since I see (unhelpful) references to this error in many places which do not mention Hadoop.
Googling appears to show that the string "timer expired, abort" only shows up in these JVM error messages, so it is unlikely to come from the OS.
Edit: It looks like I am probably out of luck. From ./hotspot/src/share/vm/runtime/thread.cpp
in the OpenJDK version of the JVM source:
if (is_error_reported()) {
// A fatal error has happened, the error handler(VMError::report_and_die)
// should abort JVM after creating an error log file. However in some
// rare cases, the error handler itself might deadlock. Here we try to
// kill JVM if the fatal error handler fails to abort in 2 minutes.
//
// This code is in WatcherThread because WatcherThread wakes up
// periodically so the fatal error handler doesn't need to do anything;
// also because the WatcherThread is less likely to crash than other
// threads.
for (;;) {
if (!ShowMessageBoxOnError
&& (OnError == NULL || OnError[0] == '\0')
&& Arguments::abort_hook() == NULL) {
os::sleep(this, 2 * 60 * 1000, false);
fdStream err(defaultStream::output_fd());
err.print_raw_cr("# [ timer expired, abort... ]");
// skip atexit/vm_exit/vm_abort hooks
os::die();
}
// Wake up 5 seconds later, the fatal handler may reset OnError or
// ShowMessageBoxOnError when it is ready to abort.
os::sleep(this, 5 * 1000, false);
}
}
It appears to be hard-coded to wait two minutes. Why crash reporting for my job is taking longer than that, I don't know, but I think this question at least has been answered.
It looks like I am probably out of luck. From ./hotspot/src/share/vm/runtime/thread.cpp in the OpenJDK version of the JVM source:
if (is_error_reported()) {
// A fatal error has happened, the error handler(VMError::report_and_die)
// should abort JVM after creating an error log file. However in some
// rare cases, the error handler itself might deadlock. Here we try to
// kill JVM if the fatal error handler fails to abort in 2 minutes.
//
// This code is in WatcherThread because WatcherThread wakes up
// periodically so the fatal error handler doesn't need to do anything;
// also because the WatcherThread is less likely to crash than other
// threads.
for (;;) {
if (!ShowMessageBoxOnError
&& (OnError == NULL || OnError[0] == '\0')
&& Arguments::abort_hook() == NULL) {
os::sleep(this, 2 * 60 * 1000, false);
fdStream err(defaultStream::output_fd());
err.print_raw_cr("# [ timer expired, abort... ]");
// skip atexit/vm_exit/vm_abort hooks
os::die();
}
// Wake up 5 seconds later, the fatal handler may reset OnError or
// ShowMessageBoxOnError when it is ready to abort.
os::sleep(this, 5 * 1000, false);
}
}
It appears to be hard-coded to wait two minutes. Why crash reporting for my job is taking longer than that, I don't know, but I think this question at least has been answered.
The way around this is to specify -XX:ShowMessageBoxOnError on the command line and attach to the process with a debugger from another term.

Categories

Resources