Analysing Java app performance for a third-party application

I have a third-party application named oVirt. We need to connect to it through the REST API it exposes.
The VM has 4 GB of RAM, of which 1 GB is allocated to the third-party application.
This is the CPU configuration:
[root@test ~]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 2
On-line CPU(s) list: 0,1
Thread(s) per core: 1
Core(s) per socket: 2
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 23
Model name: Pentium(R) Dual-Core CPU E5200 @ 2.50GHz
Stepping: 10
CPU MHz: 1200.000
CPU max MHz: 2500.0000
CPU min MHz: 1200.0000
BogoMIPS: 4999.77
L1d cache: 32K
L1i cache: 32K
L2 cache: 2048K
NUMA node0 CPU(s): 0,1
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good nopl aperfmperf pni dtes64 monitor ds_cpl est tm2 ssse3 cx16 xtpr pdcm xsave lahf_lm dtherm
NOTE: the oVirt application itself exposes JMX.
We then hit the application's REST API with 500 requests, 30 of them in parallel. It handled that load successfully, and memory usage stayed below 600 MB during the test.
When we increased the concurrency to 32 (still 500 requests), the run failed: the requests timed out, with no other error reported.
I increased the RAM for the third-party app to 2 GB. It still fails at 35 concurrent requests, and sometimes already at 32.
I have one more environment, with 2 GB of RAM, running the same setup on a different CPU configuration:
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Core(TM) i5-3450 CPU @ 3.10GHz
Stepping: 9
CPU MHz: 1600.132
BogoMIPS: 6200.36
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 6144K
NUMA node0 CPU(s): 0-3
That environment handles 63 concurrent requests, so what makes the difference? I don't understand what limits the scaling.
I noticed these entries in the application's log:
2018-03-02 10:58:39,821+05 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread pool 'engineScheduled' is using 0 threads out of 100 and 1 tasks are waiting in the queue.
2018-03-02 10:58:39,822+05 INFO [org.ovirt.engine.core.bll.utils.ThreadPoolMonitoringService] (EE-ManagedThreadFactory-engineThreadMonitoring-Thread-1) [] Thread pool 'engineThreadMonitoring' is using 1 threads out of 1 and 0 tasks are waiting in the queue.
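Each of these log lines reports, per pool, the number of active threads, the pool maximum and the queue length. As a rough sketch of where such numbers can come from, assuming a plain java.util.concurrent.ThreadPoolExecutor (oVirt's actual monitoring code is not shown in this post, and PoolStats and describe are hypothetical names):

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not oVirt code: formats pool statistics in the style of the log above.
class PoolStats {
    static String describe(String name, ThreadPoolExecutor pool) {
        return String.format(
            "Thread pool '%s' is using %d threads out of %d and %d tasks are waiting in the queue.",
            name,
            pool.getActiveCount(),      // approximate number of threads running a task
            pool.getMaximumPoolSize(),  // configured upper bound
            pool.getQueue().size());    // tasks accepted but not yet started
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed pool shape for the demo: 2 workers and an unbounded queue.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            2, 2, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        System.out.println(describe("demo", pool)); // busy: tasks are queued behind 2 workers
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(describe("demo", pool)); // idle again
    }
}
```

Read this way, a pool that reports 0 of 100 threads in use with a single queued task is close to idle, so a sustained non-zero queue with all threads in use would be the pattern to look for.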
How should I analyse this?
Could somebody also explain the threading output below?
Edit: added the JMX output:
[standalone@127.0.0.1:8706 /] ls /core-service=platform-mbean/type=threading
all-thread-ids=[559L,558L,557L,556L,555L,554L,553L,552L,455L,399L,326L,325L,302L,301L,300L,299L,298L,297L,296L,295L,294L,293L,292L,289L,288L,287L,286L,285L,284L,283L,282L,281L,280L,279L,278L,277L,276L,275L,274L,273L,272L,271L,269L,264L,263L,262L,261L,260L,259L,258L,257L,256L,255L,254L,252L,251L,242L,237L,236L,235L,234L,233L,232L,231L,230L,227L,226L,225L,224L,223L,222L,221L,220L,219L,218L,217L,216L,215L,214L,213L,212L,211L,209L,208L,206L,205L,204L,203L,202L,201L,200L,199L,198L,197L,196L,195L,194L,193L,192L,191L,190L,189L,188L,187L,186L,185L,184L,183L,182L,181L,180L,179L,178L,177L,176L,175L,174L,173L,172L,171L,168L,167L,166L,165L,164L,163L,162L,161L,160L,159L,158L,157L,155L,154L,153L,152L,151L,150L,149L,148L,147L,146L,145L,144L,143L,142L,141L,140L,139L,135L,134L,132L,133L,131L,130L,129L,128L,127L,126L,125L,124L,123L,122L,121L,120L,119L,118L,117L,116L,114L,113L,112L,111L,110L,109L,108L,107L,106L,105L,104L,103L,102L,101L,99L,98L,97L,96L,84L,83L,80L,77L,76L,75L,74L,73L,72L,70L,71L,69L,68L,67L,66L,65L,62L,60L,64L,44L,43L,42L,41L,39L,38L,18L,17L,15L,14L,13L,12L,8L,4L,3L,2L]
thread-contention-monitoring-supported=true
thread-cpu-time-supported=true
current-thread-cpu-time-supported=true
object-monitor-usage-supported=true
synchronizer-usage-supported=true
thread-contention-monitoring-enabled=false
thread-cpu-time-enabled=true
thread-count=222
peak-thread-count=223
total-started-thread-count=551
daemon-thread-count=147
current-thread-cpu-time=62992810
current-thread-user-time=50000000
object-name=java.lang:type=Threading
[standalone@127.0.0.1:8706 /] ls /core-service=platform-mbean/type=memory
heap-memory-usage={"init" => 1073741824L,"used" => 801489408L,"committed" => 2016411648L,"max" => 2016411648L}
non-heap-memory-usage={"init" => 2555904L,"used" => 194310080L,"committed" => 212074496L,"max" => -1L}
object-name=java.lang:type=Memory
object-pending-finalization-count=0
verbose=true
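
For reading the output above: the attributes belong to the platform MBeans java.lang:type=Threading and java.lang:type=Memory. It shows 222 live threads (147 of them daemon threads) and about 764 MiB of heap in use out of a 1923 MiB maximum. The same values can be read from inside any JVM through java.lang.management. The sketch below samples the JVM it runs in, not the remote oVirt engine, and PlatformSnapshot is a made-up class name:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

// Illustrative only: prints the local JVM's equivalents of the attributes listed above.
class PlatformSnapshot {
    static String threads(ThreadMXBean t) {
        return "thread-count=" + t.getThreadCount()
            + " peak-thread-count=" + t.getPeakThreadCount()
            + " daemon-thread-count=" + t.getDaemonThreadCount()
            + " total-started-thread-count=" + t.getTotalStartedThreadCount();
    }

    static String heap(MemoryMXBean m) {
        MemoryUsage h = m.getHeapMemoryUsage();
        long mib = 1024L * 1024L;
        // getMax() returns -1 when no maximum is defined.
        String max = h.getMax() < 0 ? "undefined" : (h.getMax() / mib) + "MiB";
        return "heap used=" + (h.getUsed() / mib) + "MiB"
            + " committed=" + (h.getCommitted() / mib) + "MiB"
            + " max=" + max;
    }

    public static void main(String[] args) {
        System.out.println(threads(ManagementFactory.getThreadMXBean()));
        System.out.println(heap(ManagementFactory.getMemoryMXBean()));
    }
}
```

Sampling these values repeatedly during a load test, rather than once, would show whether the thread count or the heap usage climbs as the concurrency approaches the failing level.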

Related

Slow application, frequent JVM hangs with single-CPU setups and Java 12+

We have a client application (with 10+ years of development). Its JDK was upgraded from OpenJDK 11 to OpenJDK 14 recently. On single-CPU (hyper-threading disabled) Windows 10 setups (and inside VirtualBox machines with only one available CPU) the application starts quite slowly compared to Java 11. Furthermore, it uses 100% CPU most of the time. We could also reproduce the issue with setting the processor affinity to only one CPU (c:\windows\system32\cmd.exe /C start /affinity 1 ...).
Some measurements of starting the application and running a query with minimal manual interaction in my VirtualBox machine:
OpenJDK 11.0.2: 36 seconds
OpenJDK 13.0.2: ~1.5 minutes
OpenJDK 13.0.2 with -XX:-UseBiasedLocking: 46 seconds
OpenJDK 13.0.2 with -XX:-ThreadLocalHandshakes: 40 seconds
OpenJDK 14: 5-6 minutes
OpenJDK 14 with -XX:-UseBiasedLocking: 3-3,5 minutes
OpenJDK 15 EA Build 20: ~4,5 minutes
Only the used JDK (and the mentioned options) has been changed. (-XX:-ThreadLocalHandshakes is not available in Java 14.)
We have tried logging what JDK 14 does with -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50.
The number of log lines per second looks quite smooth with OpenJDK 11.0.2:
$ cat jdk11-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
30710 0
44012 1
55461 2
55974 3
27182 4
41292 5
43796 6
51889 7
54170 8
58850 9
51422 10
44378 11
41405 12
53589 13
41696 14
29526 15
2350 16
50228 17
62623 18
42684 19
45045 20
On the other hand, OpenJDK 14 seems to have interesting quiet periods:
$ cat jdk14-log/app* | grep "^\[" | cut -d. -f 1 | cut -d[ -f 2 | sort | uniq -c | sort -k 2 -n
7726 0
1715 5
10744 6
4341 11
42792 12
45979 13
38783 14
17253 21
34747 22
1025 28
2079 33
2398 39
3016 44
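The shell pipeline above buckets log lines by the whole second of the uptime decoration. The same count, plus the seconds with no output at all, can be sketched in Java. LogGaps and its method names are made up for this illustration, and the sample lines are copied from the log excerpts in this question:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: per-second line counts and silent periods of a -Xlog file.
class LogGaps {
    // Matches the uptime decoration at the start of a line, e.g. "[5.723s]".
    private static final Pattern UPTIME = Pattern.compile("^\\[(\\d+)\\.\\d+s\\]");

    static TreeMap<Integer, Integer> countPerSecond(List<String> lines) {
        TreeMap<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = UPTIME.matcher(line);
            if (m.find()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the ranges of whole seconds that lie between two logged seconds.
    static List<String> silentRanges(TreeMap<Integer, Integer> counts) {
        List<String> gaps = new ArrayList<>();
        Integer prev = null;
        for (int sec : counts.keySet()) {
            if (prev != null && sec - prev > 1) {
                int from = prev + 1;
                int to = sec - 1;
                gaps.add(from == to ? String.valueOf(from) : from + "-" + to);
            }
            prev = sec;
        }
        return gaps;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
            "[0.351s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeOneThread",
            "[5.723s][7248][info ][biasedlocking ] Revoked bias of currently-unlocked object",
            "[6.246s][7248][info ][class,load ] java.awt.Graphics source: jrt:/java.desktop",
            "[11.783s][7248][debug][protectiondomain ] granted");
        System.out.println(silentRanges(countPerSecond(sample))); // prints [1-4, 7-10]
    }
}
```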
So, what's happening between seconds 1-4, 7-10 and 14-20?
...
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerFactory$1 java.lang.Thread CleanerFactory.java:45
[0.350s][7248][debug][class,resolve ] jdk.internal.ref.CleanerImpl java.lang.Thread CleanerImpl.java:117
[0.350s][7248][info ][biasedlocking ] Aligned thread 0x000000001727e010 to 0x000000001727e800
[0.350s][7248][info ][os,thread ] Thread started (tid: 2944, attributes: stacksize: default, flags: CREATE_SUSPENDED STACK_SIZE_PARAM_IS)
[0.350s][6884][info ][os,thread ] Thread is alive (tid: 6884).
[0.350s][6884][debug][os,thread ] Thread 6884 stack dimensions: 0x00000000175b0000-0x00000000176b0000 (1024k).
[0.350s][6884][debug][os,thread ] Thread 6884 stack guard pages activated: 0x00000000175b0000-0x00000000175b4000.
[0.350s][7248][debug][thread,smr ] tid=7248: Threads::add: new ThreadsList=0x0000000017254500
[0.350s][7248][debug][thread,smr ] tid=7248: ThreadsSMRSupport::free_list: threads=0x0000000017253d50 is freed.
[0.350s][2944][info ][os,thread ] Thread is alive (tid: 2944).
[0.350s][2944][debug][os,thread ] Thread 2944 stack dimensions: 0x00000000177b0000-0x00000000178b0000 (1024k).
[0.350s][2944][debug][os,thread ] Thread 2944 stack guard pages activated: 0x00000000177b0000-0x00000000177b4000.
[0.351s][2944][debug][class,resolve ] java.lang.Thread java.lang.Runnable Thread.java:832
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.misc.InnocuousThread CleanerImpl.java:135
[0.351s][2944][debug][class,resolve ] jdk.internal.ref.CleanerImpl jdk.internal.ref.PhantomCleanable CleanerImpl.java:138
[0.351s][2944][info ][biasedlocking,handshake] JavaThread 0x000000001727e800 handshaking JavaThread 0x000000000286d800 to revoke object 0x00000000c0087f78
[0.351s][2944][debug][vmthread ] Adding VM operation: HandshakeOneThread
[0.351s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeOneThread
[0.351s][6708][debug][vmoperation ] begin VM_Operation (0x00000000178af250): HandshakeOneThread, mode: no safepoint, requested by thread 0x000000001727e800
# no log until 5.723s
[5.723s][7248][info ][biasedlocking ] Revoked bias of currently-unlocked object
[5.723s][7248][debug][handshake,task ] Operation: RevokeOneBias for thread 0x000000000286d800, is_vm_thread: false, completed in 94800 ns
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.lang.ref.Cleaner ZipFile.java:715
[5.723s][7248][debug][class,resolve ] java.lang.ref.Cleaner jdk.internal.ref.CleanerImpl$PhantomCleanableRef Cleaner.java:220
[5.723s][7248][debug][class,resolve ] java.util.zip.ZipFile$CleanableResource java.util.WeakHashMap ZipFile.java:716
...
The second pause a little bit later:
...
[6.246s][7248][info ][class,load ] java.awt.Graphics source: jrt:/java.desktop
[6.246s][7248][debug][class,load ] klass: 0x0000000100081a00 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 5625 checksum: 0025818f
[6.246s][7248][debug][class,resolve ] java.awt.Graphics java.lang.Object (super)
[6.246s][7248][info ][class,loader,constraints] updating constraint for name java/awt/Graphics, loader 'bootstrap', by setting class object
[6.246s][7248][debug][jit,compilation ] 19 4 java.lang.Object::<init> (1 bytes) made not entrant
[6.246s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[6.246s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1400 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 700 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[6.246s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 100 ns
[6.247s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 800 ns
# no log until 11.783s
[11.783s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 6300 ns
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[11.783s][6708][debug][vmoperation ] end VM_Operation (0x000000000203ddf8): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[11.783s][7248][debug][protectiondomain ] Checking package access
[11.783s][7248][debug][protectiondomain ] class loader: a 'jdk/internal/loader/ClassLoaders$AppClassLoader'{0x00000000c0058628} protection domain: a 'java/security/ProtectionDomain'{0x00000000c058b948} loading: 'java/awt/Graphics'
[11.783s][7248][debug][protectiondomain ] granted
[11.783s][7248][debug][class,resolve ] sun.launcher.LauncherHelper java.awt.Graphics LauncherHelper.java:816 (reflection)
[11.783s][7248][debug][class,resolve ] jdk.internal.reflect.Reflection [Ljava.lang.reflect.Method; Reflection.java:300
[11.783s][7248][debug][class,preorder ] java.lang.PublicMethods$MethodList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
...
Then the third one:
...
[14.578s][7248][debug][class,preorder ] java.lang.InheritableThreadLocal source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[14.578s][7248][info ][class,load ] java.lang.InheritableThreadLocal source: jrt:/java.base
[14.578s][7248][debug][class,load ] klass: 0x0000000100124740 super: 0x0000000100021a18 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 1338 checksum: 8013ed55
[14.578s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal (super)
[14.578s][7248][debug][jit,compilation ] 699 3 java.lang.ThreadLocal::get (38 bytes) made not entrant
[14.578s][7248][debug][vmthread ] Adding VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmthread ] Evaluating non-safepoint VM operation: HandshakeAllThreads
[14.578s][6708][debug][vmoperation ] begin VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026b0800, is_vm_thread: true, completed in 1600 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026bb800, is_vm_thread: true, completed in 900 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026ef800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f0800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f1800, is_vm_thread: true, completed in 100 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x00000000026f4800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000002768800, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000276e000, is_vm_thread: true, completed in 0 ns
[14.578s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x0000000017268800, is_vm_thread: true, completed in 0 ns
[14.579s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000001727e800, is_vm_thread: true, completed in 900 ns
# no log until 21.455s
[21.455s][6708][debug][handshake,task ] Operation: Deoptimize for thread 0x000000000286d800, is_vm_thread: true, completed in 12100 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
[21.455s][6708][debug][vmoperation ] end VM_Operation (0x000000000203d228): HandshakeAllThreads, mode: no safepoint, requested by thread 0x000000000286d800
[21.455s][7248][debug][class,resolve ] sun.security.jca.Providers java.lang.InheritableThreadLocal Providers.java:39
[21.455s][7248][info ][class,init ] 1251 Initializing 'java/lang/InheritableThreadLocal'(no method) (0x0000000100124740)
[21.455s][7248][debug][class,resolve ] java.lang.InheritableThreadLocal java.lang.ThreadLocal InheritableThreadLocal.java:57
[21.456s][7248][debug][class,preorder ] sun.security.jca.ProviderList source: C:\Users\example\AppData\Local\example\stable\jdk\lib\modules
[21.456s][7248][info ][class,load ] sun.security.jca.ProviderList source: jrt:/java.base
[21.456s][7248][debug][class,load ] klass: 0x00000001001249a8 super: 0x0000000100001080 loader: [loader data: 0x0000000002882bd0 of 'bootstrap'] bytes: 11522 checksum: bdc239d2
[21.456s][7248][debug][class,resolve ] sun.security.jca.ProviderList java.lang.Object (super)
...
The following two lines seem interesting:
[11.783s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 5536442500 ns
[21.455s][6708][info ][handshake ] Handshake "Deoptimize", Targeted threads: 11, Executed by targeted threads: 0, Total completion time: 6876829000 ns
Is it normal that these handshakes took about 5.5 and 6.9 seconds?
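For reference, the totals are reported in nanoseconds. A small sketch for pulling them out of handshake summary lines and converting them to seconds (HandshakeTimes is a hypothetical helper, and the pattern only assumes the line format shown above):

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: extracts "Total completion time" from a handshake summary line.
class HandshakeTimes {
    private static final Pattern TOTAL = Pattern.compile("Total completion time: (\\d+) ns");

    static double seconds(String line) {
        Matcher m = TOTAL.matcher(line);
        if (!m.find()) {
            throw new IllegalArgumentException("not a handshake summary line");
        }
        return Long.parseLong(m.group(1)) / 1_000_000_000.0;
    }

    public static void main(String[] args) {
        String first = "[11.783s][6708][info ][handshake ] Handshake \"Deoptimize\", Targeted threads: 11, "
            + "Executed by targeted threads: 0, Total completion time: 5536442500 ns";
        String second = "[21.455s][6708][info ][handshake ] Handshake \"Deoptimize\", Targeted threads: 11, "
            + "Executed by targeted threads: 0, Total completion time: 6876829000 ns";
        System.out.println(String.format(Locale.ROOT, "%.2f s", seconds(first)));  // 5.54 s
        System.out.println(String.format(Locale.ROOT, "%.2f s", seconds(second))); // 6.88 s
    }
}
```

For comparison, the per-thread "Deoptimize" operations in the same excerpts complete in 100 ns to about 12 µs each, so almost all of the total is waiting rather than work.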
I have experienced the same slowdown (and similar logs) with the update4j demo app (which is completely unrelated to our application) running with this command:
Z:\swing>\jdk-14\bin\java -Xlog:all=debug:file=app.txt:uptime,tid,level,tags:filecount=50 \
-jar update4j-1.4.5.jar --remote http://docs.update4j.org/demo/setup.xml
What should I look for to make our app faster again on single-CPU Windows 10 setups? Can I fix this by changing something in our application or by adding JVM arguments?
Is that a JDK bug, should I report it?
update 2020-04-25:
As far as I can see, the log files also contain GC logs. These are the first GC entries:
$ cat app.txt.00 | grep "\[gc"
[0.016s][7248][debug][gc,heap ] Minimum heap 8388608 Initial heap 60817408 Maximum heap 1073741824
[0.017s][7248][info ][gc,heap,coops ] Heap address: 0x00000000c0000000, size: 1024 MB, Compressed Oops mode: 32-bit
[0.018s][7248][info ][gc ] Using Serial
[22.863s][6708][info ][gc,start ] GC(0) Pause Young (Allocation Failure)
[22.863s][6708][debug][gc,heap ] GC(0) Heap before GC invocations=0 (full 0): def new generation total 17856K, used 15936K [0x00000000c0000000, 0x00000000c1350000, 0x00000000d5550000)
...
Unfortunately it does not seem related since it starts after the third pause.
update 2020-04-26:
With OpenJDK 14 the application uses 100% CPU in my (single-CPU) VirtualBox machine (running on an i7-6600U CPU). The virtual machine has 3,5 GB RAM. According to Task Manager 40%+ is free and disk activity is 0% (I guess this means no swapping). Adding another CPU to the virtual machine (or enabling hyper-threading on physical machines) makes the application fast enough again. I'm just wondering: was it an intentional trade-off during JDK development to lose performance on (rare) single-CPU machines in order to make the JVM faster on multicore/hyper-threading CPUs?

TL;DR: It's an OpenJDK regression filed as JDK-8244340 and has been fixed in JDK 15 Build 24 (2020/5/20).
I did not expect that, but I could reproduce the issue with a simple hello world:
public class Main {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
I have used these two batch files:
main-1cpu.bat, which limits the java process to only one CPU:
c:\windows\system32\cmd.exe /C start /affinity 1 \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-1cpu.txt:uptime,tid,level,tags:filecount=50 \
Main
main-full.bat, the java process can use both CPUs:
c:\windows\system32\cmd.exe /C start /affinity FF \
\jdk-14\bin\java \
-Xlog:all=trace:file=app-full.txt:uptime,tid,level,tags:filecount=50 \
Main
(The differences are the affinity value and name of the log file. I've wrapped it for easier reading but wrapping with \ probably doesn't work on Windows.)
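To double-check that the affinity setting actually reaches the JVM, one option is to print the processor count the JVM sees. This is only a sketch: it assumes that Runtime.availableProcessors() reflects the affinity mask set by start /affinity, which is my understanding of HotSpot on Windows but was not verified in this thread. CpuCount is a made-up class name.

```java
// Illustrative check: how many processors does this JVM believe it can use?
class CpuCount {
    static int visibleCpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("availableProcessors = " + visibleCpus());
    }
}
```

Started from main-1cpu.bat in place of Main, this would be expected to report 1 if the assumption holds.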
A few measurements on Windows 10 x64 in VirtualBox (with two CPUs):
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 7.0203455
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5751352
...
PS Z:\main> Measure-Command { .\main-full.bat }
...
TotalSeconds : 1.5585384
...
PS Z:\main> Measure-Command { .\main-1cpu.bat }
...
TotalSeconds : 23.6482685
...
The produced trace logs contain pauses similar to the ones in the question.
Running Main without trace logs is faster, but the difference between the single-CPU and two-CPU versions is still visible: ~4-7 seconds vs. ~400 ms.
I've sent these findings to the hotspot-dev@openjdk mailing list, and the devs there confirmed that this is something the JDK could handle better. You can find proposed fixes in the thread too.
There is another thread about the regression on hotspot-runtime-dev@. JIRA issue for the fix: JDK-8244340

From my experience performance problems with JDKs are related mostly to one of the following:
JIT Compilation
VM configuration (heap sizes)
GC algorithm
Changes in the JVM/JDK which break known good running applications
(Oh, and I forgot to mention class loading...)
If you just use the default JVM configuration since OpenJDK11, maybe you should set some of the more prominent options to fixed values, like GC, Heap size, etc.
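Before pinning anything, it can help to see what the defaults resolved to on the machine in question. In the question's log the JVM reports "Using Serial" and a maximum heap of 1073741824 bytes. A small sketch that prints the same kind of information from inside the JVM (JvmDefaults is a made-up class name):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: shows which collector, heap limit and flags the running JVM ended up with.
class JvmDefaults {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("collectors: " + collectorNames());
        System.out.println("max heap (MiB): " + Runtime.getRuntime().maxMemory() / (1024 * 1024));
        System.out.println("processors: " + Runtime.getRuntime().availableProcessors());
        System.out.println("JVM arguments: " + ManagementFactory.getRuntimeMXBean().getInputArguments());
    }
}
```

Running it once under each JDK, and then again with explicit options such as -Xms, -Xmx and a fixed collector flag, would show whether the defaults differ between the versions being compared.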
Maybe a graphical analysis tool could help track your issue down, such as Retrace, AppDynamics or Flight Recorder. These give a better overview of the overall state of heap, GC cycles, RAM, threads, CPU load and so on at a given time than log files can.
Do I understand correctly that your application writes about 30710 lines to the log within the first second of running (under OpenJDK11)? Why is it "only" writing about 7k lines under OpenJDK14 in the first second? This seems like a huge difference for an application that is just started on different JVMs to me... Are you sure there are not for example high amounts of Exception stacktraces dumped into the log?
The other numbers are even higher sometimes, so maybe the slowdowns are related to exception logging? Or even swapping, if RAM gets low?
Actually I am thinking, if an application does not write anything into the log, this is a sign of smooth running without problems (unless it is frozen entirely in this time). What is happening from seconds 12-22 (in the OpenJDK14 case here) is what would concern me more... the logged lines go through the roof... why?
And afterwards the logging goes down to all time low values of about 1-2k lines... what is the reason for that?? (Well, maybe it is the GC kicking in at second 22 and does a tabula rasa which resolves some things...?)
Another thing is your statement about "single CPU" machines. Does this also imply "single core" (I don't know, maybe your software is tailored to legacy hardware or something)?
And are the "single CPU" VMs running on those machines?
I assume I am wrong about this, since almost all CPUs are multicore nowadays, but I would also investigate a multithreading issue (such as a deadlock).

Since it's using 100% CPU "most of the time", and it takes 10 times longer (!) with Java 14, it means that you're wasting 90% of your CPU in Java 14.
Running out of heap space can do that, as you spend a whole lot of time in GC, but you seem to have ruled that out.
I notice that you're tweaking the biased locking option, and that it makes a significant difference. That tells me that maybe your program does a lot of concurrent work in multiple threads. It's possible that your program has a concurrency bug that shows up in Java 14, but not in Java 11. That could also explain why adding another CPU makes it more than twice as fast.
Concurrency bugs often only show up when you're unlucky, and the trigger could really have been anything, like a change to hashmap organization, etc.
First, if it's feasible, check for any loops that might be busy-waiting instead of sleeping.
Then, run a profiler in sampling mode (jvisualvm will do) and look for methods that are taking a much larger % of total time than they should. Since your performance is off by a factor of 10, any problems in there should really jump out.
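To illustrate the busy-waiting point, here is a minimal sketch contrasting a spin loop with a blocking wait. The class and method names are made up, and nothing here is taken from the application in the question. On a single CPU the spinning thread competes for the only core with the thread that is supposed to set the flag.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch of the two waiting styles.
class WaitStyles {
    // Anti-pattern: occupies a CPU until another thread flips the flag.
    static void busyWait(AtomicBoolean ready) {
        while (!ready.get()) {
            Thread.onSpinWait(); // only a hint to the runtime; the loop still burns CPU
        }
    }

    // Blocking alternative: the waiting thread is parked and uses no CPU.
    // Returns true if the latch was released before the timeout.
    static boolean blockingWait(CountDownLatch ready, long timeoutMillis) {
        try {
            return ready.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch ready = new CountDownLatch(1);
        Thread worker = new Thread(() -> {
            try {
                Thread.sleep(100); // stands in for real work
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            ready.countDown();
        });
        worker.start();
        System.out.println("signalled in time: " + blockingWait(ready, 2000));
    }
}
```

In a sampling profiler, a loop like busyWait tends to show up as a single method accounting for most of the CPU time, which is the kind of outlier described above.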

This is an interesting issue, and narrowing it down would take an indeterminate amount of effort, since there are many permutations and combinations to try out, with data to collect and collate for each.
It seems as if there has been no resolution to this for some time. Perhaps it needs to be escalated.
EDIT 2: Since "ThreadLocalHandshakes" is deprecated and we can assume that locking is contended, I suggest trying without "UseBiasedLocking" to hopefully speed up this scenario.
In the meantime, here are some suggestions for collecting more data and isolating the issue:
1. Allocate more than one core. (I see that you have tried it and the issue goes away. It seems to be an issue with one or more threads' execution precluding others; see no. 7 below.)
2. Allocate more heap (perhaps the demands of v14 are greater than those of earlier JDKs).
3. Allocate more memory to the Win 10 VirtualBox VM.
4. Check the OS system messages (Win 10 in your case).
5. Run it on a non-virtualized Win 10.
6. Try a different build of JDK 14.
7. Take a thread dump (or profile) at regular intervals. Analyze which thread is running exclusively. Perhaps there is a setting for equitable time sharing. Perhaps there is a higher-priority thread running. What is that thread and what is it doing? On Linux you could stat the lightweight processes (threads) associated with a process and their state in real time. Is there something similar on Win 10?
8. CPU usage? 100% or less? Constrained by CPU or memory? 100% CPU in service threads? Which service thread?
9. Have you explicitly set a GC algorithm?
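For the thread-dump suggestion, one in-process option is to rank threads by accumulated CPU time through ThreadMXBean. This is a sketch with made-up names, and it has a real limitation: it only sees Java threads, so JVM-internal threads such as the VM thread, GC threads and JIT compiler threads do not appear, and those need an OS-level tool or a jstack dump.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative sketch: lists live Java threads, busiest first, by accumulated CPU time.
class BusyThreads {
    static List<String> byCpuTime(ThreadMXBean threads) {
        List<String> out = new ArrayList<>();
        if (!threads.isThreadCpuTimeSupported() || !threads.isThreadCpuTimeEnabled()) {
            return out; // CPU time measurement is not available in this JVM
        }
        List<Map.Entry<String, Long>> rows = new ArrayList<>();
        for (long id : threads.getAllThreadIds()) {
            long cpuNanos = threads.getThreadCpuTime(id); // -1 if the thread has died
            ThreadInfo info = threads.getThreadInfo(id);  // null if the thread has died
            if (cpuNanos < 0 || info == null) {
                continue;
            }
            String label = info.getThreadName() + " [" + info.getThreadState() + "]";
            rows.add(new AbstractMap.SimpleEntry<>(label, cpuNanos));
        }
        rows.sort((a, b) -> Long.compare(b.getValue(), a.getValue()));
        for (Map.Entry<String, Long> row : rows) {
            out.add(row.getKey() + ": " + row.getValue() / 1_000_000 + " ms");
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> ranked = byCpuTime(ManagementFactory.getThreadMXBean());
        for (String line : ranked.subList(0, Math.min(5, ranked.size()))) {
            System.out.println(line);
        }
    }
}
```

Calling byCpuTime twice a few seconds apart and comparing the numbers shows which threads were actually running in between.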
I have personally witnessed issues within versions that have to do with GC, heap resizing, issues with virtualized containers and so on.
There is no easy answer to that, I think, especially since this question has been around for some time. But we can try, all the best and let us know what is the result of some of these isolation steps.
EDIT 1: from the updated question, it seems to be related to a GC or another service thread taking over the single core non-equitably (Thread-Local Handshakes)?

Be careful with logging to slow disks, it will slow down your application:
https://engineering.linkedin.com/blog/2016/02/eliminating-large-jvm-gc-pauses-caused-by-background-io-traffic
But that doesn't seem likely to be the cause here: the CPU is still busy, and thanks to thread-local handshakes you don't have to wait for all threads to reach a safepoint: https://openjdk.java.net/jeps/312
Also not directly related to the problem you have but more generally if you want to try to squeeze more performance out of your hardware for startup time, take a look at AppCDS (class data sharing):
https://blog.codefx.org/java/application-class-data-sharing/

A lot of java.util.zip.ZipFile$ZipFileInputStream objects / same JARs loaded multiple times

I am checking a heap dump of a Tomcat application which tends to crash under increased load right after startup. During the crash I observe an increasing number of unsuccessful attempts to perform a full GC (6 GB max heap size, using CMS) and an increased thread count (ranging from 600 to 2000). MAT reports:
One instance of "java.lang.ref.Finalizer" loaded by "<system class loader>" occupies
5 291 528 160 (94,59%) bytes. The instance is referenced
by org.python.core.ThreadState @ 0x679fba460, loaded
by "org.apache.catalina.loader.ParallelWebappClassLoader @ 0x674d08e88".
The memory is accumulated in one instance of "java.lang.ref.Finalizer" loaded
by "<system class loader>".
Keywords
java.lang.ref.Finalizer
org.apache.catalina.loader.ParallelWebappClassLoader @ 0x674d08e88
Classes and number of objects referenced by java.lang.ref.Finalizer:
...
302 org.geoserver.platform.resource.FileSystemResourceStore$FileSystemResource$1
406 java.net.SocksSocketImpl
1519 java.util.jar.JarFile
2075 java.util.zip.Deflater
2086 org.geotools.map.MapContent
2094 java.util.Timer$1
2138 org.geoserver.wms.WMSMapContent
2328 org.geotools.jdbc.JDBCFeatureReader
2767 sun.net.www.protocol.jar.URLJarFile
4106 javax.media.jai.WritableRenderedImageAdapter
4724 org.geotools.map.FeatureLayer
5801 java.io.FileInputStream
8045 java.util.zip.Inflater
24981 java.util.zip.ZipFile$ZipFileInflaterInputStream
26772 java.util.zip.ZipFile$ZipFileInputStream
Files referenced by java.util.zip.ZipFile$ZipFileInputStream:
...
339 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/gt-metadata-9.4.jar
345 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/hsqldb-1.8.0.10.jar
388 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/javassist.jar
396 /usr/local/tomcat8/lib/orai18n-servlet.jar
427 /usr/lib/jvm/jdk1.8.0_112/jre/lib/ext/sunjce_provider.jar
448 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/xstream-1.4.2.jar
449 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/jersey-core-1.17.jar
474 /usr/local/tomcat8/lib/orai18n.jar
538 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/jersey-server-1.17.jar
553 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/gt-referencing-9.4.jar
591 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/jts-1.13.jar
614 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/jackson-databind-2.1.4.jar
679 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/gt-opengis-9.4.jar
688 /usr/local/tomcat8/webapps/geoserver/WEB-INF/lib/gt-xsd-gml3-15.2.jar
908 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/jai-core-1.1.3.jar
1023 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/gt-main-9.4.jar
1161 /usr/local/tomcat8/webapps/my-server/WEB-INF/lib/ojdbc6.jar
1194 /usr/local/tomcat8/lib/orai18n-translation.jar
1688 /usr/lib/jvm/jdk1.8.0_112/jre/lib/rt.jar
6631 /usr/local/tomcat8/lib/ojdbc6.jar
It seems strange to me that JAR files are being loaded multiple times. I would suspect a memory leak, but since even jdk1.8.0_112/jre/lib/rt.jar is loaded 1000+ times I do not know what to think. So I am looking for an explanation of why Java loads the same JAR files multiple times. Maybe there is a flag to control this, and I could improve Tomcat's performance that way.

https://bugs.openjdk.java.net/browse/JDK-8212621
The finalize methods of these classes are being phased out from JDK 9 onwards:
The finalize method in java.util.zip.ZipFile, java.util.zip.Inflater, and java.util.zip.Deflater was deprecated for removal in JDK 9 and its implementation was updated to be a no-op.
So if you upgrade to JDK 9 or above, this issue should no longer occur.
As for how to fix it in JDK 8, I still have no idea.
brs
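
As a diagnostic aid on JDK 8 (not a fix), the size of the finalization backlog can be watched through the Memory MXBean. The sketch below samples the JVM it runs in, so for Tomcat it would have to run inside the webapp, or the same attribute could be read over JMX from java.lang:type=Memory. FinalizerBacklog is a made-up name, and the method is deprecated in recent JDKs.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Illustrative sketch: samples how many objects are waiting to be finalized.
class FinalizerBacklog {
    static int pending(MemoryMXBean memory) {
        // Approximate number of objects whose finalize() has not run yet.
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        for (int i = 0; i < 5; i++) {
            System.out.println("objects pending finalization: " + pending(memory));
            Thread.sleep(1000);
        }
    }
}
```

A count that keeps growing under load would suggest that the finalizer thread cannot keep up with the rate at which finalizable objects, such as the stream objects listed in the question, become unreachable.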

When I load a Minecraft world to test whether block generation is working properly, I get this error message

RuntimeException: No OpenGL context found in the current thread.
Here is the full crash report, if needed:
---- Minecraft Crash Report ----
// Who set us up the TNT?
Time: 7/17/16 9:41 AM
Description: Exception in server tick loop
java.lang.IllegalArgumentException: bound must be positive
at java.util.Random.nextInt(Random.java:388)
at hunterghostfist.supersodas.YeastGeneration.generateOre(YeastGeneration.java:50)
at hunterghostfist.supersodas.YeastGeneration.generateOverworld(YeastGeneration.java:35)
at hunterghostfist.supersodas.YeastGeneration.generate(YeastGeneration.java:22)
at cpw.mods.fml.common.registry.GameRegistry.generateWorld(GameRegistry.java:112)
at net.minecraft.world.gen.ChunkProviderServer.populate(ChunkProviderServer.java:314)
at net.minecraft.world.chunk.Chunk.populateChunk(Chunk.java:1157)
at net.minecraft.world.gen.ChunkProviderServer.originalLoadChunk(ChunkProviderServer.java:208)
at net.minecraft.world.gen.ChunkProviderServer.loadChunk(ChunkProviderServer.java:149)
at net.minecraft.world.gen.ChunkProviderServer.loadChunk(ChunkProviderServer.java:119)
at net.minecraft.server.MinecraftServer.initialWorldChunkLoad(MinecraftServer.java:305)
at net.minecraft.server.integrated.IntegratedServer.loadAllWorlds(IntegratedServer.java:79)
at net.minecraft.server.integrated.IntegratedServer.startServer(IntegratedServer.java:96)
at net.minecraft.server.MinecraftServer.run(MinecraftServer.java:445)
at net.minecraft.server.MinecraftServer$2.run(MinecraftServer.java:752)
A detailed walkthrough of the error, its code path and all known details is as follows:
-- System Details --
Details:
Minecraft Version: 1.7.10
Operating System: Windows 10 (amd64) version 10.0
Java Version: 1.8.0_91, Oracle Corporation
Java VM Version: Java HotSpot(TM) 64-Bit Server VM (mixed mode), Oracle Corporation
Memory: 832797088 bytes (794 MB) / 1038876672 bytes (990 MB) up to 1038876672 bytes (990 MB)
JVM Flags: 3 total; -Xincgc -Xmx1024M -Xms1024M
AABB Pool Size: 0 (0 bytes; 0 MB) allocated, 0 (0 bytes; 0 MB) used
IntCache: cache: 0, tcache: 0, allocated: 13, tallocated: 95
FML: MCP v9.05 FML v7.10.99.99 Minecraft Forge 10.13.4.1614 4 mods loaded, 4 mods active
States: 'U' = Unloaded 'L' = Loaded 'C' = Constructed 'H' = Pre-initialized 'I' = Initialized 'J' = Post-initialized 'A' = Available 'D' = Disabled 'E' = Errored
UCHIJAA mcp{9.05} [Minecraft Coder Pack] (minecraft.jar)
UCHIJAA FML{7.10.99.99} [Forge Mod Loader] (forgeSrc-1.7.10-10.13.4.1614-1.7.10.jar)
UCHIJAA Forge{10.13.4.1614} [Minecraft Forge] (forgeSrc-1.7.10-10.13.4.1614-1.7.10.jar)
UCHIJAA ss{ 1.0} [Super Sodas] (bin)
GL info: ~~ERROR~~ RuntimeException: No OpenGL context found in the current thread.
Profiler Position: N/A (disabled)
Vec3 Pool Size: 0 (0 bytes; 0 MB) allocated, 0 (0 bytes; 0 MB) used
Player Count: 0 / 8; []
Type: Integrated Server (map_client.txt)
Is Modded: Definitely; Client brand changed to 'fml,forge'

This tells you all you need to know
java.lang.IllegalArgumentException: bound must be positive
at java.util.Random.nextInt(Random.java:388)
at hunterghostfist.supersodas.YeastGeneration.generateOre(YeastGeneration.java:50)
This means that at YeastGeneration.generateOre(YeastGeneration.java:50) you are calling Random.nextInt(x) with a value that is not positive, and the bound must be strictly positive. In other words, you are passing 0 or a negative number.
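
A minimal sketch of guarding the bound before calling Random.nextInt. The original YeastGeneration code is not shown, so the class and method names (BoundedRandom, nextIntOrZero) and the minY/maxY values below are hypothetical. One way to end up with a non-positive bound is a computed range such as maxY - minY that comes out as zero or negative.

```java
import java.util.Random;

// Hypothetical helper; the names here are illustrative and do not come from the mod.
class BoundedRandom {

    // Returns a value in [0, bound) when bound is positive, and 0 otherwise,
    // instead of letting Random.nextInt(int) throw IllegalArgumentException.
    static int nextIntOrZero(Random random, int bound) {
        return bound > 0 ? random.nextInt(bound) : 0;
    }

    public static void main(String[] args) {
        Random random = new Random();
        int minY = 40;
        int maxY = 40; // assumed misconfiguration: the range maxY - minY is 0

        try {
            random.nextInt(maxY - minY); // same kind of failure as in the crash report
        } catch (IllegalArgumentException e) {
            System.out.println("Unguarded call failed: " + e.getMessage());
        }

        int y = minY + nextIntOrZero(random, maxY - minY);
        System.out.println("Guarded y = " + y);
    }
}
```

Whether falling back to 0 is the right behaviour depends on the generator; in many cases the better fix is to correct the values that produce the bound so that it is always at least 1.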

Embedded Jetty timeout under load

I have an Akka (Java) application with a camel-jetty consumer. Under light load (about 10 TPS), our client starts seeing HTTP 503 errors. I tried to reproduce the problem in our lab, and it seems Jetty can't handle overlapping HTTP requests. Below is the output from Apache Bench (ab):
ab sends 10 requests using one single thread (i.e. one request at a time)
ab -n 10 -c 1 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 1
Time taken for tests: 0.61265 seconds
Complete requests: 10
Failed requests: 0
Requests per second: 163.23 [#/sec] (mean)
Time per request: 6.126 [ms] (mean)
Time per request: 6.126 [ms] (mean, across all concurrent requests)
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 1.0 1 2
Processing: 3 4 1.8 5 7
Waiting: 2 4 1.8 5 7
Total: 3 5 1.9 6 8
Percentage of the requests served within a certain time (ms)
50% 6
66% 6
75% 6
80% 8
90% 8
95% 8
98% 8
99% 8
100% 8 (longest request)
ab sends 10 requests using two threads (up to 2 requests at the same time):
ab -n 10 -c 2 -p bad.txt http://192.168.20.103:8899/pim
Benchmarking 192.168.20.103 (be patient).....done
Server Software: Jetty(8.1.16.v20140903)
Server Hostname: 192.168.20.103
Server Port: 8899
Document Path: /pim
Document Length: 33 bytes
Concurrency Level: 2
Time taken for tests: 30.24549 seconds
Complete requests: 10
Failed requests: 1
(Connect: 0, Length: 1, Exceptions: 0)
// omitted for clarity
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.9 1 2
Processing: 3 3005 9492.9 4 30023
Waiting: 2 3005 9492.7 3 30022
Total: 3 3006 9493.0 5 30024
Percentage of the requests served within a certain time (ms)
50% 5
66% 5
75% 7
80% 7
90% 30024
95% 30024
98% 30024
99% 30024
100% 30024 (longest request)
I don't believe jetty is this bad. Hopefully, it's just a configuration issue. This is the setting for my camel consumer URI:
"jetty:http://0.0.0.0:8899/pim?replyTimeout=70000&autoAck=false"
I am using akka 2.3.12 and camel-jetty 2.15.2

Jetty is certainly not that bad; it should be able to handle tens of thousands of connections with many thousands of TPS.
It is hard to diagnose from what you have said, other than that Jetty does not send 503s just because it is under load, unless perhaps the Denial of Service protection filter is deployed. (ab would look like a DoS attack, which it basically is, and it is not a great load generator for benchmarking.)
So you need to track down who/what is sending that 503 and why.

It was a bug in my own code: the sender (client) information was overwritten when requests overlapped. The 503 error was sent because the Jetty continuation timed out.
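
The poster's code is not shown, so the following is only a sketch of the kind of bug described, under the assumption that the sender was held in a single shared field. Plain strings stand in for the real sender references, and all class and method names are hypothetical. The second class shows one common alternative: keeping the sender per request, keyed by a correlation id.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Buggy shape: one mutable field shared by all in-flight requests.
class SharedSenderHandler {
    private String sender;

    void onRequest(String client) {
        sender = client; // a second request overwrites the first client's entry
    }

    String replyTarget() {
        return sender;
    }
}

// Safer shape: state is kept per request, keyed by a correlation id.
class PerRequestHandler {
    private final Map<String, String> pending = new ConcurrentHashMap<>();

    void onRequest(String requestId, String client) {
        pending.put(requestId, client);
    }

    // Returns the client waiting on this request and forgets the entry.
    String replyTarget(String requestId) {
        return pending.remove(requestId);
    }
}

class OverlapDemo {
    public static void main(String[] args) {
        SharedSenderHandler shared = new SharedSenderHandler();
        shared.onRequest("client-A");
        shared.onRequest("client-B"); // arrives before A's reply is ready
        System.out.println("Shared field, reply for A goes to: " + shared.replyTarget());

        PerRequestHandler perRequest = new PerRequestHandler();
        perRequest.onRequest("req-1", "client-A");
        perRequest.onRequest("req-2", "client-B");
        System.out.println("Per-request map, reply for req-1 goes to: "
                + perRequest.replyTarget("req-1"));
    }
}
```

With the shared field, the reply meant for the first client is addressed to the second, so the first request is left waiting until its timeout fires. That would be consistent with a single request taking about 30 seconds in the -c 2 run above.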

Runtime#availableProcessors() doesn't return correct result on Linux server

I usually call Runtime#availableProcessors to determine how many cores a Windows computer has, and it works fine. The result is consistent with what I see in the Control Panel.
However, when I used the API on a Linux server, it returned 1. As far as I know the server is more powerful than that, so it doesn't make sense to me that it would be a single-CPU system.
I did some searching and found that the Linux box has an Intel(R) Xeon(R) CPU X5675 @ 3.07GHz; googling shows it has 6 CPU cores.
So the question is: why does Runtime#availableProcessors misreport the count? Is it a bug?
Thanks,
John
Here is the entire output of /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 44
model name : Intel(R) Xeon(R) CPU X5675 @ 3.07GHz
stepping : 2
cpu MHz : 3059.000
cache size : 12288 KB
fpu : yes
fpu_exception : yes
cpuid level : 11
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss syscall nx rdtscp lm constant_tsc up ida nonstop_tsc arat pni ssse3 cx16 sse4_1 sse4_2 popcnt lahf_lm
bogomips : 6118.00
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: [8]

It must be how your server is configured (perhaps you're running in a VM). When I run, on my personal Linux laptop,
public class CpuCount {
    public static void main(String[] args) {
        // Number of processors available to this JVM
        System.out.println(Runtime.getRuntime().availableProcessors());
    }
}
I get
4
as this machine has four cores. And cat /proc/cpuinfo reports the same.
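
To make the same comparison on the server, a small sketch like the following prints the JVM's count next to the number of "processor" entries in /proc/cpuinfo. The class and method names are hypothetical. Note that the /proc/cpuinfo output in the question lists only processor 0, which would be consistent with the JVM reporting 1: the operating system itself appears to expose a single CPU to that host.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper for comparing the JVM's view with the kernel's view.
class CpuInfoCheck {

    // Counts the "processor : N" entries, one per logical CPU the kernel exposes.
    static long countProcessorEntries(List<String> cpuinfoLines) {
        return cpuinfoLines.stream()
                .filter(line -> line.matches("processor\\s*:.*"))
                .count();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("JVM availableProcessors: "
                + Runtime.getRuntime().availableProcessors());

        Path cpuinfo = Paths.get("/proc/cpuinfo");
        if (Files.isReadable(cpuinfo)) {
            System.out.println("/proc/cpuinfo processor entries: "
                    + countProcessorEntries(Files.readAllLines(cpuinfo)));
        }
    }
}
```

If both numbers are 1 on hardware that should have more cores, the limit is being applied below the JVM (for example by a VM's configuration), not by Java.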
