Shenandoah GC pause time - java

I am using Shenandoah GC on Java 8 with the following options:
"-XX:+UseShenandoahGC -XX:+UnlockExperimentalVMOptions -XX:+AlwaysPreTouch -XX:+UseNUMA -XX:-UseBiasedLocking -XX:+UseLargePages -XX:ShenandoahGCHeuristics=compact -XX:+ClassUnloadingWithConcurrentMark -Xloggc:/LOGS/CAS/gclog -XX:ShenandoahAllocationThreshold=20"
With this configuration the average pause time is 8 ms, but the max pause time is 160 ms. I don't want pause times to go beyond 25 ms. Is there a parameter I can set that ensures pauses never exceed that limit?
Please help and suggest.

Related

Java protobuf creating too many objects

I have a Jetty server proxying between a frontend and a backend, converting protobufs frontend.proto->backend.proto and backend.proto->frontend.proto.
Requests take below 20 ms (99th percentile) and 40 ms (99.9th) while load is not peaking.
However, when load peaks, the 99th increases by about 10 ms but the 99.9th increases by about 60 ms.
I have investigated, and the delayed requests are caused by GC evacuation pauses; of this I am sure. These pauses take 50-70 ms and run once every ~15 seconds at valley load, but jump up to once every 3-5 seconds at peak load; the duration stays the same.
As soon as the GC interval drops below 8-9 seconds, the 99.9th percentile shoots up and I can see the debug logs of the slow requests concurrently with the GC log.
I have profiled with JProfiler, YourKit and VisualVM and saw that:
- Eden space fills up and a GC pause is triggered
- Very few of the objects are moved into Survivor (a few MB out of 12 GB)
- So most of the objects have already died in Eden
- This makes sense, since requests take 30-40 ms and most object lifetimes are tied to the request lifetime
- Most of the objects are created during protobuf deserialization
I've tried playing with -XX:MaxGCPauseMillis and Eden sizes, but nothing seems to make a difference: a pause never takes less than 50 ms, and a bigger Eden means lower GC frequency but much longer pauses.
I see 2 options here:
1. Somehow reuse objects in java-protobuf: seems impossible. I've read through a lot of posts and mailing lists and it isn't set up that way; the stock answer is that "Java object allocation is very efficient and it should be able to handle many objects being created". While that is true, the associated GC cost is killing my 99.9th percentile.
2. Make the GC run more often, say once a second, to bring the collection time down, so that it stops more requests but for shorter periods: I have been playing with -XX:MaxGCPauseMillis and Eden sizes but I can't seem to get the pause below 50 ms.
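As a way to confirm that the bulk of the garbage really does come from deserialization, HotSpot exposes a per-thread allocation counter via com.sun.management.ThreadMXBean. This is a minimal sketch, not from the question: the class name and the 1 MB stand-in payload are made up for illustration, and the cast is HotSpot-specific (it will fail on non-HotSpot JVMs):

```java
import java.lang.management.ManagementFactory;

public class AllocProbe {
    // Measures how many bytes a piece of work allocates on the current thread,
    // using HotSpot's per-thread allocation counter. Useful for attributing
    // allocation to a specific code path (e.g. one protobuf parse per request).
    static long allocatedBy(Runnable work) {
        com.sun.management.ThreadMXBean tmx =
                (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
        long tid = Thread.currentThread().getId();
        long before = tmx.getThreadAllocatedBytes(tid);
        work.run();
        return tmx.getThreadAllocatedBytes(tid) - before;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for deserializing one ~1 MB message
        long bytes = allocatedBy(() -> {
            byte[] payload = new byte[1_000_000];
            payload[0] = 1; // keep the allocation live
        });
        System.out.println("~" + bytes + " bytes allocated by the work");
    }
}
```

Wrapping the actual parse call in allocatedBy(...) gives a per-request allocation figure, which together with the request rate yields the allocation rate driving the young-GC frequency.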
I uploaded the gc log to gc_log
Java version:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
GC details:
-XX:CICompilerCount=12 -XX:ConcGCThreads=5 -XX:ErrorFile=/home/y/var/crash/hs_err_pid%p.log -XX:+FlightRecorder -XX:G1HeapRegionSize=4194304 -XX:GCLogFileSize=4194304 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/y/logs/yjava_jetty -XX:InitialHeapSize=12884901888 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=12884901888 -XX:MaxNewSize=7730102272 -XX:MinHeapDeltaBytes=4194304 -XX:NumberOfGCLogFiles=10 -XX:+ParallelRefProcEnabled -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UnlockCommercialFeatures -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCLogFileRotation

G1GC very high GC count and CPU, very frequent GCs that kill performance

I've recently switched my Java application from CMS + ParNew to G1GC.
What I observed when I did the switch is the CPU usage went higher and the GC count + pause time went up as well.
My JVM flags before the switch were:
java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar
After the switch my flags are:
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar
I followed Oracle's Best Practices http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
Do not Set Young Generation Size
And did not set the young generation size.
However I am suspecting that the young generation size is the problem here.
What I see is the heap usage is fluctuating between ~6 - 8 GB.
Whereas before, with CMS and ParNew, the memory usage grew from about 4 GB to 16 GB, and only then did I see a GC:
I am not sure I understand why the GC is so frequent with G1GC, or what I'm missing when it comes to G1GC tuning.
I'm using Java 8 :
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
I appreciate your help.
UPDATE:
A bit more information about those pauses:
As you can see, all those pauses are G1New, and they are roughly as long as my target pause time, which is 30 ms.
When I look at the ParNew pauses before the switch to G1GC, this is how it looked:
So they are also all young-gen collections (ParNew), but they are less frequent and shorter, because they happen only when heap usage reaches around 14 GB (according to the graph).
I am still clueless as to why the G1New collections happen so early (in terms of heap usage).
Update 2
I also noticed that NewRatio=2. I don't know whether G1GC respects it, but that would mean my new generation is capped at about 7 GB. Could that be the reason?
Update 3
Adding G1GC GC logs:
https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing
I was able to see that the time spent copying objects is very significant. It looks like G1GC uses a maximum tenuring threshold of 15 by default before an object is promoted to the tenured generation.
I reduced it to 1 (-XX:MaxTenuringThreshold=1).
Also, I don't know how to confirm it in the logs, but visualizing the GC log I saw that the young generation is constantly being resized, from its minimum to its maximum size. I narrowed down that range and it also improved performance.
Looking here https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9
I was trying to figure out whether coarsening is indeed an issue, but it simply says to set gc+remset=trace, and I don't understand how to pass that to java on the command line, or whether it's even available in JDK 8.
I increased -XX:G1RSetRegionEntries a bit just in case.
I hope this helps future G1GC tuners; if anyone has more suggestions, that would be great.
What I still see is that [Processed Buffers] still takes a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I'm not sure why.
Your GC log shows an average GC pause interval of 2 seconds, with each pause around 30-40 ms, which amounts to an application throughput of around 95%. That is not "killing performance" territory, at least not due to GC pauses.
G1 does more concurrent work, though, e.g. for remembered-set refinement, and your pauses seem to spend some time in Update RS/Scan RS, so I assume the concurrent GC threads are busy too; i.e. it may need additional CPU cycles outside GC pauses. That work is not covered by the logs by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the RS updates during the pauses (at the cost of increased pause times).
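As a cheap cross-check on pause count and cumulative pause time without parsing the GC log, the standard java.lang.management API exposes per-collector counters. A minimal sketch, assuming a HotSpot JVM (the class name is made up):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcCounters {
    public static void main(String[] args) {
        // Each collector bean (e.g. "G1 Young Generation", "G1 Old Generation")
        // reports a cumulative collection count and total collection time in ms.
        // Sampling this periodically gives GC frequency and time-in-GC trends.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: count=%d totalTime=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```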

How to determine JVM GC throughput?

What's the simplest way to determine Oracle Java 8 JVM garbage collector throughput, preferably using JDK command line tools?
With the jstat command I can obtain total garbage collection time (GCT column). Based on comparing the changes in this value with GC logs, it seems that the GCT value output by jstat is the cumulative GC elapsed time in seconds (since JVM startup).
Is this correct?
So, can I calculate GC throughput like this?
1 - GCT / time_since_jvm_start
jstat can be used to obtain both the GC time and the time since JVM start using the following command:
jstat -gcutil -t <jvm-pid> 1000 1
You are correct in your question. The GCT column contains the total time the JVM was stopped to perform garbage collection, across both young GCs and full GCs.
You can use jstat as you write (jstat -gcutil -t <jvm-pid> 1000 1) and look at the first column (Timestamp) to see the total time the JVM has been running; let's call this uptime. Both this timestamp and the GC times are in seconds. If you then want to calculate the fraction of time not spent in GC, you would compute, exactly as you write:
1 - GCT / uptime
I would argue that calling this "throughput" is a bit misleading, though. For example, if you use the CMS collector, much of the GC work happens concurrently with the application, lowering application throughput even though it does not actually stop the application.
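The calculation above can be sketched as code, assuming the two inputs were read off a jstat -gcutil -t sample (the class and method names are made up for illustration):

```java
public class GcThroughput {
    // "Throughput" as the fraction of wall-clock time NOT spent in
    // stop-the-world GC: 1 - GCT / uptime, with both values in seconds,
    // as read from the GCT and Timestamp columns of `jstat -gcutil -t <pid>`.
    static double throughput(double uptimeSeconds, double gctSeconds) {
        return 1.0 - gctSeconds / uptimeSeconds;
    }

    public static void main(String[] args) {
        // e.g. 3600 s of uptime with 72 s of cumulative GC time
        System.out.printf("GC throughput: %.1f%%%n", 100.0 * throughput(3600.0, 72.0));
    }
}
```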

What causes increase in vmop time during safepoint application thread pause, while performing GC

I'm running Java 7 (Java HotSpot(TM) 64-Bit Server VM, build 24.76-b04, mixed mode) on Linux 2.6.32-504.el6.x86_64 (RHEL), with a few GC switches enabled as shown below.
The problem is a significant increase in the time spent pausing application threads (> 3 sec); based on the safepoint statistics it appears to be related to the vmop operation.
I have observed that there isn't much overhead due to GC, nor any allocation failures; only minor collections take place during program execution. The GC log pasted below contains the GC just before the application-thread pause that took longer than 3 seconds, and the GC showing the actual delay.
Questions
1. Could this time sink be related to the server freezing up or not being responsive? This is based on the observation that real time took 3.02 sec while there was no indication of any overhead due to GC ([Times: user=0.02 sys=0.00, real=3.02 secs]).
2. Is there any utility available that would monitor a system's responsiveness, or is there a recommended approach for measuring it?
3. What causes the increase in vmop time?
4. Does the JVM perform any disk I/O while initiating garbage collection? In other words, before pausing the application threads at a safepoint, does the JVM perform any disk I/O, or can high disk I/O activity on the system during GC delay the pausing of application threads?
Server Configurations:
Please note there are several applications running on this server, this is not a dedicated server for the mentioned application.
model name: Intel(R) Xeon(R) CPU X5365 @ 3.00GHz / 8 Core
             total       used       free     shared    buffers     cached
Mem:      24602892   22515868    2087024        244     165796   10801380
-/+ buffers/cache:    11548692   13054200
GC Options enabled:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/opt/swxsmf_fep/working/gk-gc-CMS.log -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintSafepointStatistics -XX:PrintSafepointStatisticsCount=1 -XX:+PrintGCApplicationConcurrentTime
Previous GC (showing no issues)
2015-04-08T19:05:24.622+0100: 522569.387: Application time: 16.4710580 seconds
2015-04-08T19:05:24.622+0100: 522569.387: [GC2015-04-08T19:05:24.622+0100: 522569.387: [ParNew: 102798K->79K(115456K), 0.0018020 secs] 105218K->2499K(371776K), 0.0019090 secs] [Times: user=0.02 sys=0.00, real=0.00 secs]
2015-04-08T19:05:24.624+0100: 522569.389: Total time for which application threads were stopped: 0.0021910 seconds
GC where real time > 3 Sec
vmop [threads: total initially_running wait_to_block] [time: spin block sync cleanup vmop] page_trap_count
522588.500: GenCollectForAllocation [ 22 0 0 ] [ 0 0 0 0 3019 ] 0
2015-04-08T19:05:43.747+0100: 522588.512: Application time: 19.1232430 seconds
2015-04-08T19:05:43.748+0100: 522588.512: [GC2015-04-08T19:05:46.765+0100: 522591.530: [ParNew: 102735K->77K(115456K), 0.0017640 secs] 105155K->2497K(371776K), 3.0195450 secs] [Times: user=0.02 sys=0.00, real=3.02 secs]
2015-04-08T19:05:46.767+0100: 522591.532: Total time for which application threads were stopped: 3.0198060 seconds
Any input on this would be very much appreciated; please let me know if you require any further details.
Time spent stopping threads is generally time your app isn't responding. So yes, I'd expect to see the app hang.
Have you tried jhiccup?
(And 4.) Here's something that came to mind: http://www.evanjones.ca/jvm-mmap-pause.html. It describes a stall ("real" pause) writing hsperf data during GC. There's also a repro case you can try on your machines.
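The idea behind jHiccup (question 2 above) can be sketched in a few lines: a thread repeatedly sleeps for a fixed interval and records how much longer than requested each sleep actually took; large overshoots line up with stop-the-world pauses and OS-level stalls. This is only a rough sketch with made-up names — the real jHiccup is far more careful about timer resolution, histogram recording, and coordinated omission:

```java
public class HiccupMeter {
    // Returns the worst observed "hiccup" (sleep overshoot) in ms over the
    // given number of iterations. A GC pause or OS stall that lands inside a
    // sleep makes that sleep return late, which shows up as a large overshoot.
    static long maxHiccupMs(int iterations, long intervalMs) throws InterruptedException {
        long worst = 0;
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            Thread.sleep(intervalMs);
            long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
            worst = Math.max(worst, elapsedMs - intervalMs);
        }
        return worst;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Worst hiccup: " + maxHiccupMs(100, 10) + " ms");
    }
}
```

Running this alongside the application (or as a separate process on the same box, to separate JVM pauses from system-wide stalls) gives a direct responsiveness measurement independent of the GC log.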

Optimizing Tomcat / Garbage Collection

Our server has 128 GB of RAM and 64 cores, running Tomcat 7.0.30 and Oracle jdk1.6.0_38 on CentOS 6.3.
Every 60 minutes we were seeing a garbage collection that took 45-60 seconds. Adding -XX:-UseConcMarkSweepGC increased page load times by about 10% but got that down to about 3 seconds, which is an acceptable trade-off.
Our config:
-Xms30g
-Xmx30g
-XX:PermSize=8g
-XX:MaxPermSize=8g
-Xss256k
-XX:-UseConcMarkSweepGC
We set the heap at 30 GB to keep compressed 32-bit object addressing (I read that above 32 GB the 64-bit addressing takes up more memory, so you have to go to about 48 GB to see improvements).
Using VisualGC I can see that the Eden space is cycling through every 30 - 60 minutes, but not much happens with the Survivor 0, Survivor 1, Old Gen, and Perm Gen.
We have a powerful server. What other optimizations can we make to further decrease the 3 second GC time?
Any recommendations to improve performance or scaling?
Any other output or config info that would help?
It might sound counter-intuitive, but have you tried allocating a lot less memory? E.g., do you really need a 30 GB heap? If you can get along with 4 GB or even less, garbage collection might be more frequent, but when it happens it will be a lot faster. Typically I find this more desirable than allocating a lot of memory and suffering from the time it takes to clean it up.
Even if this does not help you because you really need 30 GB of memory, others might come along with a similar problem and benefit from allocating less.
Seems that you need Incremental GC to reduce pauses:
-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode
And for tracing without VisualGC, this has always worked well for me (output goes to catalina.out):
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps
2013-01-05T22:52:13.954+0100: 15918369.557: [GC 15918369.557: [DefNew: 65793K->227K(98304K), 0.0031220 secs] 235615K->170050K(491520K), 0.0033220 secs] [Times: user=0.01 sys=0.00, real=0.00 secs]
Afterwards you can play with these:
-XX:NewSize=ABC -XX:MaxNewSize=ABC
-XX:SurvivorRatio=ABC
-XX:NewRatio=ABC
Reference: Virtual Machine Garbage Collection Tuning
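To watch the Eden/Survivor/Old/Perm pools without attaching VisualGC, the standard MemoryPoolMXBean API can print the same numbers in-process. A minimal sketch (the class name is made up):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class PoolUsage {
    public static void main(String[] args) {
        // Prints current usage of each memory pool (Eden, Survivor, Old/Tenured,
        // Perm or Metaspace, code cache). Sampling this on a timer shows how
        // fast Eden cycles and whether the survivor/old pools ever move.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) continue; // pool no longer valid
            System.out.printf("%-30s used=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getMax()); // max is -1 if undefined
        }
    }
}
```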
