What's the simplest way to determine Oracle Java 8 JVM garbage collector throughput, preferably using JDK command line tools?
With the jstat command I can obtain total garbage collection time (GCT column). Based on comparing the changes in this value with GC logs, it seems that the GCT value output by jstat is the cumulative GC elapsed time in seconds (since JVM startup).
Is this correct?
So, can I calculate GC throughput like this?
1 - GCT / time_since_jvm_start
jstat can be used to obtain both the GCT value and the time since JVM start using the following command:
jstat -gcutil -t <jvm-pid> 1000 1
You are correct in your question. The GCT column contains the total time the JVM was stopped to perform garbage collection, both in young GC and full GC.
You could use jstat as you write (jstat -gcutil -t <jvm-pid> 1000 1) and look at the first column to see the total time the JVM has been running. Let's call this uptime. Both this timestamp and the GC times are in seconds. If you then want to calculate the percentage of time not spent in GC you would do, exactly as you write:
1 - GCT / uptime
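If you want to script this, here is a minimal sketch that shells out to jstat and computes that percentage. It assumes the JDK 8 -gcutil -t column layout (Timestamp first, GCT last) and that jstat is on the PATH; the class name is purely illustrative.

import java.io.BufferedReader;
import java.io.InputStreamReader;

// Minimal sketch: runs "jstat -gcutil -t <pid> 1000 1" and derives throughput as 1 - GCT / uptime.
// Assumes the JDK 8 output layout: Timestamp is the first column, GCT the last.
public class GcThroughput {
    public static void main(String[] args) throws Exception {
        String pid = args[0];
        Process p = new ProcessBuilder("jstat", "-gcutil", "-t", pid, "1000", "1")
                .redirectErrorStream(true)
                .start();
        String dataLine = null;
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                dataLine = line;  // keep the last line, i.e. the sample after the header
            }
        }
        p.waitFor();
        String[] cols = dataLine.trim().split("\\s+");
        double uptime = Double.parseDouble(cols[0]);             // seconds since JVM start
        double gct = Double.parseDouble(cols[cols.length - 1]);  // cumulative GC time, seconds
        System.out.printf("uptime=%.1fs gct=%.3fs throughput=%.2f%%%n",
                uptime, gct, 100.0 * (1.0 - gct / uptime));
    }
}

Run it as java GcThroughput <jvm-pid>; it takes one jstat sample and prints the derived percentage.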
I would argue that calling this throughput is a bit misleading. For example, if you use the CMS collector, much of the GC work happens concurrently with the application, lowering application throughput even though it does not actually stop the application.
Related
We are performing performance testing and tuning activities in one of our projects. I have used the JVM configs mentioned in this article.
Exact JVM options are:
set "JAVA_OPTS=-Xms1024m -Xmx1024m
-XX:MetaspaceSize=512m -XX:MaxMetaspaceSize=1024m
-XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=50
-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCApplicationConcurrentTime
-XX:+PrintHeapAtGC -Xloggc:C:\logs\garbage_collection.logs
-XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=10
-XX:GCLogFileSize=100m -XX:+HeapDumpOnOutOfMemoryError
-XX:HeapDumpPath=C:\logs\heap_dumps\'date'.hprof
-XX:+UnlockDiagnosticVMOptions"
Still we see that the issue is not resolved. I am sure that there are some issues within our code (thread implementation etc.) and the external libraries that we use (like Log4j etc.), but I was at least hoping for some performance improvement from employing these JVM tuning options.
The reports from Gceasy.io suggest that:
It looks like your application is waiting due to lack of compute resources (either CPU or I/O cycles). Serious production applications shouldn't be stranded because of compute resources. In 1 GC event(s), 'real' time took more than 'usr' + 'sys' time.
Some known code issues:
There is a lot of network traffic to an external webapp which accepts only one connection at a time, but this delay is acceptable for our application.
Some threads block on Log4j. We are using Log4j for console, DB and file appenders.
There can be an issue with MySQL tuning as well. But for now, we want to rule out these possibilities and just understand any other factors that might be affecting our execution.
What I was hoping for from this tuning was less GC activity and a properly managed metaspace, but this is not what we observe. Why?
Here are some of the snapshots:
Here we can see how metaspace is stuck at 40 MB and does not exceed that.
A lot of GC activity can also be seen.
Another image depicting overall system state:
What could be our issue? Need some definitive pointers on these!
UPDATE-1: Disk usage monitoring
UPDATE-2: Added the screenshot with heap.
SOME MORE UPDATES: Well, I did not mention earlier that our processing involves Selenium (test automation) execution, which spawns more than a couple of web browsers using the Chrome/Firefox WebDrivers. While monitoring, I saw that among the background processes Chrome is using a lot of memory. Could this be a possible reason for the slowdown?
Here are the screenshots for the same:
Other pic that shows the background processes
EDIT No-5: Adding the GC logs
GC_LOGS_1
GC_LOGS_2
Thanks in advance!
You don't seem to have a GC problem. Here's a plot of your GC pause times over the course of more than 40 hours of your app running:
From this graph we can see that most of the GC pause times are below 0.1 seconds and some are in the 0.2-0.4 second range, but since the graph contains 228000 data points it's hard to tell how the data is distributed. We need a histogram of the distribution of the GC pause times. Since the vast majority of these pause times are very low, with only a few outliers, plotting the distribution on a linear scale is not informative, so I created a plot of the distribution of the logarithm of the GC pause times:
In the above image, the X axis is the base-10 logarithm of the GC pause time and the Y axis is the number of occurrences. The histogram has 500 bins.
As you can see from these two graphs, the GC pause times are clustered into two groups, and most of them are very low, on the order of milliseconds or less. If we plot the same histogram with a log scale on the Y axis as well, we get this graph:
In the above image, the X axis is the base-10 logarithm of the GC pause time and the Y axis is the base-10 logarithm of the number of occurrences. The histogram has 50 bins.
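I have not posted the exact script used to build these histograms, but a rough sketch along the following lines does the same job, assuming the log was written with -XX:+PrintGCApplicationStoppedTime (as in your flags); the file name, class name and bin count are placeholders. Note that it counts all safepoint pauses, not only GC-induced ones.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough sketch: pulls pause durations out of a JDK 8 GC log written with
// -XX:+PrintGCApplicationStoppedTime and prints a histogram of log10(pause).
// The file name "gc.log" and the bin count are illustrative only.
public class PauseHistogram {
    private static final Pattern STOPPED = Pattern.compile(
            "Total time for which application threads were stopped: ([0-9.]+) seconds");

    public static void main(String[] args) throws IOException {
        List<Double> logPauses = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(args.length > 0 ? args[0] : "gc.log"))) {
            Matcher m = STOPPED.matcher(line);
            if (m.find() && Double.parseDouble(m.group(1)) > 0) {
                logPauses.add(Math.log10(Double.parseDouble(m.group(1))));
            }
        }
        int bins = 50;
        double min = logPauses.stream().mapToDouble(Double::doubleValue).min().orElse(-4);
        double max = logPauses.stream().mapToDouble(Double::doubleValue).max().orElse(0);
        double range = Math.max(max - min, 1e-9);  // guard against a single distinct value
        int[] counts = new int[bins];
        for (double v : logPauses) {
            counts[(int) ((v - min) / range * (bins - 1))]++;  // bucket by log10(pause)
        }
        for (int i = 0; i < bins; i++) {
            System.out.printf("log10(pause) >= %6.2f : %d%n", min + i * range / bins, counts[i]);
        }
    }
}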
On this graph it becomes visible that you have a few tens of GC pause times that might be noticeable to a human, on the order of tenths of a second. These are probably the 120 full GCs you have in your first log file. You might be able to reduce those times further by using a machine with more memory and the swap file disabled, so that the entire JVM heap stays in RAM. Swapping, especially on a non-SSD drive, can be a real killer for the garbage collector.
I created the same graphs for the second log file you posted, which is a much smaller file spanning around 8 minutes and consisting of around 11000 data points, and I got these images:
In the above image, the X axis is the base-10 logarithm of the GC pause time and the Y axis is the number of occurrences. The histogram has 500 bins.
In the above image, the X axis is the base-10 logarithm of the GC pause time and the Y axis is the base-10 logarithm of the number of occurrences. The histogram has 50 bins.
In this case, since you've been running the app on a different computer and with different GC settings, the distribution of the GC pause times differs from the first log file. Most of them are in the sub-millisecond range, with a few tens, maybe hundreds, in the hundredth-of-a-second range. We also have a few outliers here in the 1-2 second range. There are 8 such GC pauses and they all correspond to the 8 full GCs that occurred.
The difference between the two logs and the lack of high GC pause times in the first log file might be attributed to the fact that the machine running the app that produced the first log file has double the RAM vs the second (8GB vs 4GB) and the JVM was also configured to run the parallel collector. If you're aiming for low latency, you're probably better off with the first JVM configuration as it seems that the full GC times are consistently lower than in the second config.
It's hard to tell what your issue is with your app, but it seems it's not GC related.
The first thing I would check is disk I/O... If your processor is not loaded at 100% during performance testing, then most likely disk I/O is the problem (e.g. you are using a hard drive)... Just switch to an SSD (or an in-memory disk) to resolve this.
GC is just doing its work... You selected a concurrent collector to perform GC.
From the documentation:
The mostly concurrent collector performs most of its work concurrently (for example, while the application is still running) to keep garbage collection pauses short. It is designed for applications with medium-sized to large-sized data sets in which response time is more important than overall throughput because the techniques used to minimize pauses can reduce application performance.
What you see matches this description: GC takes time, but mostly does not pause the application for long.
As an option you may try to enable Garbage-First Garbage Collector (use -XX:+UseG1GC) and compare results. From the docs:
G1 is planned as the long-term replacement for the Concurrent Mark-Sweep Collector (CMS). Comparing G1 with CMS reveals differences that make G1 a better solution. One difference is that G1 is a compacting collector. Also, G1 offers more predictable garbage collection pauses than the CMS collector, and allows users to specify desired pause targets.
This collector allows you to set a maximum GC pause target, e.g. adding the -XX:MaxGCPauseMillis=200 option tells the collector that you are OK as long as a GC pause takes less than 200 ms.
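For example, the option line could look something like this (the heap sizes, log path and jar name are placeholders loosely based on your setup, not a recommendation):
java -Xms1024m -Xmx1024m -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:C:\logs\garbage_collection.logs -jar your-app.jar
Note that -XX:+UseG1GC replaces -XX:+UseConcMarkSweepGC and the CMS-specific options; the two collectors cannot be combined.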
Check your log files. I saw a similar issue in production recently, and guess what the problem was: the logger.
We use Log4j non-async, but it was not a Log4j issue. Some exception or condition led to around a million lines being written to the log file in a span of 3 minutes. Coupled with the high volume and other activity in the system, that led to high disk I/O and the web application became unresponsive.
I've recently switched my Java application from CMS + ParNew to G1GC.
What I observed when I did the switch is the CPU usage went higher and the GC count + pause time went up as well.
My JVM flags before the switch were:
java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar
After the switch my flags are:
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar
I followed Oracle's Best Practices http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
Do not Set Young Generation Size
And did not set the young generation size.
However I am suspecting that the young generation size is the problem here.
What I see is the heap usage is fluctuating between ~6 - 8 GB.
Whereas before, with CMS and ParNew, the memory usage grew from 4 GB to 16 GB, and only then did I see a GC:
I am not sure I understand why with G1GC the GC is so frequent. I am not sure what I'm missing when it comes to GC tuning with G1GC.
I'm using Java 8:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
I appreciate your help.
UPDATE:
A bit more information about those pauses:
As you can see all those pauses are G1New, and seemingly they are as long as my target pause time, which is 30ms.
When I look at the ParNew pauses before the switch to G1GC, this is how it looked:
So they are also all young gen collections (ParNew) but they are less frequent and shorter, because they happen only when the heap usage gets to around 14GB (according to the graph)
I am still clueless as to why the G1New collections happen so early (in terms of heap usage).
Update 2
I also noticed that NewRatio=2. I don't know whether G1GC respects that, but it would mean that my new gen is capped at 7 GB. Could that be the reason?
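(For reference, the arithmetic behind that cap, assuming G1 honoured NewRatio=2: young gen = heap / (NewRatio + 1) = 22467 MB / 3 ≈ 7489 MB, i.e. roughly 7.3 GB.)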
Update 3
Adding G1GC GC logs:
https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing
I was able to see that the time spent copying objects is very significant. It looks like G1GC by default lets an object survive up to 15 young collections (a tenuring threshold of 15) before it is promoted to the tenured generation.
I reduced it to 1 (-XX:MaxTenuringThreshold=1)
Also, I don't know how to confirm it in the logs, but visualizing the GC log I saw that the young generation is constantly being resized, from its minimum size to its maximum size. I narrowed down that range and it also improved performance.
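For illustration, pinning the young generation range in G1 can be done with the experimental options below; the percentages are placeholders, not necessarily the values I ended up with:
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -XX:MaxTenuringThreshold=1 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=20 -XX:G1MaxNewSizePercent=30 -XX:AutoBoxCacheMax=1048576 -jar my-application.jar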
Looking here https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9
I was trying to figure out whether coarsenings are indeed an issue, but it simply says to set gc+remset=trace, which I do not understand how to pass to java on the command line, nor whether it is even available in JDK 8.
I increased -XX:G1RSetRegionEntries a bit, just in case.
I hope this helps future G1GC tuners; if anyone else has more suggestions, that would be great.
What I still see is that [Processed Buffers] is still taking a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I am not sure why.
Your GC log shows an average GC pause interval of about 2 seconds, with each pause around 30-40 ms, which amounts to an application throughput of around 95%. That is not "killing performance" territory, at least not due to GC pauses.
G1 does more concurrent work, though, e.g. for remembered set refinement, and your pauses seem to spend some time in Update RS / Scan RS, so I assume the concurrent GC threads are busy too, i.e. it may need additional CPU cycles outside GC pauses. This is not covered by the logs by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the RS updates during the pauses (at the cost of increased pause times).
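A hypothetical invocation with the remembered-set summary enabled could look like this (these are diagnostic options, hence the unlock flag; the period value is just an example):
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:MaxGCPauseMillis=30 -XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=10 -jar my-application.jar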
when I run "jmap -heap ", i get the following
If you notice, PS Old Generation is just a little over 9% used while Eden is ~4.5% used.
At what percentage of Eden does minor GC occur?
At what percentage of PS Old Generation does the stop-the-world GC occur?
The exact percentage will vary based on the algorithm you are using.
Refer to https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/
Best solution: go to your Java bin directory, e.g. java-1.8.0-oracle/bin,
and run jstat -gc -t PID 1s to see for yourself when major or minor GCs get triggered.
It will give you the current memory sizes and GC occurrence counts for all major and minor GC parameters.
Refer to the attachment for an example.
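For reference, the JDK 8 jstat -gc -t header looks like this (capacities and utilizations in KB, times in seconds):
Timestamp S0C S1C S0U S1U EC EU OC OU MC MU CCSC CCSU YGC YGCT FGC FGCT GCT
EC/EU are the Eden capacity/used (watch EU approach EC just before a minor GC), OC/OU are the old generation capacity/used, YGC/YGCT and FGC/FGCT are the young and full GC counts/total times, and GCT is the total GC time.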
Keeping it simple:
Minor GC is triggered when the JVM does not have space for a new object, for example when Eden is getting full.
Try the command jstat -gc -t <pid> 1s and watch the behaviour of the young generation.
Regarding "At what percentage of PS Old Generation does the stop-the-world GC occur?":
Even a minor GC stops the application threads, leading to a stop-the-world pause (but it is comparatively negligible).
A stop-the-world event occurs every time a minor, major or full GC is triggered. As for when a full GC gets triggered:
Major GC cleans the Tenured space.
Full GC cleans the entire heap – both the Young and Tenured spaces.
You can set flags to modify these thresholds, depending on which GC algorithm you are using.
I am experiencing regularly high minor GC pause times (~9 seconds), even though there is no excessive I/O activity.
The application is a server written in Java, executing 3 transactions/second.
Heap parameters are:
-Xms1G
-Xmx14G
-XX:+UseConcMarkSweepGC
-XX:+DisableExplicitGC
-XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps
-XX:+PrintGCDetails
What are the possible reasons for such minor gc pause times values?
As the other answer says, without GC log snippets it's not possible to answer this definitively. Some things to think about:
Assuming no impact from the underlying OS (scheduling, CPU thrashing), the time taken for a minor collection will be proportional to the amount of live data in the young gen. when the collector runs (every live object in the young gen. gets copied during a minor GC). Looking at the graph of your old gen. you are seeing consistent growth which would indicate you are promoting significant amounts of data during minor GC. You're either creating a lot of long-lived objects or you're maintaining references unnecessarily.
To reduce the pauses you could try reducing the size of the Eden space (so there is less potential data to copy on each minor GC) and also reduce the tenuring threshold so that objects get moved out of the survivor spaces more quickly. The downside of this will be your minor GCs will happen more frequently so you will probably see a degradation in throughput.
I would also change the -Xms value. You clearly need more than 1 GB in your heap, so it would be best to set it to 14 GB to avoid the heap having to be resized by the JVM as the amount of data increases.
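As a sketch only (the concrete numbers would need to be validated against your GC logs), the adjusted flag set could look like:
-Xms14G -Xmx14G -Xmn512m -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=2 -XX:+UseConcMarkSweepGC -XX:+DisableExplicitGC -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails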
For questions in the category "Why is my GC pause that long?", you should always provide some snippets of the GC logs.
As pure speculation, here are a few reasons why a minor GC may be abnormally slow:
The JVM was suspended from execution by the OS (e.g. CPU starvation, swapping, virtual server freeze)
There is some issue with bringing the JVM to a safepoint (unlikely, looking at your pause pattern, though)
Object survival spikes
Reference object processing overhead (you need to add -XX:+PrintReferenceGC to get reference processing info into the GC log)
Try using
-XX:+UseG1GC -XX:MaxGCPauseMillis=1000
This will try to keep the maximum GC pause below 1 second.
You need to assign enough memory using -Xmx and set MaxGCPauseMillis as needed.
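Applied to the flag set from the question, that would be something like the following (only the collector-related options change; treat it as a starting point, not a tuned configuration):
-Xms1G -Xmx14G -XX:+UseG1GC -XX:MaxGCPauseMillis=1000 -XX:+DisableExplicitGC -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCDetails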
IMHO:
There is insufficient evidence that there is indeed a minor GC "pause time" problem. What is shown is garbage collection time, and GC time != GC pause time. Garbage collection activity time and garbage collection pause time are two different beasts; in technical terms, they map to different JVM ergonomics/performance goals.
Minor GC pause time is commonly negligible, and it is the major GC pause time that affects application responsiveness.
There is something called JVM ergonomics goals, and the "throughput goal" is one of the three goals, so I think at most what can be said in this case is that the throughput goal is not doing very well for the young generation, as a lot of time is spent in GC.
I have a jetty application processing about 2k requests a second. Machine has 8 cores and JVM heap size is 8GB. There are a lot of memory mapped files and internal caching so that takes up most of the heap space (4.5 GB).
Here are the stats after the application is stable and the JVM is done tuning Young and Old gen spaces:
Young Generation : 2.6GB
Old Generation : 5.4GB
I'm seeing that my young GC is invoked every 3 seconds and the entire Eden space is cleared (i.e. very little data is passed on to the old generation). I understand that filling up the young generation so quickly means I'm allocating way too many objects and that this is an issue. But there is definitely no memory leak in my application, since the servers have been up for 2 weeks with no OOM crashes.
Young GC is a stop-the-world event, so my understanding is that all threads are paused during this time. When I monitor latencies from the logs, I can see that every 2-3 seconds about 6-9 requests have a response time of > 100 ms (my average response time is < 10 ms). And when a full GC is called, I see that 6-9 requests have a response time of > 3 seconds (that's how long the full GC takes; since it is invoked very rarely, it is not an issue here).
My question is: since my jetty application has a 200-thread pool and no bounded request queue, shouldn't a young GC have an accordion effect on my response times? Will a 100 ms buffer be added to all the requests in my queue?
If so, what is the best way to measure response times from the moment a request is added to the queue until the response is sent? The 6-9 requests figure I mentioned above comes from checking the logs: when the application logic is invoked, and again just before the response is sent, I record start and end time variables, subtract the two, and dump the result to the logs.
One way would be to check my load balancer, but since these servers are behind an ELB I don't really have much access there other than average response times, which don't really help me.
You should enable GC logging for your application. Try adding the following JVM command line arguments:
-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCCause -XX:+PrintGCApplicationStoppedTime -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=<REPLACE_ME> -XX:GCLogFileSize=20M -Xloggc:<path_to_gc_log_dir>/gc.log
Then look at the events in the GC logs and try to correlate them with your application logs.