I have a Jetty server proxying requests to a backend and converting protobufs in both directions: frontend.proto -> backend.proto and backend.proto -> frontend.proto.
Requests take under 20 ms (99th percentile) and 40 ms (99.9th) while load is not peaking.
However, when load peaks, the 99th percentile increases by about 10 ms while the 99.9th increases by about 60 ms.
I have investigated it, and the delayed requests are caused by G1 evacuation pauses; of this I am sure. These pauses take 50-70 ms and run once every ~15 seconds at valley load, but jump to once every 3-5 seconds at peak load, with the same duration.
As soon as the interval between GCs drops below 8-9 seconds, the 99.9th percentile shoots up, and I can see the debug logs of slow requests coinciding with the GC log entries.
I have profiled with JProfiler, YourKit and VisualVM and saw that:
Eden space fills up and a GC pause is triggered
Very few of the objects are moved into Survivor space (a few MB out of a 12 GB heap)
So most of the objects are already dead by the time Eden is collected
This makes sense, since requests take 30-40 ms and most objects' lifetimes are tied to the request lifetime
Most of the objects are created during protobuf deserialization
I've tried playing with -XX:MaxGCPauseMillis and Eden sizes, but nothing seems to make a difference: pauses never take less than 50 ms, and a bigger Eden means lower frequency but much longer pauses.
I see 2 options here:
Somehow reuse objects in protobuf-java: this seems impossible. I've read through a lot of posts and mailing lists and the library is just not set up that way; the standard answer is that "Java object allocation is very efficient and it should be able to handle many objects being created". While that is true, the associated GC cost is killing my 99.9th percentile.
Make the GC run more often, say once a second, to bring each collection's duration down, so that it stalls more requests but for shorter times. I have been playing with -XX:MaxGCPauseMillis and Eden sizes, but I can't get the pauses below ~50 ms.
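For reference, this is the kind of combination I've been experimenting with for option 2 (illustrative values, not my exact production settings; G1NewSizePercent and G1MaxNewSizePercent are experimental flags in JDK 8 and have to be unlocked):
-XX:+UseG1GC -XX:MaxGCPauseMillis=20 -XX:+UnlockExperimentalVMOptions -XX:G1NewSizePercent=2 -XX:G1MaxNewSizePercent=10
Forcing a small Eden this way trades pause length for pause frequency, which is the trade-off option 2 is after.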
I uploaded the gc log to gc_log
Java version:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
GC details:
-XX:CICompilerCount=12 -XX:ConcGCThreads=5 -XX:ErrorFile=/home/y/var/crash/hs_err_pid%p.log -XX:+FlightRecorder -XX:G1HeapRegionSize=4194304 -XX:GCLogFileSize=4194304 -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/y/logs/yjava_jetty -XX:InitialHeapSize=12884901888 -XX:MarkStackSize=4194304 -XX:MaxHeapSize=12884901888 -XX:MaxNewSize=7730102272 -XX:MinHeapDeltaBytes=4194304 -XX:NumberOfGCLogFiles=10 -XX:+ParallelRefProcEnabled -XX:+PrintGC -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+UnlockCommercialFeatures -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseFastUnorderedTimeStamps -XX:+UseG1GC -XX:+UseGCLogFileRotation
Related
I've recently switched my Java application from CMS + ParNew to G1GC.
What I observed when I did the switch is the CPU usage went higher and the GC count + pause time went up as well.
My JVM flags before the switched were
java -Xmx22467m -Xms22467m -XX:NewSize=11233m -XX:+UseConcMarkSweepGC -XX:AutoBoxCacheMax=1048576 -jar my-application.jar
After the switch my flags are:
java -Xmx22467m -Xms22467m -XX:+UseG1GC -XX:AutoBoxCacheMax=1048576 -XX:MaxGCPauseMillis=30 -jar my-application.jar
I followed Oracle's Best Practices http://www.oracle.com/technetwork/tutorials/tutorials-1876574.html
Do not Set Young Generation Size
And did not set the young generation size.
However I am suspecting that the young generation size is the problem here.
What I see is that the heap usage fluctuates between roughly 6 and 8 GB.
Whereas before, with CMS + ParNew, the memory usage grew from about 4 GB to 16 GB and only then did I see a GC:
I am not sure I understand why with G1GC the GC is so frequent. I am not sure what I'm missing when it comes to GC tuning with G1GC.
I'm using Java 8:
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
I appreciate your help.
UPDATE:
A bit more information about those pauses:
As you can see, all those pauses are G1New, and they seem to be as long as my target pause time, which is 30 ms.
When I look at the ParNew pauses before the switch to G1GC, this is how it looked:
So they are also all young-gen collections (ParNew), but they are less frequent and shorter, because they only happen when the heap usage gets to around 14 GB (according to the graph).
I am still clueless as to why the G1New collections happen so early (in terms of heap usage).
Update 2
I also noticed that NewRatio=2. I don't know whether G1GC respects it, but it would mean my new generation is capped at about 7 GB. Could that be the reason?
Update 3
Adding G1GC GC logs:
https://drive.google.com/file/d/1iWqZCbB-nU6k_0-AQdvb6vaBSYbkQcqn/view?usp=sharing
I was able to see that the time spent copying objects is very significant. It looks like G1GC uses a default tenuring threshold of 15, i.e. an object can survive up to 15 young collections before being promoted to the tenured generation.
I reduced it to 1 (-XX:MaxTenuringThreshold=1)
Also, I don't know how to confirm it in the logs, but visualizing the GC log I saw that the young generation is constantly being resized, from its minimum size to its maximum size. I narrowed down that range and it also improved performance.
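For reference, the kind of bounds I mean look like this (the values here are illustrative, not the ones I ended up with); note that if the two are set equal, the young generation is fixed and G1 stops resizing it to meet the pause-time goal:
-XX:NewSize=2g -XX:MaxNewSize=4g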
Looking here https://docs.oracle.com/javase/9/gctuning/garbage-first-garbage-collector-tuning.htm#JSGCT-GUID-70E3F150-B68E-4787-BBF1-F91315AC9AB9
I was trying to figure out whether coarsening is indeed an issue, but the page simply says to set gc+remset=trace, and I don't understand how to pass that to java on the command line, or whether it's even available in JDK 8.
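From what I can tell, gc+remset=trace is the JDK 9+ unified-logging syntax and does not exist in JDK 8; the closest JDK 8 equivalent I found is the diagnostic remembered-set summary (both lines below are my own assumption, not taken from the page above):
-Xlog:gc+remset=trace (JDK 9 and later)
-XX:+UnlockDiagnosticVMOptions -XX:+G1SummarizeRSetStats -XX:G1SummarizeRSetStatsPeriod=1 (JDK 8; prints the summary every collection)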
I increased -XX:G1RSetRegionEntries a bit, just in case.
I hope this helps future G1GC tuners; if anyone has more suggestions, that would be great.
What I still see is that [Processed Buffers] still takes a very long time in young evacuations, and [Scan RS] is very long in mixed collections. I'm not sure why.
Your GC log shows an average GC pause interval of 2 seconds, with each pause around 30-40 ms, which amounts to an application throughput of around 95%. That is not "killing performance" territory, at least not due to GC pauses.
G1 does more concurrent work, though, e.g. for remembered-set refinement, and your pauses seem to spend some time in Update RS / Scan RS, so I assume the concurrent GC threads are busy too. That means G1 may need additional CPU cycles outside the GC pauses, which the logs do not cover by default; you need -XX:+G1SummarizeRSetStats for that. If latency is more important, you might want to allocate more cores to the machine; if throughput is more important, you could tune G1 to perform more of the remembered-set updates during the pauses (at the cost of increased pause times).
I have a program which receives UDP packets, parses some data from them, and saves it to a DB, in multiple threads. It uses Hibernate and Spring via Grails (GORM stand-alone).
It works OK on one server: it starts fast (20-30 ms per packet, except for the very first ones while the JIT kicks in) and after a while stabilizes at 50-60 ms.
However, on a newer, more powerful server it starts fast but gradually gets slower and slower (reaching 200 ms or even 300 ms per packet, always under the same load). Then, when the JVM performs a full GC (or I trigger one manually from VisualVM), it gets fast again and the cycle starts over.
Any ideas about what could cause this behaviour? It seems to get slower as the old gen fills up. Eden fills up quite fast, but GC pauses seem to be short. And it works OK on the old server, so it's puzzling me.
Servers and settings:
The servers specs are:
Old server: Intel Xeon E3-1245 V2 @ 3.40GHz, 32 GB RAM without ECC
New server: Intel Xeon E5-1620 @ 3.60GHz, 64 GB RAM with ECC
OS: Debian 7.6
JVM:
java version "1.7.0_65"
Java(TM) SE Runtime Environment (build 1.7.0_65-b17)
Java HotSpot(TM) 64-Bit Server VM (build 24.65-b04, mixed mode)
JVM settings:
Old server: was running with no special RAM or GC params, PrintFlagsFinal gives: -XX:InitialHeapSize=525445120 -XX:+ManagementServer -XX:MaxHeapSize=8407121920 -XX:+UseCompressedOops -XX:+UseParallelGC
New server: tried forcing those same flags, same results.
The old server seemed to support UseFastStosb and it was enabled by default; forcing it on the new server results in a message saying it's not supported.
Can you try G1, which is supported by your JVM version?
Applications running with either the CMS or the Parallel Old GC garbage collector would benefit from switching to G1 if the application has one or more of the following traits.
(1) Full GC durations are too long or too frequent.
(2) The rate of object allocation or promotion varies significantly.
(3) Undesired long garbage collection or compaction pauses (longer than 0.5 to 1 second)
I cannot possibly say if there's anything wrong with your application / the server VM defaults.
Try adding -XX:+PrintGCDetails to learn more about the young and old generation sizes at the time of each collection. According to those values, your initial heap size is around 525 MB and the maximum heap is around 8.4 GB. The JVM resizes the heap based on demand, and every time it resizes the heap the young and old generations are resized accordingly, which can cause a Full GC.
Also, your flags indicate UseParallelGC, which collects the young generation using multiple threads, but the old gen is still collected serially by a single thread.
The default value of NewRatio is 2, which means the young gen takes 1/3 of the heap and the old gen takes 2/3. If you have too many short-lived objects, try resizing the young gen, and perhaps give G1 GC a try now that you're using 7u65.
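As an illustration only (sizes made up for this example, not a recommendation for your app), fixing the initial and maximum heap to the same value avoids resize-driven collections, and GC logging shows what each collection actually does:
java -Xms4g -Xmx4g -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -jar your-app.jar
Here your-app.jar is just a placeholder for however you actually launch the process.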
But before tuning I strongly recommend that you
(1) Do a proper analysis with GC logs - see whether any Full GCs occur during your slow response times
(2) Try Java Mission Control. Use it to monitor your remote server process; it is feature-rich and will tell you much more about GC.
You can use the -XX:+PrintGCDetails option to see how frequently each GC occurs.
However, I don't think this is a GC issue (or a GC parameter issue). As your post says, the program runs OK but the problem appears when it is moved to a new, faster machine. My guess is that there is some bottleneck in your program that delays releasing references to allocated objects. Consequently, memory accumulates and the VM spends a lot of time on GC and memory allocation.
In other words, a producer allocates heap memory while processing packets and a consumer releases that memory once the packets are saved to the DB, but the consumer cannot keep up with the producer.
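If that theory holds, one way to keep the backlog bounded is a blocking hand-off between the receiver threads and the DB writers. A minimal sketch (Packet, parse and saveToDb are placeholders, not names from your code):
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class BoundedPipeline {
    static class Packet {}

    // Bounded queue: when the DB writers fall behind, the receivers block here
    // instead of letting unsaved packets pile up on the heap.
    private final BlockingQueue<Packet> queue = new ArrayBlockingQueue<>(10_000);

    // Called from the UDP receiver threads.
    void onDatagram(byte[] datagram) throws InterruptedException {
        queue.put(parse(datagram));
    }

    // Run by each DB writer thread: drain and persist, releasing references promptly.
    void writerLoop() throws InterruptedException {
        while (true) {
            saveToDb(queue.take());
        }
    }

    private Packet parse(byte[] datagram) { return new Packet(); } // your parsing goes here
    private void saveToDb(Packet p) { }                            // your GORM/Hibernate save goes here
}
If the queue regularly sits at capacity, the bottleneck is on the DB side rather than in GC.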
So my suggestion is to check your program and take some measurements.
We have a production web application running on our intranet which:
is restarted at 0300 each day in order to perform a backup of its database
has the same load on it throughout the working day (0800 to 1700)
is running on Java HotSpot(TM) 64-Bit Server VM version 20.45-b01
is running on a physical machine with 16 cores and 32 gigs of RAM, running Linux 2.6.18-128.el5
does not share the machine with any other significant process
is configured with:
-Xms2g
-XX:PermSize=256m
-Xmx4g
-XX:MaxPermSize=256m
-Xss192k
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSParallelRemarkEnabled
-XX:CMSInitiatingOccupancyFraction=50
-XX:+DisableExplicitGC
Each day the heap usage:
rises gradually to 90% from startup until 0800
remains at 90% until 0930
remains at 70% from 0930 until 1415
drops to 50% at 1415
drops to 37% at 1445
at which point the heap rises to 55% in about 40 minutes and is collected back to 37%, ad infinitum until the next restart.
We have AppDynamics installed on the JVM and can see that Major Garbage Collections take place roughly every minute without much of an impact on the memory (except the falls outlined above of course) until the memory reaches 37%, when the Major collections become much less frequent.
There are obviously hundreds of factors external to the behaviour of a web application, but one avenue of research is that HotSpot JIT information is lost when the JVM is stopped.
Are there GC optimisations/etc which are also lost with the shutdown of the JVM? Is the JVM effectively consuming more memory than it needs to because certain Hotspot optimisations haven't yet taken place?
Is it possible that we would get better memory performance from this application if the JVM wasn't restarted and we found another way to perform a backup of the database?
(Just to reiterate, I know that there are a hundred thousand things that could influence the behaviour of an application, especially an application that hardly anyone else knows! I really just want to know whether there are certain things to do with the memory performance of a JVM which are lost when it is stopped)
Yes, it is possible for the behaviour of GCs to change over time due to JIT optimisation. One example is 'Escape Analysis', which has been enabled by default since Java 6u23. This type of optimisation can prevent some objects from being allocated on the heap at all, so they never need garbage collection.
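As a small, contrived illustration (not code from the application in question): once the method below is JIT-compiled, the temporary Point never escapes, so HotSpot can scalar-replace it and no garbage is produced, whereas the interpreter allocates it on the heap on every call.
// Escape analysis candidate: 'p' never leaves the method.
static long distanceSquared(int dx, int dy) {
    java.awt.Point p = new java.awt.Point(dx, dy);
    return (long) p.x * p.x + (long) p.y * p.y;
}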
For more information see Java 7's HotSpot Performance Enhancements.
We have a fairly big application running on a JBoss 7 application server. In the past, we were using ParallelGC but it was giving us trouble in some servers where the heap was large (5 GB or more) and usually nearly filled up, we would get very long GC pauses frequently.
Recently, we made improvements to our application's memory usage and in a few cases added more RAM to some of the servers where the application runs, but we also started switching to G1 in the hopes of making these pauses less frequent and/or shorter. Things seem to have improved but we are seeing a strange behaviour which did not happen before (with ParallelGC): the Perm Gen seems to fill up pretty quickly and once it reaches the max value a Full GC is triggered, which usually causes a long pause in the application threads (in some cases, over 1 minute).
We had been using 512 MB of max perm size for a few months, and during our analysis the perm size would usually stop growing at around 390 MB with ParallelGC. After we switched to G1, however, the behaviour above started happening. I tried increasing the max perm size to 1 GB and even 1.5 GB, but the Full GCs still happen (they are just less frequent).
In this link you can see some screenshots of the profiling tool we are using (YourKit Java Profiler). Notice how when the Full GC is triggered the Eden and the Old Gen have a lot of free space, but the Perm size is at the maximum. The Perm size and the number of loaded classes decrease drastically after the Full GC, but they start rising again and the cycle is repeated. The code cache is fine, never rises above 38 MB (it's 35 MB in this case).
Here is a segment of the GC log:
2013-11-28T11:15:57.774-0300: 64445.415: [Full GC 2126M->670M(5120M), 23.6325510 secs]
[Eden: 4096.0K(234.0M)->0.0B(256.0M) Survivors: 22.0M->0.0B Heap: 2126.1M(5120.0M)->670.6M(5120.0M)]
[Times: user=10.16 sys=0.59, real=23.64 secs]
You can see the full log here (from the moment we started up the server, up to a few minutes after the full GC).
Here's some environment info:
java version "1.7.0_45"
Java(TM) SE Runtime Environment (build 1.7.0_45-b18)
Java HotSpot(TM) 64-Bit Server VM (build 24.45-b08, mixed mode)
Startup options: -Xms5g -Xmx5g -Xss256k -XX:PermSize=1500M -XX:MaxPermSize=1500M -XX:+UseG1GC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintAdaptiveSizePolicy -Xloggc:gc.log
So here are my questions:
Is this the expected behaviour with G1? I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
Is there something I can improve/correct in our startup parameters? The server has 8 GB of RAM, but it doesn't seem we are lacking hardware; performance of the application is fine until a full GC is triggered, which is when users experience big lags and start complaining.
Causes of growing Perm Gen
Lots of classes, especially JSPs.
Lots of static variables.
There is a classloader leak.
For those that don't know, here is a simple way to think about how the PermGen fills up. The Young Gen doesn't get enough time to let things expire, so they get moved up to Old Gen space. The Perm Gen holds the classes for the objects in the Young and Old Gen. When the objects in the Young or Old Gen get collected and a class is no longer referenced, it gets 'unloaded' from the Perm Gen. If the Young and Old Gen don't get GC'd then neither does the Perm Gen, and once it fills up it needs a full stop-the-world GC. For more info see Presenting the Permanent Generation.
Switching to CMS
I know you are using G1 but if you do switch to the Concurrent Mark Sweep (CMS) low pause collector -XX:+UseConcMarkSweepGC, try enabling class unloading and permanent generation collections by adding -XX:+CMSClassUnloadingEnabled.
The Hidden Gotcha
If you are using JBoss, RMI/DGC has the gcInterval set to 1 min. The RMI subsystem forces a full garbage collection once per minute. This in turn forces promotion instead of letting it get collected in the Young Generation.
You should change this to at least 1 hour, if not 24 hours, in order for the GC to do proper collections.
-Dsun.rmi.dgc.client.gcInterval=3600000 -Dsun.rmi.dgc.server.gcInterval=3600000
List of every JVM option
To see all the options, run this from the cmd line.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal -version
If you want to see what JBoss is using, add the following to your JBoss launch configuration (standalone.conf, or standalone.conf.bat on Windows, which matches the set syntax below). You will get a list of every JVM option and what it is set to. NOTE: it must be in the JVM that you want to look at. If you run it externally you won't see what is happening in the JVM that JBoss is running in.
set "JAVA_OPTS= -XX:+UnlockDiagnosticVMOptions -XX:+PrintFlagsFinal %JAVA_OPTS%"
There is a shortcut to use when we are only interested in the modified flags.
-XX:+PrintCommandLineFlags
Diagnostics
Use jmap to determine what classes are consuming permanent generation space. Output will show
class loader
# of classes
bytes
parent loader
alive/dead
type
totals
jmap -permstat JBOSS_PID >& permstat.out
JVM Options
These settings worked for me but depending how your system is set up and what your application is doing will determine if they are right for you.
-XX:SurvivorRatio=8 – Sets survivor space ratio to 1:8, resulting in larger survivor spaces (the smaller the ratio, the larger the space). The SurvivorRatio is the size of the Eden space compared to one survivor space. Larger survivor spaces allow short lived objects a longer time period to die in the young generation.
-XX:TargetSurvivorRatio=90 – Allows 90% of the survivor spaces to be occupied instead of the default 50%, allowing better utilization of the survivor space memory.
-XX:MaxTenuringThreshold=31 – To prevent premature promotion from the young to the old generation. Allows short-lived objects a longer time period to die in the young generation (and hence avoid promotion). A consequence of this setting is that minor GC times can increase due to additional objects being copied. This value and the survivor space sizes may need to be adjusted to balance the overhead of copying between survivor spaces against tenuring objects that are going to live for a long time. The default settings for CMS are SurvivorRatio=1024 and MaxTenuringThreshold=0, which cause all survivors of a scavenge to be promoted. This can place a lot of pressure on the single concurrent thread collecting the tenured generation. Note: when used with -XX:+UseBiasedLocking, this setting should be 15.
-XX:NewSize=768m – allow specification of the initial young generation sizes
-XX:MaxNewSize=768m – allow specification of the maximum young generation sizes
Here is a more extensive JVM options list.
Is this the expected behaviour with G1?
I don't find it surprising. The base assumption is that stuff put into permgen almost never becomes garbage, so you'd expect permgen GC to be a "last resort", i.e. something the JVM would only do if it was forced into a full GC. (OK, this argument is nowhere near a proof ... but it's consistent with the following.)
I've seen lots of evidence that other collectors have the same behaviour; e.g.
permgen garbage collection takes multiple Full GC
What is going on with java GC? PermGen space is filling up?
I found another post on the web of someone questioning something very similar and saying that G1 should perform incremental collections on the Perm Gen, but there was no answer...
I think I found the same post. But someone's opinion that it ought to be possible is not really instructive.
Is there something I can improve/correct in our startup parameters?
I doubt it. My understanding is that this is inherent in the permgen GC strategy.
I suggest that you either track down and fix what is using so much permgen in the first place ... or switch to Java 8 in which there isn't a permgen heap anymore: see PermGen elimination in JDK 8
While a permgen leak is one possible explanation, there are others; e.g.
overuse of String.intern(),
application code that is doing a lot of dynamic class generation; e.g. using DynamicProxy,
a huge codebase ... though that wouldn't cause permgen churn as you seem to be observing.
I would first try to find the root cause for the PermGen getting larger before randomly trying JVM options.
You could enable class-loading logging (-verbose:class, -XX:+TraceClassLoading -XX:+TraceClassUnloading, ...) and check the output.
In your test environment, you could try monitoring (over JMX) when classes get loaded (java.lang:type=ClassLoading LoadedClassCount). This might help you find out which part of your application is responsible.
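For instance, a minimal sketch of polling that counter from inside the JVM using the standard java.lang.management API (the 60-second interval and the plain System.out logging are just illustrative choices):
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class ClassLoadWatcher {
    public static void main(String[] args) throws InterruptedException {
        ClassLoadingMXBean cl = ManagementFactory.getClassLoadingMXBean();
        while (true) {
            // A steadily growing loaded count with few unloads points at a classloader leak.
            System.out.println("loaded=" + cl.getLoadedClassCount()
                    + " totalLoaded=" + cl.getTotalLoadedClassCount()
                    + " unloaded=" + cl.getUnloadedClassCount());
            Thread.sleep(60_000);
        }
    }
}
The same numbers are exposed remotely over JMX under java.lang:type=ClassLoading.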
You could also try listing all the classes using the JVM tools (sorry, but I still mostly use JRockit, where you would do it with jrcmd; I hope Oracle has migrated those helpful features to HotSpot...).
In summary, find out what generates so many classes, and then think about how to reduce that / tune the GC.
Cheers,
Dimo
I agree with the answer above in that you should really try to find out what is actually filling your permgen, and I strongly suspect it is some classloader leak whose root cause you want to find.
There's a thread in the JBoss forums that goes through a couple of such diagnosed cases and how they were fixed. This answer and this article discuss the issue in general as well. In that article there's a mention of possibly the easiest test you can do:
Symptom
This will happen only if you redeploy your application without restarting the application server. The JBoss 4.0.x series suffered from just such a classloader leak. As a result I could not redeploy our application more than twice before the JVM would run out of PermGen memory and crash.
Solution
To identify such a leak, un-deploy your application and then trigger a full heap dump (make sure to trigger a GC before that). Then check if you can find any of your application objects in the dump. If so, follow their references to their root, and you will find the cause of your classloader leak. In the case of JBoss 4.0 the only solution was to restart for every redeploy.
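On a HotSpot JVM that heap-dump step can be done roughly like this (the live option forces a GC before dumping, as the article suggests; replace PID with the JBoss process id):
jmap -dump:live,format=b,file=after-undeploy.hprof PID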
This is what I'd try first, IF you think that redeployment might be related. This blog post is an earlier one doing the same thing, but discussing the details as well. Based on the post, though, it may be that you're not actually redeploying anything and the permgen is just filling up by itself. In that case, examining the classes and anything else added to the permgen might be the way to go (as already mentioned in the previous answer).
If that doesn't give more insight, my next step would be trying out the Plumbr tool. They have a sort of guarantee on finding the leak for you, as well.
You should start your server.bat with the -verbose:gc option added to the java command.
I've got a Java webapp running on one tomcat instance. During peak times the webapp serves around 30 pages per second and normally around 15.
My environment is:
O/S: SUSE Linux Enterprise Server 10 (x86_64)
RAM: 16GB
server: Tomcat 6.0.20
JVM: Java HotSpot(TM) 64-Bit Server VM 1.6.0_14
JVM options:
CATALINA_OPTS="-Xms512m -Xmx1024m -XX:PermSize=128m -XX:MaxPermSize=256m
-XX:+UseParallelGC
-Djava.awt.headless=true
-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
JAVA_OPTS="-server"
After a couple of days of uptime, Full GCs start occurring more frequently and become a serious problem for the application's availability. After a Tomcat restart the problem goes away but, of course, returns after 5 to 10 or 30 days (not consistent).
The Full GC log before and after a restart is at http://pastebin.com/raw.php?i=4NtkNXmi
It shows a log before the restart at 6.6 days uptime where the app was suffering because Full GC needed 2.5 seconds and was happening every ~6 secs.
Then it shows a log just after the restart where Full GC only happened every 5-10 minutes.
I took two dumps using jmap -dump:format=b,file=dump.hprof PID when the Full GCs were occurring (I'm not sure whether I got them exactly while a Full GC was occurring or between two Full GCs) and opened them in http://www.eclipse.org/mat/ but didn't get anything useful in Leak Suspects:
60MB: 1 instance of "org.hibernate.impl.SessionFactoryImpl" (I use hibernate with ehcache)
80MB: 1,024 instances of "org.apache.tomcat.util.threads.ThreadWithAttributes" (these are probably the 1024 workers of tomcat)
45MB: 37 instances of "net.sf.ehcache.store.compound.impl.MemoryOnlyStore" (these should be my ~37 cache regions in ehcache)
Note that I never get an OutOfMemoryError.
Any ideas on where should I look next?
When we had this issue we eventually tracked it down to the young generation being too small. Although we had given the JVM plenty of RAM, the young generation wasn't getting its fair share.
This meant that minor garbage collections happened more frequently and caused some young objects to be moved into the tenured generation, which in turn meant more major garbage collections as well.
Try using -XX:NewRatio with a fairly low value (say 2 or 3) and see if this helps.
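For example (purely to illustrate what the ratio means, not a sizing recommendation for this app), with a 1 GB heap NewRatio=2 gives the young generation roughly one third of the heap and the tenured generation the rest:
java -Xms1024m -Xmx1024m -XX:NewRatio=2 -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -jar your-webapp-launcher.jar
The jar name is just a placeholder; for Tomcat the flags would go into CATALINA_OPTS as in the question.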
More info can be found here.
I've switched from -Xmx1024m to -Xmx2048m and the problem went away. I now have 100 days of uptime.
Besides tuning the various JVM options, I would also suggest upgrading to a newer release of the VM, because later versions have a much better tuned garbage collector (even without trying the new experimental one).
Besides that, even if it's (partially) true that assigning more RAM to the JVM can increase the time required for a single GC, there is a trade-off between using the whole 16 GB of memory and keeping memory occupation low, so as a start you can try doubling all the values:
-Xms1024m -Xmx2048m -XX:PermSize=256m -XX:MaxPermSize=512m
Regards
Massimo
What might be happening in your case is that you have a lot of objects that live a little longer than the young-gen life cycle. If the survivor space is too small, they go straight to the old gen. -XX:+PrintTenuringDistribution could provide some insight. Your new gen is large enough, so try decreasing SurvivorRatio.
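For example (illustrative values, not tuned for this application): lowering SurvivorRatio below its default of 8 makes each survivor space larger relative to Eden, and the tenuring distribution output then shows whether objects die there instead of being promoted:
-XX:+PrintTenuringDistribution -XX:SurvivorRatio=4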
Also, JConsole will probably provide more visual insight into what happens with your memory; try it.