We have a four-datanodes-cluster running CDH5.0.2, installed through Cloudera Manager parcels.
In order to import 13M users' rows into HBase, we wrote a simple Python script and used hadoop-streaming jar. It works as expected up to 100k rows. And then... then, one after the other, all datanodes crash with the same message:
The health test result for REGION_SERVER_GC_DURATION has become bad:
Average time spent in garbage collection was 44.8 second(s) (74.60%)
per minute over the previous 5 minute(s).
Critical threshold: 60.00%.
Any attempt to solve the issue following the advices found around the web (e.g. [1], [2], [3]) do not lead anywhere near a solution. "Playing" with java heap size is useless. The only thing which "solved" the situation was increasing Garbage Collection Duration Monitoring Period for region servers from 5' to 50'. Arguably a dirty workaround.
We don't have the workforce to create a monitor for our GC usage right now. We eventually will, but I was wondering how possibly importing 13M rows into HBase could lead to a sure crash of all region servers. Is there a clean solution?
Edit:
JVM Options on Datanodes are:
-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:-CMSConcurrentMTEnabled -XX:CMSInitiatingOccupancyFraction=70 -XX:+CMSParallelRemarkEnabled
Datanodes are physical machines running CentOS 6.5, each with 32Gb Ram and 1Quadcore at 2GHz with 30Mb cache.
Below excerpt of the Python script which we run. We fill two tables: one with a unique user ID as rowkey and a single columnfamily with users' info, another with all info we might want to access as rowkey.
#!/usr/bin/env python2.7
import sys
import happybase
import json
connection = happybase.Connection(host=master_ip)
hbase_main_table = connection.table('users_table')
hbase_index_table = connection.table('users_index_table')
header = ['ID', 'COL1', 'COL2', 'COL3', 'COL4']
for line in sys.stdin:
l = line.replace('"','').strip("\n").split("\t")
if l[header.index("ID")] == "ID":
#you are reading the header
continue
for h in header[1:]:
try:
id = str(l[header.index("ID")])
col = 'info:' + h.lower()
val = l[header.index(h)].strip()
hbase_table.put(id_au_bytes, {
col: val
})
indexed = ['COL3', 'COL4']
for typ in indexed:
idx = l[header.index(typ)].strip()
if len(idx) == 0:
continue
row = hbase_index_table.row(idx)
old_ids = row.get('d:s')
if old_ids is not None:
ids = json.dumps(list(set(json.loads(old_ids)).union([id_au])))
else:
ids = json.dumps([id_au])
hbase_index.put(idx, {
'd:s': ids,
'd:t': typ,
'd:b': 'ame'
})
except:
msg = 'ERROR '+str(l[header.index("ID")])
logging.info(msg, exc_info=True)
One of the major issues that a lot of people are running into these days is that the amount of RAM available to java applications has exploded but most of the information about tuning Java GC is based on experience from the 32-bit era.
I recently spent a good deal of time researching GC for large heap situations in order to avoid the dreaded "long pause". I watched this excellent presentation several time and finally GC and the issues I've faced with it started making more sense.
I don't know that much about Hadoop but I think you may be running into a situation where your young generation is too small. It's unfortunate but most information about JVM GC tuning fails to emphasize that the best place for your objects to be GC'd is in the young generation. It takes literally no time at all to collect garbage at this point. I won't go into the details (watch the presentation if you want to know) but what happens is that is if you don't have enough room in your young (new) generation, it fills up prematurely. This forces a collection and some objects will be moved to the tenured (old) generation. Eventually the tenured generation fills up and it will need to be collected too. If you have a lot of garbage in your tenured generation, this can be very slow as the tenured collection algorithm is generally mark sweep which is has a non-zero time for collecting garbage.
I think you are using Hotspot. Here's a good reference for the various GC arguments for hotspot. JVM GC options
I would start by increasing the size of the young generation greatly. My assumption here is that a lot of short to medium lived objects are being created. What you want to avoid is having these be promoted into the tenured generation. The way you do that is extend the time that they spend in the young generation. To accomplish that, you can either increase it's size (so it takes longer to fill up) or increase the tenuring threshold (essentially the number young collections the object will stay for). The problem with the tenuring threshold is that it takes time to move the object around in the young generation. Increasing the size of the young generation is inefficient in terms of memory but my guess is that you have lots to spare.
I've used this solution with caching servers and I have minor collections in the > 100 ms range and infrequent (less than one a day) major collections generally under 0.5s with a heap around 4GB. Our object live either 5 min, 15 min or 29 days.
Another thing you might want to consider is the G1 (garbage first) collector which was recently added (relatively speaking) to HotSpot.
I'm interested in how well this advice works for you. Good luck.
Related
We inherited a system which runs in production and started to fail every 10 hours recently. Basically, our internal software marks the system that is has failed if it is unresponsive for a minute. We found that our problem that our Full GC cycles last for 1.5 minutes, we use 30 GB heap. Now the problem is that we cannot optimize a lot in a short period of time and we cannot partition of our service quickly but we need to get rid of 1.5 minutes pauses as soon as possible as our system fails because of these pauses in production. For us, an acceptable delay is 20 milliseconds but not more. What will be the quickest way to tweak the system? Reduce the heap to trigger GCs frequently? Use System.gc() hints? Any other solutions? We use Java 8 default settings and we have more and more users - i.e. more and more objects created.
Some GC stat
You have a lot of retained data. There is a few options which are worth considering.
increase the heap to 32 GB, this has little impact if you have free memory. Looking again at your totals it appears you are using 32 GB rather than 30 GB, so this might not help.
if you don't have plenty of free memory, it is possible a small portion of your heap is being swapped as this can increase full GC times dramatically.
there might be some simple ways to make the data structures more compact. e.g. use compact strings, use primitives instead of wrappers e.g. long for a timestamp instead of Date or LocalDateTime. (long is about 1/8th the size)
if neither of these help, try moving some of the data off heap. e.g. Chronicle Map is a ConcurrentMap which uses off heap memory can can reduce you GC times dramatically. i.e. there is no GC overhead for data stored off heap. How easy this is to add highly depends on how your data is structured.
I suggest analysing how your data is structured to see if there is any easy ways to make it more efficient.
There is no one-size-fits-all magic bullet solution to your problem: you'll need to have a good handle on your application's allocation and liveness patterns, and you'll need to know how that interacts with the specific garbage collection algorithm you are running (function of version of Java and command line flags passed to java).
Broadly speaking, a Full GC (that succeeds in reclaiming lots of space) means that lots of objects are surviving the minor collections (but aren't being leaked). Start by looking at the size of your Eden and Survivor spaces: if the Eden is too small, minor collections will run very frequently, and perhaps you aren't giving an object a chance to die before its tenuring threshold is reached. If the Survivors are too small, objects are going to be promoted into the Old gen prematurely.
GC tuning is a bit of an art: you run your app, study the results, tweak some parameters, and run it again. As such, you will need a benchmark version of your application, one which behaves as close as possible to the production one but which hopefully doesn't need 10 hours to cause a full GC.
As you stated that you are running Java 8 with the default settings, I believe that means that your Old collections are running with a Serial collector. You might see some very quick improvements by switching to a Parallel collector for the Old generation (-XX:+UseParallelOldGC). While this might reduce the 1.5 minute pause to some number of seconds (depending on the number of cores on your box, and the number of threads you specify for GC), this will not reduce your max pause to to 20ms.
When this happened to me, it was due to a memory leak caused by a static variable eating up memory. I would go through all recent code changes and look for any possible memory leaks.
At my company we are trying an approach with JVM based microservices. They are designed to be scaled horizontally and so we run multiple instances of each using rather small containers (up to 2G heap, usually 1-1.5G). The JVM we use is 1.8.0_40-b25.
Each of such instances typically handles up to 100 RPS with max memory allocation rate around 250 MB/s.
The question is: what kind of GC would be a safe sensible default to start off with? So far we are using CMS with Xms = Xmx (to avoid pauses during heap resizing) and Xms = Xmx = 1.5G. Results are decent - we hardly ever see any Major GC performed.
I know that G1 could give me smaller pauses (at the cost of total throughput) but AFAIK it requires a bit more "breathing" space and at least 3-4G heap to perform properly.
Any hints (besides going for Azul's Zing :D) ?
Hint # 1: Do experiments!
Assuming that your microservice is deployed at least on two nodes run one on CMS, another on G1 and see what response times are.
Not very likely, but what if you can find that with G1 performance is so good that need half of original cluster size?
Side notes:
re: "250Mb/s" -> if all of this is stack memory (alternatively, if it's young gen) then G1 would provide little benefit since collection form these areas is free.
re: "100 RPS" -> in many cases on our production we found that reducing concurrent requests in system (either via proxy config, or at application container level) improves throughput. Given small heap it's very likely that you have small cpu number as well (2 to 4).
Additionally there are official Oracle Hints on tuning for a small memory footprint. It might not reflect latest config available on 1.8_40, but it's good read anyway.
Measure how much memory is retained after a full GC. add to this the amount of memory allocated per second and multiply by 2 - 10 depending on how often you would like to have a minor GC. e.g. every 2 second or every 10 second.
E.g. say you have up to 500 MB retained after a full GC and GCing every couple of seconds is fine, you can have 500 MB + 2 * 250 MB, or a heap of around 1 GB.
The number of RPS is not important.
I am writing my servlet program and use jconsole and jmap to monitor its memory status.I find that when my program is running , Memory Pool "PS Old Gen" is becoming larger and larger and finally my servlet cannot respond to any request.
This is the shotcut of my JConsole output:
When I click the "Perform GC" Button, Nothing happened.
So , to see the details ,I use jmap to dump the details:
And this is my JConsole VM Summary output:
Any one can help me to get out what may be the problem?Your know , the GC "PS MarkSweep" and "PS SCavenge" is the default GC for my Server JVM.
Thank you.
I find a very weird phenomenon:During 15 hours from 18:00 of yesterday to 09::00 of today ,it seems that GC on "PS Old Gen" never occurred , which make the used memory of old generation is becoming larger and larger.i have just manually click the "Perform GC" button ,it seems that this GC is quite effective and reclaim a lot of memory. But why the Old generation GC didn't happen automatically for such a long time? We can see that before 18:00 of yesterday , the old generation GC is working properly.
Assuming that you did not add the option -histo:live when you were taking the jmap, which will result in taking the report with garbage + live objects, and referring to the memory drop happened when you manually click the "Perform GC" button, I doubt that the application don't have a memory leak but a bad object promotion rate from the Young Gen to Old Gen. Eventually the Old Gen will be filled up and will run full GC resulting the application to go unresponsive.
If my assumption is correct, I think your strategy should be to minimize promoting the objects to old gen. Rather than worrying about what to do to clear out the Old Gen which is more expensive. Referring the below comments you have mentioned, I think you application has a small Memory Footprint (< 0.5G ) relative to the max allocated memory 7G .
"All my data-intensive variables are defined in method. When method returned , these variables should be reclaimed, right?"
So there are few things you can do.
Tune the application to minimize the response times of your transactions so that the objects will be garbage collected before been promoted to the Old Gen
Increase the Young Gen size. Since you have around 7 GB to play with, why don't you allocate around 2 - 3 GB to Young Gen for a start ( i.e -XX:NewSize=2g ). A larger New Size will reduce the frequency of PCSacavenge ( Young Collections) and will reduce the rate of aging the live objects.
Then start adjusting the -XX:MaxTenuringThreshold=n . You can us the gc.log with -XX:+PrintTenuringDistribution. Size the survivor ratios -XX:SurvivorRatio=n. Note that by default -XX:+UseAdaptiveSizePolicy is on and this will alter the initial size of Survivor Ratios dynamically. Or else you can skip sizing the Survivor Ratios leaving the AdaptiveSizePolicy to do the job. But I'm not a big fan of AdaptiveSizePolicy.
Along with AdaptiveSizePolicy you can use -XX:MaxGCPauseMillis=n in order to give an indication to the Garbage Collector regarding the pauses you are expecting in your application when clearing the Old Gen. In this way the Collector will try to achieve the MaxGCPauseMillis by not waiting until there is too much work to do.
Or else you can switch to CMS collector which is built to handle response time issues like these.
Well I think if first two steps resolves your problem then you can leave the rest aside. You must not spoil a well running app by adding some additional stuff. Important thing is you have to tune the GC step by step.
Your memory leak happes within MongoDB code. The huge number of map entries you see are most probably the internals of BasicDBObject (ranked #6 in your dump), which extends HashMap. You may be able to resolve the issue by reconfiguring the MongoDB component.
I am unsure whether there is a generic answer for this, but I was wondering what the normal Java GC pattern and java heap space usage looks like. I am testing my Java 1.6 application using JMeter. I am collecting JMX GC logs and plotting them with JMeter JMX GC and Memory plugin extension. The GC pattern looks quite stable with most GC operations being 30-40ms, occasional 90ms. The memory consumption goes in a saw-tooth pattern. The JHS usage grows constantly upwards e.g. to 3GB and every 40 minutes the memory usage does a free-fall drop down to around 1GB. The max-min delta however grows, so the sawtooth height constantly grows. Does it do a full GC every 40mins?
Most of your descriptions in general, are how the GC works. However, none of your specific observations, especially numbers, hold for general case.
To start with, each JVM has one or several GC implementations and you could choose which one to use. Take the mostly applied one i.e. SUN JVM (I like to call it this way) and the common server GC pattern as example.
Firstly, the memory are divided into 4 regions.
A young generation which holds all of the recently created objects. When this generation is full, GC does a stop-the-world collection by stopping your program from working, execute a black-gray-white algorithm and get the obselete objects and remove them. So this is your 30-40 ms.
If an object survived a certain rounds of GC in the young gen, it would be moved into a swap generation. The swap generation holds the objects until another number of GCs - then move them to the old generation. There are 2 swap generations which does a double buffering kind of thing to facilitate the young gen to work faster. If young gen dumps stuff to swap gen and found swap gen is mostly full, a GC would happen on swap gen and potentially move the survived objects to old gen. This most likely makes your 90ms, though I am not 100% sure how swap gen works. Someone correct me if I am wrong.
All the objects survived swap gen would be moved to the old generation. The old generation would only be GC-ed until it's mostly full. In your case, every 40 min.
There is another "permanent gen" which is used to load your jar target byte code and resources.
All size of the areas can be adjusted by JVM parameters.
You can try to use VisualVM which would give you a dynamic idea of how it works.
P.S. not all JVM / GC works the same way. If you use G1 collector, or JRocket, it might happens slightly different, but the general idea holds.
Java GC work in terms of generations of objects. There are young, tenure and permament generations. It seems like in your case: every 30-40ms GC process only young generation (and transfers survived objects into tenure generation). And every 40 mins it performs full collecting (it causes stop-the-world pause). Note: it happens not by time, but by percentage of used memory.
There are several JVM options, which allows you to chose generation's sizes, type of GC (there are several algorithms for GC, in java 1.6 Serial GC is used by default, for example -XX:-UseConcMarkSweepGC), parameters of GC work.
You'd better try to find good articles about generations and different types of GC (algorithms are really different, some of them allow to avoid stop-the-world pauses at all!)
yes, most likely. Instead of guessing you can use jstat to monitor your GCs.
I suggest you use a memory profiler to ensure there is nothing simple you can do ti improve the amount of garbage you are producing.
BTW, If you increase the size of the young generation, you can reduce how much garbage makes it into the tenured space reducing the frequency of full collections. You may find you less than one full collection per day if you tune it enough.
For a more extreme case, I have tuned a trading system to less than one collection per day (minor or major)
In Java, the concurrent mode failure means that the concurrent collector failed to free up enough memory space form tenured and permanent gen and has to give up and let the full stop-the-world gc kicks in. The end result could be very expensive.
I understand this concept but never had a good comprehensive understanding of
A) what could cause a concurrent mode failure and
B) what's the solution?.
This sort of unclearness leads me to write/debug code without much of hints in mind and often has to shop around those performance flags from Foo to Bar without particular reasons, just have to try.
I'd like to learn from developers here how your experience is? If you had encountered such performance issue, what was the cause and how you addressed it?
If you have coding recommendations, please don't be too general. Thanks!
The first thing about CMS that I have learned is it needs more memory than the other collectors, about 25 to 50% more is a good starting point. This helps you avoid fragmentation, since CMS does not do any compaction like the stop the world collectors would. Second, do things that help the garbage collector; Integer.valueOf instead of new Integer, get rid of anonymous classes, make sure inner classes are not accessing inaccessible things (private in the outer class) stuff like that. The less garbage the better. FindBugs and not ignoring warnings will help a lot with this.
As far as tuning, I have found that you need to try several things:
-XX:+UseConcMarkSweepGC
Tells JVM to use CMS in tenured gen.
Fix the size of your heap: -Xmx2048m -Xms2048m This prevents GC from having to do things like grow and shrink the heap.
-XX:+UseParNewGC
use parallel instead of serial collection in the young generation. This will speed up your minor collections, especially if you have a very large young gen configured. A large young generation is generally good, but don't go more than half of the old gen size.
-XX:ParallelCMSThreads=X
set the number of threads that CMS will use when it is doing things that can be done in parallel.
-XX:+CMSParallelRemarkEnabled remark is serial by default, this can speed you up.
-XX:+CMSIncrementalMode allows application to run more by pasuing GC between phases
-XX:+CMSIncrementalPacing allows JVM to figure change how often it collects over time
-XX:CMSIncrementalDutyCycleMin=X Minimm amount of time spent doing GC
-XX:CMSIncrementalDutyCycle=X Start by doing GC this % of the time
-XX:CMSIncrementalSafetyFactor=X
I have found that you can get generally low pause times if you set it up so that it is basically always collecting. Since most of the work is done in parallel, you end up with basically regular predictable pauses.
-XX:CMSFullGCsBeforeCompaction=1
This one is very important. It tells the CMS collector to always complete the collection before it starts a new one. Without this, you can run into the situation where it throws a bunch of work away and starts again.
-XX:+CMSClassUnloadingEnabled
By default, CMS will let your PermGen grow till it kills your app a few weeks from now. This stops that. Your PermGen would only be growing though if you make use of Reflection, or are misusing String.intern, or doing something bad with a class loader, or a few other things.
Survivor ratio and tenuring theshold can also be played with, depending on if you have long or short lived objects, and how much object copying between survivor spaces you can live with. If you know all your objects are going to stick around, you can configure zero sized survivor spaces, and anything that survives one young gen collection will be immediately tenured.
Quoted from "Understanding Concurrent Mark Sweep Garbage Collector Logs"
The concurrent mode failure can either
be avoided by increasing the tenured
generation size or initiating the CMS
collection at a lesser heap occupancy
by setting
CMSInitiatingOccupancyFraction to a
lower value
However, if there is really a memory leak in your application, you're just buying time.
If you need fast restart and recovery and prefer a 'die fast' approach I would suggest not using CMS at all. I would stick with '-XX:+UseParallelGC'.
From "Garbage Collector Ergonomics"
The parallel garbage collector
(UseParallelGC) throws an
out-of-memory exception if an
excessive amount of time is being
spent collecting a small amount of the
heap. To avoid this exception, you can
increase the size of the heap. You can
also set the parameters
-XX:GCTimeLimit=time-limit and -XX:GCHeapFreeLimit=space-limit
Sometimes OOM pretty quick and got killed, sometime suffers long gc period (last time was over 10 hours).
It sounds to me like a memory leak is at the root of your problems.
A CMS failure won't (as I understand it) cause an OOM. Rather a CMS failure happens because the JVM needs to do too many collections too quickly, and CMS could not keep up. One situation where lots of collection cycles happen in a short period is when your heap is nearly full.
The really long GC time sounds weird ... but is theoretically possible if your machine was thrashing horribly. However, a long period of repeated GCs is quite plausible if your heap is very nearly full.
You can configure the GC to give up when the heap is 1) at max size and 2) still close to full after a full GC has completed. Try doing this if you haven't done so already. It won't cure your problems, but at least your JVM will get the OOM quickly, allowing a faster service restart and recovery.
EDIT - the option to do this is -XX:GCHeapFreeLimit=nnn where nnn is a number between 0 and 100 giving the minimum percentage of the heap that must be free after the GC. The default is 2. The option is listed in the aptly titled "The most complete list of -XX options for Java 6 JVM" page. (There are lots of -XX options listed there that don't appear in the Sun documentation. Unfortunately the page provides few details on what the options actually do.)
You should probably start looking to see if your application / webapp has memory leaks. If it has, your problems won't go away unless those leaks are found and fixed. In the long term, fiddling with the Hotspot GC options won't fix memory leaks.
I've found using -XX:PretenureSizeThreshold=1m to make 'large' object go immediately to tenured space greatly reduced my young GC and concurrent mode failures since it tends not to try to dump the young + 1 survivor amount of data (xmn=1536m survivorratio=3 maxTenuringThreashould=5) before a full CMS cycle can complete. Yes my survivor space is large, but about once ever 2 days something comes in the app that will need it (and we run 12 app servers each day for 1 app).