heapdump size vs hprof size

I recently took a heap dump in HPROF format while my JBoss server was running with -Xms4096m, -Xmx4096m and a PermSize of 512m.
The generated hprof file is over 5GB. When I load the heap dump in VisualVM, MAT (Memory Analyzer) or YourKit, I only see a total of approximately 1GB. I've tried changing the reachability scope in YourKit, but it does not show more than 1GB.
Any idea what could cause this big difference between the file size and the displayed heap dump size?
PS: I'm using JDK 1.6.0_23.
Unfortunately I'm not allowed to submit screenshots here.
On the file system the hprof file is 5.227.659 KB, and YourKit states:
Objects: 9.738.282 / shallow size: 740 MB / retained size: 740 MB
Strong reachable among them: 6.652.515 (68%) / shallow size: 381 MB (51%) / retained size: 381 MB (51%)
The largest retained size is a byte[] of 206.810.176 bytes.

Which command did you use to generate the heap dump?
$JAVA_HOME/bin/jmap -dump:live,format=b,file=c:/tmp/heap_dump.bin PID
Maybe you need to pass the live option; according to the jmap usage text:
-dump:<dump-options> to dump java heap in hprof binary format
    dump-options:
      live         dump only live objects; if not specified,
                   all objects in the heap are dumped.
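For comparison, the same kind of dump can also be taken from inside a running JVM. The sketch below assumes Java 8 or later on a HotSpot-based JVM, since it uses com.sun.management.HotSpotDiagnosticMXBean, which is not part of the Java SE standard; the class and method names are made up for illustration. Its live flag plays the same role as the jmap live option: with live=false, unreachable objects that have not been collected yet are written to the file as well, which is one way an hprof file can end up much larger than what an analyzer later reports.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

public class HeapDumpSizes {
    // Dumps the current JVM's heap to a temporary .hprof file and returns the file size.
    // live=true writes only reachable objects (like "jmap -dump:live,...");
    // live=false also writes unreachable objects that have not been collected yet.
    public static long dumpSize(boolean live) {
        try {
            Path dir = Files.createTempDirectory("heapdump");
            // Recent JDKs require a ".hprof" suffix, and dumpHeap refuses to
            // overwrite an existing file, so use a fresh name in a new directory.
            Path file = dir.resolve(live ? "live.hprof" : "all.hprof");
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(file.toString(), live);
            long size = Files.size(file);
            Files.delete(file);
            Files.delete(dir);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("all objects:  " + dumpSize(false) + " bytes");
        System.out.println("live objects: " + dumpSize(true) + " bytes");
    }
}
```

The exact numbers depend on how much garbage happens to be on the heap when each dump is taken, so the gap between the two can vary a lot from run to run.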

Did you try the "Unreachable Objects Histogram" (linked from the top of the "Overview" page)? In one of my heap dumps, 1509MB in size, MAT shows only 454MB, but the rest is essentially garbage, and sure enough, the sum of "Shallow Heap" in the unreachable objects histogram is 966MB.

This just means that most likely your heap-dump consisted of a large amount of unreachable objects that would have been garbage collected, if a GC were to run.
Now that does not mean that you don't still have a leak, it just means that in your 5 GB Hprof, 4 GB of objects were unreachable and hence were not interesting sources of a leak.
In Java a memory leak can only occur if Garbage Collection can't clean out an object because something is holding a reference to it (unexpectedly). So your leak (if any) is to be found in the 1 GB of objects that remained in your hprof.
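To illustrate the difference between garbage and a leak, here is a small Java sketch; the class and field names are invented for the example. A buffer kept in a static collection stays reachable no matter how often GC runs, while a buffer nobody references is only garbage waiting to be collected. System.gc() is just a request, so whether the garbage buffer is already gone when it is checked is not guaranteed.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

public class LeakSketch {
    // Hypothetical cache that is never cleared. A static field is a GC root, so
    // everything added here stays strongly reachable and can never be collected.
    static final List<byte[]> LEAKY_CACHE = new ArrayList<>();

    // Returns {leakedBufferStillPresent, garbageBufferStillPresent} after a GC request.
    public static boolean[] demo() {
        byte[] leaked = new byte[1024 * 1024];
        LEAKY_CACHE.add(leaked);
        // Weak references let us observe the buffers without keeping them alive.
        WeakReference<byte[]> leakedRef = new WeakReference<>(leaked);
        WeakReference<byte[]> garbageRef = new WeakReference<>(new byte[1024 * 1024]);
        System.gc(); // only a hint; the VM may or may not collect right now
        return new boolean[] {leakedRef.get() != null, garbageRef.get() != null};
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        // The leaked buffer is always still there. The garbage buffer is usually gone,
        // but until it is collected it would still show up in a dump taken without "live".
        System.out.println("leaked buffer present:  " + r[0]);
        System.out.println("garbage buffer present: " + r[1]);
    }
}
```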

Related

Impact of heap parameters on GC/performance?

Most places on the net give the following information about heap parameters:
-Xms<size> set initial Java heap size
-Xmx<size> set maximum Java heap size
Here is my understanding, and my questions, when I specify -Xms512M -Xmx2048M:
-Xms: My understanding is that if my Java process actually needs only 200M, then even with -Xms512M it will still be assigned only 200M (the memory actually required) instead of 512M. But if I already know that my application is going to need this 512M at startup, then specifying less will hurt performance, because the heap will have to be resized anyway, which is a costly operation.
According to a discussion with my colleague, by default GC is triggered at 60% of the Xms value. Is that correct? If so, is it the minor GC or the full GC that depends on the Xms value?
Update on Xms:
This seems to be true after reading JVM heap parameters, but again: is the value 60% by default, and is it the minor or the full GC that depends on the Xms value?
-Xmx: My understanding is that with -Xmx2048M, the Java process reserves 2048M of memory from the OS for its own use, so that another process cannot be given that share. If the Java process needs more than 2048M anyway, an OutOfMemoryError will be thrown.
I also believe there is some relation between the Full GC trigger and the -Xmx value, because I observed in JConsole that a Full GC happens when memory usage gets close to 70% of Xmx. Is that correct?
Configuration: Linux box, 64-bit JVM 8, default GC (i.e. Parallel GC).

GC is not triggered based on just Xms or Xmx value.
Heap = New + Old generations
The heap size (initially set to Xms) is split into 2 generations: New (aka Young) and Old (aka Tenured). By default the New generation is 1/3 of the total heap and the Old generation 2/3. This can be adjusted with the JVM parameter NewRatio (the ratio of Old to New), whose default value is 2.
The Young generation is further divided into Eden and 2 Survivor spaces. The default ratio of these 3 spaces is 3/4, 1/8, 1/8.
Side note: this applies to the Parallel collectors. G1, the newer GC algorithm, divides the heap space differently.
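As a worked example of the default split described above, the sketch below computes the generation sizes for a 3072 MB heap. It is plain arithmetic, assuming NewRatio=2 and survivor spaces of 1/8 of the young generation each; it ignores alignment and the runtime resizing the Parallel collector performs, so real sizes will differ somewhat. The method and parameter names are hypothetical. To see the flag values a particular JVM actually uses, java -XX:+PrintFlagsFinal -version lists flags such as NewRatio.

```java
public class GenerationSizes {
    // Splits a heap (in MB) into {young, old, eden, survivor} using the ratios described above:
    // old:young = newRatio:1, and each survivor space = young / survivorDivisor.
    public static long[] split(long heapMb, int newRatio, int survivorDivisor) {
        long young = heapMb / (newRatio + 1);
        long old = heapMb - young;
        long survivor = young / survivorDivisor;
        long eden = young - 2 * survivor; // what remains after the two survivor spaces
        return new long[] {young, old, eden, survivor};
    }

    public static void main(String[] args) {
        long[] s = split(3072, 2, 8);
        System.out.printf("young=%d old=%d eden=%d survivor=%d (MB)%n", s[0], s[1], s[2], s[3]);
    }
}
```

For split(3072, 2, 8) this prints young=1024 old=2048 eden=768 survivor=128 (MB), i.e. 1/3 and 2/3 of the heap, with Eden at 3/4 of the young generation.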
Minor GC
All new objects are allocated in Eden space (except very large ones, which are stored directly in the Old generation). When Eden space becomes full, a Minor GC is triggered. Objects that survive multiple minor GCs are promoted to the Old generation (default: 15 cycles, which can be changed with the JVM parameter MaxTenuringThreshold).
Major GC
Unlike the concurrent collector, where a Major GC is triggered based on used space (by default 70%), the parallel collectors calculate the threshold based on the 3 goals below.
Parallel Collector Goals
Max GC pause time - maximum time spent doing GC
Throughput - percentage of time spent in GC vs. the application. Default: 1%
Footprint - maximum heap size (Xmx)
Thus, by default, the Parallel Collector tries to spend at most 1% of the total application running time in garbage collection.
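The 1% default corresponds to the -XX:GCTimeRatio flag: a value of N sets a goal of at most 1/(1+N) of the time in GC, and the Parallel collector's default of 99 gives 1/100. The sketch below, with hypothetical class and method names, computes that goal and compares it with the share of JVM uptime spent in GC so far, summed from the standard GarbageCollectorMXBean counters. Treat it as a rough check only, since each collector reports its own cumulative collection time in milliseconds.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcTimeShare {
    // Throughput goal implied by -XX:GCTimeRatio=N: at most 1/(1+N) of the time in GC.
    public static double goalFraction(int gcTimeRatio) {
        return 1.0 / (1 + gcTimeRatio);
    }

    // Share of JVM uptime spent in GC so far, summed over all collectors.
    // getCollectionTime() returns -1 if a collector does not report it, so skip those.
    public static double observedFraction() {
        long gcMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) {
                gcMillis += t;
            }
        }
        long uptimeMillis = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptimeMillis > 0 ? (double) gcMillis / uptimeMillis : 0.0;
    }

    public static void main(String[] args) {
        System.out.printf("goal for GCTimeRatio=99: %.2f%%%n", 100 * goalFraction(99));
        System.out.printf("observed so far:         %.2f%%%n", 100 * observedFraction());
    }
}
```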
Xms to Xmx
During startup, the JVM creates a heap of size Xms but reserves the extra space (up to Xmx) so that it can grow later. That reserved space is called virtual space. Note that it only reserves the space and does not commit it.
Two parameters decide when the heap grows (or shrinks) between Xms and Xmx.
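The difference between the initial, committed and maximum heap size can be observed through the standard MemoryMXBean. A minimal sketch, with a hypothetical class name: started with, for example, -Xms512m -Xmx2048m, you would expect committed to start near the Xms value and max to be close to, though not necessarily exactly, the Xmx value.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class CommittedVsMax {
    public static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        MemoryUsage heap = heapUsage();
        long mb = 1024 * 1024;
        // init: initial size requested at startup (roughly Xms; -1 if undefined)
        // committed: memory actually committed for the heap right now
        // max: upper limit the heap may grow into (roughly Xmx; -1 if undefined)
        System.out.printf("init=%dM used=%dM committed=%dM max=%dM%n",
                heap.getInit() / mb, heap.getUsed() / mb,
                heap.getCommitted() / mb, heap.getMax() / mb);
    }
}
```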
MinHeapFreeRatio (default: 40%): Once the free heap space dips below 40%, a Full GC is triggered, and the heap size grows by 20%. Thus, heap size can keep growing incrementally until it reaches Xmx.
MaxHeapFreeRatio (default: 70%): On the flip side, once free heap space exceeds 70%, the heap size is reduced by 5% during each GC until it reaches Xms.
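The two rules above can be written down as a toy model. This sketch models the behaviour exactly as described in this answer (grow by 20% when free space is below MinHeapFreeRatio, shrink by 5% when it is above MaxHeapFreeRatio, always staying between Xms and Xmx). HotSpot's real sizing logic differs between collectors, so use it only to reason about the direction of resizing. All names are hypothetical and sizes are in MB.

```java
public class HeapResizeModel {
    // Heap capacity after a GC under the simplified rule described above (sizes in MB,
    // ratios in percent). This is a toy model, not HotSpot's actual sizing code.
    public static long nextCapacity(long capacity, long used, long xms, long xmx,
                                    int minHeapFreeRatio, int maxHeapFreeRatio) {
        double freePercent = 100.0 * (capacity - used) / capacity;
        if (freePercent < minHeapFreeRatio) {
            return Math.min(xmx, Math.round(capacity * 1.20)); // grow by 20%, never beyond Xmx
        }
        if (freePercent > maxHeapFreeRatio) {
            return Math.max(xms, Math.round(capacity * 0.95)); // shrink by 5%, never below Xms
        }
        return capacity; // free space within [min, max]: leave the heap as it is
    }

    public static void main(String[] args) {
        System.out.println(nextCapacity(1000, 700, 500, 2000, 40, 70)); // 30% free
        System.out.println(nextCapacity(1000, 200, 500, 2000, 40, 70)); // 80% free
    }
}
```

With Xms=500, Xmx=2000 and the default 40/70 ratios, a 1000 MB heap holding 700 MB (30% free) grows to 1200 MB, while the same heap holding 200 MB (80% free) shrinks to 950 MB.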
These parameters can be set at startup.

Java BufferedImage memory consumption

Our application generates images. Creating a BufferedImage throws an out-of-memory error:
java.lang.OutOfMemoryError: Java heap space
This happens with the following line:
BufferedImage result = new BufferedImage(2540, 2028, BufferedImage.TYPE_INT_ARGB);
Checking free memory just before this instruction shows that I have 108MB free. This is how I check memory:
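Runtime rt = Runtime.getRuntime();
rt.gc(); // request a GC first (only a hint to the VM)
long maxMemory = rt.maxMemory(); // upper limit the heap may grow to (about -Xmx)
long usedMemory = rt.totalMemory() - rt.freeMemory(); // occupied part of the currently committed heap
long freeMem = maxMemory - usedMemory; // how much more the heap could hold, in theory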
We don't understand how the BufferedImage can consume more than 100MB of memory. It should use 2540 * 2028 * 4 bytes, which is ~20 MB.
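The 20 MB estimate in the question can be checked directly. In this sketch (class and method names hypothetical), the pixel data of a TYPE_INT_ARGB image sits in a single int-typed data buffer with one element per pixel, so 2540 x 2028 pixels come to 20,604,480 bytes, about 19.65 MiB. Because all of it lives in one array, that array has to be allocated in one piece.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBuffer;

public class ImageFootprint {
    // Expected pixel storage: one 4-byte int per pixel for TYPE_INT_ARGB.
    public static long expectedBytes(int width, int height) {
        return (long) width * height * Integer.BYTES;
    }

    // Number of elements in the image's data buffer (one int per pixel).
    public static int backingElements(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        return img.getRaster().getDataBuffer().getSize();
    }

    // True if the pixels are held in a single int-typed bank, i.e. one int[] array.
    public static boolean isSingleIntBank(int width, int height) {
        BufferedImage img = new BufferedImage(width, height, BufferedImage.TYPE_INT_ARGB);
        DataBuffer buffer = img.getRaster().getDataBuffer();
        return buffer.getDataType() == DataBuffer.TYPE_INT && buffer.getNumBanks() == 1;
    }

    public static void main(String[] args) {
        long bytes = expectedBytes(2540, 2028);
        System.out.println(bytes + " bytes = " + bytes / (1024.0 * 1024.0) + " MiB");
        System.out.println("elements: " + backingElements(2540, 2028));
    }
}
```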
Why is so much memory consumed when creating the BufferedImage? What can we do to reduce this?

Asking Runtime for the amount of free memory is not really reliable in a multithreaded environment, as the memory could be used up by another thread right after you measured. Also, you are using maxMemory - usedMemory, which is not the amount of free memory, but rather what the VM thinks it can make available at most - it may be that the host system can not satisfy a request for more memory, while the VM still believes it can enlarge the heap.
It's also fully possible that your VM has 108 MB free, but no 20MB in one chunk is available. The type of BufferedImage you are trying to create is ultimately backed by an int[] array, which must be allocated as a contiguous memory block. That means if no contiguous 20MB block is available on the heap, no matter how much total free memory there is otherwise, you will get an OutOfMemoryError. The situation is further complicated by the garbage collector used - each GC has different strategies for memory allocation; a sizable portion of the heap may be set aside for thread local memory allocation.
Without any information on how large the heap is in total and which GC you are using (and which VM, for that matter), there are too many variables to point a finger at a culprit.
Edit: Find out which GC is used (Java 7 (JDK 7) garbage collection and documentation on G1) and have a look at its specific pros and cons, especially what capabilities it offers in terms of heap compaction and how large its generations are by default. Those would be the parameters to play with. Running the application with GC messages turned on may also provide insight into what's going on.
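To find out which collector is in use, the standard management API can list the active collectors and the heap pools with their current and maximum sizes. A minimal sketch with hypothetical names follows; the pool names it prints depend on the collector (the Parallel collector, for instance, reports pools such as PS Old Gen). For GC messages, the -verbose:gc flag works on HotSpot JVMs.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {
    // Names of the collectors active in this JVM, with the pools each one collects.
    public static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName() + " (collects " + String.join(", ", gc.getMemoryPoolNames()) + ")");
        }
        return names;
    }

    // Heap pools with their current usage and maximum size in MB (max -1 means undefined).
    public static List<String> heapPools() {
        List<String> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage u = pool.getUsage();
            if (pool.getType() == MemoryType.HEAP && u != null) {
                long max = u.getMax() < 0 ? -1 : u.getMax() >> 20;
                pools.add(pool.getName() + ": used=" + (u.getUsed() >> 20) + "M max=" + max + "M");
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        collectorNames().forEach(System.out::println);
        heapPools().forEach(System.out::println);
    }
}
```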
Considering your heap is only 900MB in size, 100MB free means you're pretty close to the limit already. My first go-to cure would be to simply assign the VM a much larger heap, let's say 2GB. If you need to conserve memory, your only bet is tuning the GC parameters (possibly selecting another GC), and to be honest I have no experience with that. There are plenty of articles on GC tuning available, though.

Mapper side OutOfMemory

I'm facing a heap space OutOfMemoryError in my Mapper's cleanup method, where I'm reading data from an InputStream and converting it into a byte array using IOUtils.toByteArray(inputStream);
I know I can resolve it by increasing the max heap space (Xmx), but I should already have enough heap space (1GB). While debugging I found the following (approximate) values:
runtime.maxMemory() - 1024MB
runtime.totalMemory() - 700MB
runtime.freeMemory() - 200MB
My block size is 128MB and I'm not adding any additional data to it in my RecordReader. My output size from the mapper won't be more than 128MB.
I also checked the available bytes in the inputStream (.available()), which reported approximately 128MB.
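Two facts are worth keeping in mind here. The InputStream.available() documentation says its return value is only an estimate and should never be used to allocate a buffer meant to hold all data in the stream. And a general-purpose helper that reads a stream of unknown length has to collect it in intermediate buffers and then copy it into the final array, so at its peak it holds roughly twice the input size or more. If the length is known from trustworthy metadata (how you obtain expectedLength is an assumption here), a sketch like the following, with hypothetical names, reads it into one pre-sized array instead.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadKnownLength {
    // Reads exactly expectedLength bytes into one pre-sized array. Peak allocation is
    // that array alone, instead of intermediate buffers plus a final copy.
    public static byte[] readExactly(InputStream in, int expectedLength) throws IOException {
        byte[] data = new byte[expectedLength];
        // readFully loops until the array is full and throws EOFException if the stream ends early.
        new DataInputStream(in).readFully(data);
        // DataInputStream does not buffer, so reading from 'in' directly here is safe.
        if (in.read() != -1) {
            throw new IOException("stream is longer than " + expectedLength + " bytes");
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        byte[] source = "hello, mapper".getBytes(StandardCharsets.UTF_8);
        byte[] copy = readExactly(new ByteArrayInputStream(source), source.length);
        System.out.println(new String(copy, StandardCharsets.UTF_8));
    }
}
```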
I'm also a bit confused about JVM memory allocation. Let's say I set my heap space to -Xms128m -Xmx1024m. My TaskTracker has 16GB RAM and already has 8 jobs (8 JVMs) running on it. Let's assume the TaskTracker can allocate only 8.5GB RAM for JVMs and uses the rest for its internal purposes. So we have 8.5GB RAM available, and the 8 running tasks are currently using only 6GB. Is it possible for a new task to be assigned to the same TaskTracker? The 8 tasks already running might require 8GB, in which case the new task won't be able to get the user-requested heap size (1GB) if required.
PS: I know that not all of the heap needs to be in RAM (paging). My main question is: will the user be able to get the maximum requested heap size in every scenario?

Why does the JVM Heap Usage Max as reported by JMX change over time?

My JVM heap max is configured at 8GB on the name node for one of my hadoop clusters. When I monitor that JVM using JMX, the reported maximum is constantly fluctuating, as shown in the attached image.
http://highlycaffeinated.com/assets/images/heapmax.png
I only see this behavior on one (the most active) of my hadoop clusters. On the other clusters the reported maximum stays fixed at the configured value. Any ideas why the reported maximum would change?
Update:
The java version is "1.6.0_20"
The heap max value is set in hadoop-env.sh with the following line:
export HADOOP_NAMENODE_OPTS="-Xmx8G -Dcom.sun.management.jmxremote.port=8004 $JMX_SHARED_PROPS"
ps shows:
hadoop 27605 1 99 Jul30 ? 11-07:23:13 /usr/lib/jvm/jre/bin/java -Xmx1000m -Xmx8G
Update 2:
Added the -Xms8G switch to the startup command line last night:
export HADOOP_NAMENODE_OPTS="-Xms8G -Xmx8G -Dcom.sun.management.jmxremote.port=8004 $JMX_SHARED_PROPS"
As shown in the image below, the max value still varies, although the pattern seems to have changed.
http://highlycaffeinated.com/assets/images/heapmax2.png
Update 3:
Here's a new graph that also shows Non-Heap max, which stays constant:
http://highlycaffeinated.com/assets/images/heapmax3.png

According to the MemoryMXBean documentation, memory usage is reported in two categories, "Heap" and "Non-Heap" memory. The description of the Non-Heap category says:
The Java virtual machine manages memory other than the heap (referred as non-heap memory).
The Java virtual machine has a method area that is shared among all threads. The method area belongs to non-heap memory. It stores per-class structures such as a runtime constant pool, field and method data, and the code for methods and constructors. It is created at the Java virtual machine start-up.
The method area is logically part of the heap but a Java virtual machine implementation may choose not to either garbage collect or compact it. Similar to the heap, the method area may be of a fixed size or may be expanded and shrunk. The memory for the method area does not need to be contiguous.
This description sounds a lot like the permanent generation (PermGen), which is indeed part of the heap and counts against the memory allocated using the -Xmx flag. I'm not sure why they decided to report this separately since it is part of the heap.
I suspect that the fluctuations you're seeing are a result of the JVM shrinking and growing the permanent generation, which would cause the reported max heap space available for non-PermGen uses to change accordingly. If you could get a sum of the Heap and Non-Heap maxes as reported by JMX and this sum stays constant at the 8G limit, that would verify this hypothesis.

One possibility is that the JVM survivor space is fluctuating in max-size.
The JVM max size reported by JMX via the HeapMemoryUsage.max attribute is not the actual max size of the heap (i.e. the one set with -Xmx).
The reported value is the max heap size minus the max survivor space size.
To get the total max heap size, add the two jmx attributes:
java.lang:type=Memory/HeapMemoryUsage.max + java.lang:type=MemoryPool,name=Survivor Space/Usage.max
(tested on oracle jdk 1.7.0_45)
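Both explanations above can be checked from inside a JVM with the standard management API, which exposes the same values JMX shows as java.lang:type=Memory HeapMemoryUsage/NonHeapMemoryUsage and as each memory pool's Usage.max. This sketch (names hypothetical) matches the survivor pool by name, because the name depends on the collector (e.g. Survivor Space or PS Survivor Space). A max of -1 means undefined, so sums are only reported when both parts are defined; on Java 8 and later the non-heap max is often undefined because Metaspace is unbounded by default.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

public class HeapMaxCheck {
    // java.lang:type=Memory HeapMemoryUsage.max (-1 if undefined)
    public static long reportedHeapMax() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage().getMax();
    }

    // java.lang:type=Memory NonHeapMemoryUsage.max (-1 if undefined)
    public static long reportedNonHeapMax() {
        return ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage().getMax();
    }

    // Usage.max of the first heap pool whose name contains "Survivor"; -1 if none or undefined.
    public static long survivorMax() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP && pool.getName().contains("Survivor")) {
                MemoryUsage u = pool.getUsage();
                return u == null ? -1 : u.getMax();
            }
        }
        return -1;
    }

    // Adds two reported sizes, or returns -1 if either of them is undefined.
    public static long sumIfDefined(long a, long b) {
        return (a < 0 || b < 0) ? -1 : a + b;
    }

    public static void main(String[] args) {
        long heap = reportedHeapMax();
        System.out.println("Runtime.maxMemory()     = " + Runtime.getRuntime().maxMemory());
        System.out.println("heap max                = " + heap);
        System.out.println("heap max + non-heap max = " + sumIfDefined(heap, reportedNonHeapMax()));
        System.out.println("heap max + survivor max = " + sumIfDefined(heap, survivorMax()));
    }
}
```

The same attributes can be watched remotely in JConsole; if one of the sums stays flat while HeapMemoryUsage.max moves, that points at the corresponding explanation.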

Efficient GC collection with large heap of 30 - 100GB

Can Java 7 now handle a large heap of 30-100GB efficiently, without significant GC pauses?

There are tuning options available, as well as a concurrent GC, but there will still be some pauses during GC of the tenured generation.
Angelika Langer explains this in detail in this presentation:
http://vimeo.com/28761227

Another option is to use Terracotta BigMemory. This is useful if you are storing objects in a big in-heap cache. It is not open source but, in my opinion, reasonably priced. BigMemory basically allocates object memory outside the heap, so the heap itself can be kept small or medium-sized.
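The idea behind off-heap storage can be shown with plain JDK classes. This is not BigMemory's API; it is a minimal sketch, with hypothetical names, that keeps a fixed number of long values in a direct ByteBuffer. The buffer's contents live in native memory outside the Java heap, so the garbage collector only tracks the small buffer object itself. Direct memory is bounded by -XX:MaxDirectMemorySize, and a single buffer can hold at most Integer.MAX_VALUE bytes, so real off-heap stores spread data across many buffers and add their own serialization.

```java
import java.nio.ByteBuffer;

public class OffHeapLongStore {
    private final ByteBuffer buffer;

    // Allocates room for 'capacity' longs in native (direct) memory outside the Java heap.
    public OffHeapLongStore(int capacity) {
        if (capacity < 0 || capacity > Integer.MAX_VALUE / Long.BYTES) {
            throw new IllegalArgumentException("capacity out of range: " + capacity);
        }
        buffer = ByteBuffer.allocateDirect(capacity * Long.BYTES);
    }

    public void set(int index, long value) {
        buffer.putLong(index * Long.BYTES, value); // absolute put; bounds-checked by the buffer
    }

    public long get(int index) {
        return buffer.getLong(index * Long.BYTES);
    }

    public int capacity() {
        return buffer.capacity() / Long.BYTES;
    }

    public boolean isDirect() {
        return buffer.isDirect();
    }

    public static void main(String[] args) {
        OffHeapLongStore store = new OffHeapLongStore(1_000_000); // about 8 MB off-heap
        store.set(42, 123_456_789L);
        System.out.println(store.get(42));
    }
}
```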
