I created an ArrayList of 1 million MyItem objects and the memory consumed was 106 MB (checked from Task Manager). But after adding the same list to two more lists through the addAll() method, it takes 259 MB. My question is: I have only added references to the lists, no new objects are created after that 1 million. Why does the memory consumption increase even though LinkedList has been used (since it doesn't require contiguous memory blocks, no reallocation should be made)?
How can I achieve this efficiently? Data passes through various lists in my program and consumes more than 1 GB of memory. A similar scenario is presented above.
public class MyItem {
    private String s;
    private int id;
    private String search;

    public MyItem(String s, int id) {
        this.s = s;
        this.id = id;
    }

    public String getS() {
        return s;
    }

    public int getId() {
        return id;
    }

    public String getSearchParameter() {
        return search;
    }

    public void setSearchParameter(String s) {
        search = s;
    }
}
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Scanner;

public class Main {
    public static void main(String args[]) {
        List<MyItem> l = new ArrayList<>();
        List<MyItem> list = new LinkedList<>();
        List<MyItem> list1 = new LinkedList<>();
        for (int i = 0; i < 1000000; i++) {
            MyItem m = new MyItem("hello " + i, i + 1);
            m.setSearchParameter(m.getS());
            l.add(i, m);
        }
        list.addAll(l);
        list1.addAll(l);
        list1.addAll(list);
        Scanner s = new Scanner(System.in);
        s.next(); // just so the program doesn't terminate
    }
}
LinkedList is a doubly-linked list, so elements in the list are represented by nodes, and each node contains 3 references.
From Java 8:
private static class Node<E> {
    E item;
    Node<E> next;
    Node<E> prev;

    Node(Node<E> prev, E element, Node<E> next) {
        this.item = element;
        this.next = next;
        this.prev = prev;
    }
}
Since you use a lot of memory, you may not be using compressed OOP, so references might be 64-bit, i.e. 8 bytes each.
With an object header of 16 bytes plus 3 × 8 bytes for the references, a node occupies 40 bytes. With 1 million elements, that is 40 MB.
Two lists take 80 MB, and remember that the Java heap is segmented into pools and objects (the nodes) get moved around, so your additional memory consumption of 153 MB now seems about right.
Note: An ArrayList would only use 8 bytes per element, not 40, and if you preallocate the backing array, which you can do since you know the size, you would save a lot of memory that way.
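A minimal sketch of that preallocation idea, reusing the MyItem class from the question (the sizes are only illustrative):
import java.util.ArrayList;
import java.util.List;

public class PreallocatedCopy {
    public static void main(String[] args) {
        // Capacity is known up front, so the backing array is never resized.
        List<MyItem> source = new ArrayList<>(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            MyItem m = new MyItem("hello " + i, i + 1);
            m.setSearchParameter(m.getS());
            source.add(m);
        }

        // Copying into another ArrayList stores only one reference per element,
        // with no per-element Node objects as in LinkedList.
        List<MyItem> copy = new ArrayList<>(source);
        System.out.println(copy.size());
    }
}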
Any time you call LinkedList.addAll, behind the scenes it will create a LinkedList.Node for each added element, so here you created 3 million such nodes, which is not free. Indeed:
This object has 3 references. A reference is 4 bytes on a 32-bit JVM, and also on a 64-bit JVM with UseCompressedOops enabled (-XX:+UseCompressedOops), which is the default with heaps smaller than 32 GB in Java 7 and higher; it is 8 bytes on a 64-bit JVM with UseCompressedOops disabled (-XX:-UseCompressedOops). So depending on your configuration, the references take 12 bytes or 24 bytes.
Then we add the size of the object header, which is 8 bytes on a 32-bit JVM and 16 bytes on a 64-bit JVM. So depending on your configuration, that adds another 8 bytes or 16 bytes.
So if we summarize it takes:
20 bytes per instance on 32-bit JVM
28 bytes per instance on 64-bit JVM with UseCompressedOops enabled
40 bytes per instance on 64-bit JVM with UseCompressedOops disabled
As you call addAll three times with 1 million objects on LinkedLists, that gives:
60 MB on a 32-bit JVM
84 MB on a 64-bit JVM with UseCompressedOops enabled
120 MB on a 64-bit JVM with UseCompressedOops disabled
The rest is probably objects not yet collected by the garbage collector; you should try calling System.gc() after loading your ArrayList to get the real size, and do the same thing after loading your LinkedLists.
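A rough way to make that comparison in code is to look at the used heap reported by Runtime before and after building the structure, suggesting a GC first; a minimal sketch (the numbers are only approximate, since System.gc() is just a hint):
import java.util.ArrayList;
import java.util.List;

public class HeapUsage {
    // Rough estimate of the currently used heap, after suggesting a GC run.
    static long usedHeap() {
        System.gc();
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long before = usedHeap();
        List<Integer> list = new ArrayList<>(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        long after = usedHeap();
        System.out.println("Approximate bytes used by the list: " + (after - before));
    }
}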
If you want to get the size of a given object, you can use SizeOf.
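As an alternative to SizeOf, the JOL (Java Object Layout) library can report both the shallow layout of an object and the total footprint of everything reachable from it; a sketch, assuming the org.openjdk.jol:jol-core dependency is on the classpath:
import java.util.LinkedList;
import java.util.List;
import org.openjdk.jol.info.ClassLayout;
import org.openjdk.jol.info.GraphLayout;

public class SizeDemo {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        // Shallow layout of the LinkedList object itself (header + fields).
        System.out.println(ClassLayout.parseInstance(list).toPrintable());
        // Total size of the list plus everything reachable from it (nodes, Integers, ...).
        System.out.println(GraphLayout.parseInstance(list).totalSize() + " bytes retained");
    }
}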
If you use a 64-bit JVM and want to know whether UseCompressedOops is enabled, simply launch your java command in a terminal with only its -X options, adding -XX:+PrintFlagsFinal and piping the output through grep UseCompressedOops. For example, if my command is java -Xms4g -Xmx4g -XX:MaxPermSize=4g -cp <something> <my-class>, I would launch java -Xms4g -Xmx4g -XX:MaxPermSize=4g -XX:+PrintFlagsFinal | grep UseCompressedOops; the beginning of the output should look like this:
bool UseCompressedOops := true {lp64_product}
...
In this case, the UseCompressedOops flag is enabled.
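If you prefer to check the flag from inside a running HotSpot JVM rather than on the command line, the HotSpotDiagnosticMXBean exposes the same information; a small sketch (HotSpot-specific, so it assumes the com.sun.management API is available):
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class CheckCompressedOops {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        // Prints "true" or "false" depending on whether compressed oops are in use.
        System.out.println(bean.getVMOption("UseCompressedOops").getValue());
    }
}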
I was trying to get the memory consumption of some code snippets. After some search, I realized that ThreadMXBean.getThreadAllocatedBytes(long id) can be used to achieve this. So I tested this method with the following code:
ThreadMXBean threadMXBean = (ThreadMXBean) ManagementFactory.getThreadMXBean();
long id = Thread.currentThread().getId();
// new Long(0);
long beforeMemUsage = threadMXBean.getThreadAllocatedBytes(id);
long afterMemUsage = 0;
{
    // put the code you want to measure here
    for (int i = 0; i < 10; i++) {
        new Long(i);
    }
}
afterMemUsage = threadMXBean.getThreadAllocatedBytes(id);
System.out.println(afterMemUsage - beforeMemUsage);
I ran this code with different iteration counts in the for loop (0, 1, 10, 20, and 30), and the results are as follows:
0 Long: 48 bytes
1 Long: 456 bytes
10 Long: 672 bytes
20 Long: 912 bytes
30 Long: 1152 bytes
The differences between 1 and 10, 10 and 20, as well as 20 and 30 are easy to explain, because the size of a Long object is 24 bytes. But I was confused by the huge difference between 0 and 1.
Actually, I guessed this was caused by class loading, so I uncommented the new Long(0) on the 3rd line of the code, and the results are as follows:
0 Long: 48 bytes
1 Long: 72 bytes
10 Long: 288 bytes
20 Long: 528 bytes
30 Long: 768 bytes
It seems that my guess is confirmed by the result. However, in my opinion, information about the class structure is stored in the Method Area, which is not part of the heap. As the Javadoc of ThreadMXBean.getThreadAllocatedBytes(long id) indicates, it returns the total amount of memory allocated in heap memory. Have I missed something?
The tested JVM version is:
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
Thanks!
The first invocation of new Long(0) causes the resolution of the constant pool entry referenced by new bytecode. While resolving CONSTANT_Class_info for the first time, JVM loads the referenced class - java.lang.Long.
ClassLoader.loadClass is implemented in Java, and it can certainly allocate Java objects. For instance, getClassLoadingLock method creates a new lock object and a new entry in parallelLockMap:
protected Object getClassLoadingLock(String className) {
    Object lock = this;
    if (parallelLockMap != null) {
        Object newLock = new Object();
        lock = parallelLockMap.putIfAbsent(className, newLock);
        if (lock == null) {
            lock = newLock;
        }
    }
    return lock;
}
Also, when doing a class name lookup in the system dictionary, JVM creates a new String object.
I used async-profiler to record all heap allocations the JVM does when loading the java.lang.Long class and visualized them as a clickable interactive flame graph.
The graph includes 13 samples - one per each allocated object. The type of an allocated object is not shown, but it can be easily guessed from the context (stack trace).
Green color denotes Java stack trace;
Yellow means VM stack trace.
Note that each java_lang_String::basic_create() (and similar) allocates two objects: an instance of java.lang.String and its backing char[] array.
The graph is produced by the following test program:
import one.profiler.AsyncProfiler;

public class ProfileHeapAlloc {
    public static void main(String[] args) throws Exception {
        AsyncProfiler profiler = AsyncProfiler.getInstance();

        // Dry run to skip allocations caused by AsyncProfiler initialization
        profiler.start("_ZN13SharedRuntime19dtrace_object_allocEP7oopDesci", 0);
        profiler.stop();

        // Real profiling session
        profiler.start("_ZN13SharedRuntime19dtrace_object_allocEP7oopDesci", 0);
        new Long(0);
        profiler.stop();

        profiler.execute("file=alloc.svg");
    }
}
How to run:
java -Djava.library.path=/path/to/async-profiler -XX:+DTraceAllocProbes ProfileHeapAlloc
Here _ZN13SharedRuntime19dtrace_object_allocEP7oopDesci is the mangled name for SharedRuntime::dtrace_object_alloc() function, which is called by JVM for every heap allocation whenever DTraceAllocProbes flag is on.
I have created a simple Java program:
import java.util.ArrayList;
import java.util.List;
import java.lang.management.ManagementFactory;

public class OOMError {
    public static List<Person> list = new ArrayList<>();

    public static void main(String args[]) {
        System.out.println("Process Id: " + ManagementFactory.getRuntimeMXBean().getName());
        try {
            while (true) {
                List<Person> innerList = new ArrayList<>();
                for (int i = 0; i < 1000000; i++) {
                    list.add(new Person());
                }
                int count = 0;
                while (count < 60) {
                    for (int i = 0; i < 100000; i++) {
                        innerList.add(new Person());
                    }
                    Thread.sleep(1000 * 60);
                    count = count + 5;
                }
                Thread.sleep(1000 * 60 * 10);
            }
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }

    public static class Person {
        private String fName;
        private String lName;

        Person() {
            fName = new String("Stack");
            lName = new String("Overflow");
        }
    }
}
I ran this program with export JAVA_OPTS='-Xms128m -Xmx1024m'
and I am monitoring my Java application's RAM usage with the top command.
Processes: 94 total, 10 running, 2 stuck, 82 sleeping, 2765 threads 06:59:51
Load Avg: 2.62, 2.74, 2.68 CPU usage: 0.11% user, 0.54% sys, 99.33% idle SharedLibs: 13M resident, 25M data, 0B linkedit.
MemRegions: 28079 total, 19G resident, 24M private, 3449M shared. PhysMem: 2776M wired, 19G active, 3718M inactive, 25G used, 39G free.
VM: 605G vsize, 1199M framework vsize, 118994555(0) pageins, 0(0) pageouts. Networks: packets: 80924129/10G in, 112346940/77G out.
Disks: 4011149/367G read, 9149138/634G written.
PID COMMAND %CPU TIME #TH #WQ #POR #MRE RPRVT RSHR RSIZE VPRVT VSIZ PGRP PPID STATE UID FAULTS COW MSGS MSGR SYSBSD SYSMACH CSW PAGE
89038 java 0.0 02:01.65 31 2 112 437 1237M 10M 1251M 2101M 19G 89038 88799 stuck 501 321189 448 651 300 250145+ 358994 304415+ 0
I am surprised: how can my RSIZE field be 1251M when my max heap size is 1024M, and why is the state shown as "stuck"?
What will happen once my heap is full? Will the application terminate?
I am using OS X 10.X.X.
In the JRockit blog there is an article that explains this:
The command line argument -Xmx sets the maximum Java heap size (mx).
All Java objects are created on the Java heap, but the total memory
consumption of the JVM process consists of more things than just the
Java heap. A few examples:
Generated (JIT:ed) code
Loaded libraries (including jar and class files)
Control structures for the java heap
Thread Stacks
User native memory (malloc:ed in JNI)
Source: Why is my JVM process larger than max heap size?
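If you want to see where that non-heap memory actually goes, one option on HotSpot (Java 8 and later) is Native Memory Tracking; the commands below are a sketch of that approach, with <pid> standing for your process id:
java -XX:NativeMemoryTracking=summary -Xms128m -Xmx1024m OOMError
jcmd <pid> VM.native_memory summary
The summary breaks the process footprint down into categories such as the Java heap, thread stacks, generated code, and class metadata, much like the list quoted above.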
OK, so I'm trying a little experiment in Java: I want to fill up a queue with integers and see how long it takes. Here goes:
import java.io.*;
import java.util.*;

class javaQueueTest {
    public static void main(String args[]) {
        System.out.println("Hello World!");
        long startTime = System.currentTimeMillis();
        int i;
        int N = 50000000;
        ArrayDeque<Integer> Q = new ArrayDeque<Integer>(N);
        for (i = 0; i < N; i = i + 1) {
            Q.add(i);
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println(totalTime);
    }
}
OK, so I run this and get:
Hello World!
12396
About 12 secs, not bad for 50 million integers. But if I try to run it for 70 million integers I get:
Hello World!
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.valueOf(Integer.java:642)
at javaQueueTest.main(javaQueueTest.java:14)
I also noticed that it takes about 10 minutes to come up with this message. Hmm, so what if I give almost all my memory (8 GB) to the heap? So I run it with a heap size of 7 GB, but I still get the same error:
javac javaQueueTest.java
java -cp . javaQueueTest -Xmx7g
Hello World!
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.lang.Integer.valueOf(Integer.java:642)
at javaQueueTest.main(javaQueueTest.java:14)
I want to ask two things. First, why does it take so long to come up with the error? Second, why is all this memory not enough? If I run the same experiment for 300 million integers in C (with the glib g_queue) it will run (and in 10 secs no less, although it will slow down the computer a lot), so the number of integers must not be at fault here. For the record, here is the C code:
#include <stdlib.h>
#include <stdio.h>
#include <math.h>
#include <glib.h>
#include <time.h>

int main() {
    clock_t begin, end;
    double time_spent;
    GQueue *Q;

    begin = clock();
    Q = g_queue_new();
    g_queue_init(Q);
    int N = 300000000;
    int i;
    for (i = 0; i < N; i = i + 1) {
        g_queue_push_tail(Q, GINT_TO_POINTER(i));
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("elapsed time: %f \n", time_spent);
}
I compile and get the result:
gcc cQueueTest.c `pkg-config --cflags --libs glib-2.0 gsl ` -o cQueueTest
~/Desktop/Software Development/Tests $ ./cQueueTest
elapsed time: 13.340000
My rough thoughts about your questions:
First, why does it take so long to come up with the error?
As gimpycpu stated in his comment, Java does not start by acquiring all of your RAM. If you want it to (and you have a 64-bit VM for larger amounts of RAM), you can add the options -Xmx8g and -Xms8g at VM startup time to ensure that the VM gets 8 gigabytes of RAM; -Xms means that it will also commit the RAM for use up front instead of just saying that it can use it. This will reduce the runtime significantly. Also, as already mentioned, Java integer boxing adds quite a bit of overhead.
Why is all this memory not enough?
Java introduces a little memory overhead for every object, because the JVM stores Integer references in the ArrayDeque data structure instead of plain 4-byte ints, due to boxing. So you have to calculate about 20 bytes for every integer.
You can try to use an int[] instead of the ArrayDeque:
import java.io.*;
import java.util.*;

class javaQueueTest {
    public static void main(String[] args) {
        System.out.println("Hello World!");
        long startTime = System.currentTimeMillis();
        int i;
        int N = 50000000;
        int[] a = new int[N];
        for (i = 0; i < N; i = i + 1) {
            a[i] = 0;
        }
        long endTime = System.currentTimeMillis();
        long totalTime = endTime - startTime;
        System.out.println(totalTime);
    }
}
This will be ultra fast due to the use of plain arrays.
On my system it runs in under one second every time!
In your case, the GC struggles because it assumes that at least some objects will be short-lived, but here all objects are long-lived; this adds significant overhead to managing the data.
If you use -Xmx7g -Xms7g -verbose:gc and N = 150000000 you get an output like
Hello World!
[GC (Allocation Failure) 1835008K->1615280K(7034368K), 3.8370127 secs]
5327
int is a primitive in Java (4 bytes), while Integer is the wrapper. The wrapper needs a reference to it, an object header, and padding, and the result is that an Integer and its reference use about 20 bytes per value.
The solution is to not queue up so many values at once. You can use a Supplier to provide new values on demand, avoiding the need to create the queue in the first place.
Even so, with a 7 GB heap you should be able to create an ArrayDeque of 200 M or more.
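A minimal sketch of that Supplier idea: each value is produced as it is consumed, so nothing is ever buffered and nothing is boxed (the consumer loop is just illustrative):
import java.util.function.IntSupplier;

public class SupplierDemo {
    public static void main(String[] args) {
        // Produces the sequence 0, 1, 2, ... on demand instead of queueing it.
        IntSupplier next = new IntSupplier() {
            private int i = 0;
            public int getAsInt() {
                return i++;
            }
        };

        long sum = 0;
        for (int n = 0; n < 50_000_000; n++) {
            sum += next.getAsInt(); // consume the value immediately
        }
        System.out.println(sum);
    }
}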
First, why does it take so long to come up with the error?
This looks like a classic example of a GC "death spiral". Basically what happens is that the JVM does full GCs repeatedly, reclaiming less and less space each time. Towards the end, the JVM spends more time running the GC than doing "useful" work. Finally it gives up.
If you are experiencing this, the solution is to configure a GC Overhead Limit as described here:
GC overhead limit exceeded
(Java 8 configures a GC overhead limit by default. But you are apparently using an older version of Java ... judging from the exception message.)
Second, Why is all this memory not enough?
See @Peter Lawrey's explanation.
The workaround is to find or implement a queue class that doesn't use generics. Unfortunately, that class will not be compatible with the standard Deque API.
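A sketch of what such a non-generic queue could look like: a fixed-capacity ring buffer over an int[], so each element costs exactly 4 bytes (illustrative only, not a drop-in replacement for Deque):
import java.util.NoSuchElementException;

public class IntQueue {
    private final int[] data;
    private int head, tail, size;

    public IntQueue(int capacity) {
        data = new int[capacity];
    }

    public void add(int value) {
        if (size == data.length) {
            throw new IllegalStateException("queue full");
        }
        data[tail] = value;
        tail = (tail + 1) % data.length;
        size++;
    }

    public int poll() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        int value = data[head];
        head = (head + 1) % data.length;
        size--;
        return value;
    }

    public int size() {
        return size;
    }
}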
You can catch OutOfMemoryError with:
ArrayDeque<Integer> Q = null;
int i = 0;
try {
    Q = new ArrayDeque<Integer>(N);
    for (i = 0; i < N; i = i + 1) {
        Q.add(i);
    }
} catch (OutOfMemoryError e) {
    Q = null;
    System.gc();
    System.err.println("OutOfMemoryError: " + i);
}
in order to show when the OutOfMemoryError is thrown.
And launch your code with:
java -Xmx4G javaQueueTest
in order to increase the heap size of the JVM.
As mentioned earlier, Java is much slower with objects than C is with primitive types ...
I am wondering why allocating a 2D int array all at once (new int[50][2]) performs worse than allocating it separately, that is, executing new int[50][] first and then new int[2] one by one. Here is a non-professional benchmark:
import com.google.common.base.Stopwatch; // Stopwatch comes from Guava

public class AllocationSpeed {
    private static final int ITERATION_COUNT = 1000000;

    public static void main(String[] args) {
        new AllocationSpeed().run();
    }

    private void run() {
        measureSeparateAllocation();
        measureAllocationAtOnce();
    }

    private void measureAllocationAtOnce() {
        Stopwatch stopwatch = Stopwatch.createStarted();
        for (int i = 0; i < ITERATION_COUNT; i++) {
            allocateAtOnce();
        }
        stopwatch.stop();
        System.out.println("Allocate at once: " + stopwatch);
    }

    private int allocateAtOnce() {
        int[][] array = new int[50][2];
        return array[10][1];
    }

    private void measureSeparateAllocation() {
        Stopwatch stopwatch = Stopwatch.createStarted();
        for (int i = 0; i < ITERATION_COUNT; i++) {
            allocateSeparately();
        }
        stopwatch.stop();
        System.out.println("Separate allocation: " + stopwatch);
    }

    private int allocateSeparately() {
        int[][] array = new int[50][];
        for (int i = 0; i < array.length; i++) {
            array[i] = new int[2];
        }
        return array[10][1];
    }
}
I tested on 64-bit Linux; these are the results with different 64-bit Oracle Java versions:
1.6.0_45-b06:
Separate allocation: 401.0 ms
Allocate at once: 1.673 s
1.7.0_45-b18
Separate allocation: 408.7 ms
Allocate at once: 1.448 s
1.8.0-ea-b115
Separate allocation: 380.0 ms
Allocate at once: 1.251 s
Just for curiosity, I tried it with OpenJDK 7 as well (where the difference is smaller):
Separate allocation: 424.3 ms
Allocate at once: 1.072 s
For me it's quite counter-intuitive; I would expect allocating at once to be faster.
Absolutely unbelievable. A benchmark source might suffer from optimizations, GC and JIT, but this?
Looking at the Java bytecode instruction set:
anewarray (+ 2 bytes indirect class index) for arrays of object classes (a = address)
newarray (+ 1 byte for primitive class) for arrays of primitive types
multianewarray (+ 2 bytes indirect class index) for multidimensional arrays
This leads one to suspect that multianewarray is suboptimal for primitive types.
Before looking further, I hope someone knows where we are misled.
The latter code's inner loop (with a newarray) is hit more times than the former code's multianewarray, so it probably hits C2 and gets subjected to escape analysis sooner. (Once that happens, the rows created by the latter code are allocated on the stack, which is faster than the heap and reduces the workload for the garbage collector.)
It's also possible that these JDK versions didn't actually do escape analysis on rows from a multianewarray, since a multidimensional array is more likely to exceed the size limit for a stack array.
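One way to probe the escape-analysis theory (a rough experiment rather than proof) is to rerun the benchmark with escape analysis switched off and see whether the gap between the two variants shrinks:
java -XX:-DoEscapeAnalysis AllocationSpeed
java -XX:+DoEscapeAnalysis AllocationSpeed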
Given this heap dump
size no. of obj class
515313696 2380602 char[]
75476832 614571 * ConstMethodKlass
57412368 2392182 java.lang.String
44255544 614571 * MethodKlass
33836872 70371 * ConstantPoolKlass
28034704 70371 * InstanceKlassKlass
26834392 349363 java.lang.Object[]
25853848 256925 java.util.HashMap$Entry[]
24224240 496587 * SymbolKlass
19627024 117963 byte[]
18963232 61583 * ConstantPoolCacheKlass
18373920 120113 int[]
15239352 634973 java.util.HashMap$Entry
11789056 92102 ph.com.my.class.Person
And only one class is from my app, ph.com.my.class.Person. The class definition is as follows:
public class Person {
    private String f_name;
    private String l_name;
}
In the heap dump, does the Person size (11789056) include the memory that the two String fields occupy? Or will f_name and l_name be counted under the String class instead, in this case size 57412368?
UPDATE - added follow-up question:
So let's say each instance of:
f_name size is 30
l_name size is 20
Person size is 75
If there were 10 instances of Person, there will be
10 * (30+20) = 500
10 * 75 = 750
Will the 500 be counted in String or char[]? And subsequently, will 750 be counted in Person?
The size of an object in the heap dump is the number of bytes allocated as a block on the heap to hold that instance. It never includes the bytes of the whole graph reachable through the object. In general that could easily mean that the size of the object is the entire heap. So in your case it takes into account the two references, but not the String instances themselves. Note also that even the String size doesn't reflect the size of the represented string -- that's stored in a char[]. The char[] instances are shared between strings so the story isn't that simple.
Each count and size refers to that class's instances alone. If you used -histo instead of -histo:live, this includes all objects, even the ones which are no longer referenced.
Note: each String has a char[], and the JVM itself uses quite a few of these. The String size is the size of the String object itself, not of its char[].
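For reference, histograms like the one above are typically produced with jmap; a sketch of the two variants mentioned, with <pid> standing for the process id:
jmap -histo <pid>        (all objects, including unreachable ones not yet collected)
jmap -histo:live <pid>   (forces a full GC first, then counts only live objects)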