In Java, empty HashMap space allocation

How can I tell how much space a pre-sized HashMap takes up before any elements are added? For example, how do I determine how much memory the following takes up?
HashMap<String, Object> map = new HashMap<String, Object>(1000000);

In principle, you can:
1. calculate it by theory:
   - look at the implementation of HashMap to figure out what this method does.
   - look at the implementation of the VM to know how much space the individual created objects take.
2. measure it somehow.
Most of the other answers are about the second way, so I'll look at the first one (in OpenJDK source, 1.6.0_20).
The constructor uses a capacity that is the next power of two >= your initialCapacity parameter, thus 1048576 = 2^20 in our case.
It then creates a new Entry[capacity] and assigns it to the table variable. (Additionally, it assigns some primitive variables.)
So, we now have one quite small HashMap object (it contains only 3 ints, one float and one reference variable), and one quite big Entry[] object. This array needs space for its array elements (which are normal reference variables) and some metadata (size, class).
So, it comes down to how big a reference variable is. This depends on VM implementation - usually in 32-bit VMs it is 32 bit (= 4 bytes), in 64-bit VMs 64 bit (= 8 bytes).
So, basically on 32-bit VMs your array takes 4 MB, on 64-bit VMs it takes 8 MB, plus some tiny administration data.
If you then fill your HashMap with mappings, each mapping corresponds to an Entry object. This Entry object consists of one int and three references, taking about 24 bytes on 32-bit VMs and perhaps double that on 64-bit VMs. Thus your 1000000-mapping HashMap (assuming a load factor > 1, so that the table is never resized) would take ~28 MB on 32-bit VMs and ~56 MB on 64-bit VMs.
That is in addition to the key and value objects themselves, of course.
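The estimate above can be written out as a small calculation. This is only a back-of-envelope sketch: the class and method names are made up for illustration, the reference and Entry sizes (4/8 and 24/48 bytes) are the assumptions stated in the answer, and object headers of the map and the array are ignored.

```java
// Back-of-envelope estimate only; real sizes depend on the JVM in use.
final class HashMapFootprintEstimate {

    /** Smallest power of two >= requested, capped at 2^30 to avoid overflow. */
    static int tableCapacity(int requested) {
        int capacity = 1;
        while (capacity < requested && capacity < (1 << 30)) {
            capacity <<= 1;
        }
        return capacity;
    }

    /** Bytes used by the reference slots of the empty table (headers ignored). */
    static long emptyTableBytes(int requested, int bytesPerReference) {
        return (long) tableCapacity(requested) * bytesPerReference;
    }

    /** Table plus one Entry per mapping; key and value objects are not counted. */
    static long filledBytes(int requested, int mappings,
                            int bytesPerReference, int bytesPerEntry) {
        return emptyTableBytes(requested, bytesPerReference)
                + (long) mappings * bytesPerEntry;
    }

    public static void main(String[] args) {
        System.out.println("capacity:        " + tableCapacity(1_000_000));
        System.out.println("empty, 4-byte:   " + emptyTableBytes(1_000_000, 4));
        System.out.println("empty, 8-byte:   " + emptyTableBytes(1_000_000, 8));
        System.out.println("filled, 32-bit:  " + filledBytes(1_000_000, 1_000_000, 4, 24));
        System.out.println("filled, 64-bit:  " + filledBytes(1_000_000, 1_000_000, 8, 48));
    }
}
```

With these assumed sizes the filled figures come out at roughly 28 MB and 56 MB, matching the numbers in the answer.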

You could check memory usage before and after creation of the variable. For example:
long preMemUsage = Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();
HashMap<String, Object> map = new HashMap<String, Object>(1000000);
long postMemUsage = Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();

The exact answer will depend on the version of Java you are using, the JVM vendor and the target platform, and is best determined by direct measurement, as described in other answers.
But as a simple estimate, the size is likely to be either ~4 * 2^20 or ~8 * 2^20 bytes, for a 32 bit or 64 bit jvm respectively.
Reasoning:
The Sun Java 1.6 implementation of HashMap has a fixed-size top-level object and a table field that points to the array of references to hash chains.
In a newly created (empty) HashMap the references are all null and the array size is the next power of two greater than or equal to the supplied initialCapacity. (Yes ... I checked the source code.)
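A reference occupies 4 bytes on a typical 32-bit JVM and 8 bytes on a typical 64-bit JVM. Some 64-bit JVMs support compact references ("compressed oops"); on older releases you need to set a JVM option to enable this, while more recent HotSpot releases turn it on by default for heaps below roughly 32 GB.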
The top object has 5 fields including the table array reference, but this is a relatively small constant overhead.
The top object and the array have object header overheads, but these are constant and relatively small.
Thus the size of the table array dominates, and it is 2^20 (the next power of 2 greater than 1,000,000) multiplied by the size of a reference.
So, this tells you that setting a large initial capacity really does use a lot of memory. On the other hand, if the initial capacity is a good estimate of the map's capacity when fully populated, you will save significant amounts of time by setting it. (This avoids a number of cycles of reallocating the array and rebuilding of the hash chains.)
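To make the trade-off concrete, here is a rough sketch that counts how many times the table would have to double before a given number of mappings fits. It assumes the table doubles whenever the size exceeds capacity * loadFactor, which is a simplification of what HashMap does; the class and method names are hypothetical.

```java
// Simplified model of HashMap growth; not taken from the JDK source.
final class HashMapResizeCount {

    /** Doublings needed before `mappings` entries fit under the load-factor threshold. */
    static int doublings(int initialCapacity, float loadFactor, int mappings) {
        long capacity = initialCapacity;
        int count = 0;
        while (mappings > capacity * (double) loadFactor) {
            capacity *= 2;
            count++;
        }
        return count;
    }

    /** Initial capacity to request so that `mappings` entries fit without a resize. */
    static int presizedCapacity(int mappings, float loadFactor) {
        return (int) Math.ceil(mappings / (double) loadFactor);
    }

    public static void main(String[] args) {
        System.out.println("default map (16, 0.75): " + doublings(16, 0.75f, 1_000_000));
        System.out.println("table of 2^20 slots:    " + doublings(1 << 20, 0.75f, 1_000_000));
        System.out.println("capacity to request:    " + presizedCapacity(1_000_000, 0.75f));
    }
}
```

Under this model, a table of 2^20 slots with the default load factor of 0.75 still resizes once before reaching 1,000,000 mappings, so the requested capacity would need to account for the load factor to avoid resizing entirely.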

You could probably use a profiler like VisualVM and track memory use.
Have a look at this too: http://www.velocityreviews.com/forums/t148009-java-hashmap-size.html

I'd have a look at this article: http://www.javaworld.com/javaworld/javatips/jw-javatip130.html
In short, Java does not have a C-style sizeof operator. You could use profiling tools, but IMO the above link gives the simplest solution.
Another piece of info that may be helpful: an empty Java String consumes 40 bytes. One million of them would probably be at least 40MB...

I agree that a profiler is really the only way to tell. The other bit of relevant information is whether you're using a 32-bit or 64-bit JVM. The amount of overhead due to memory references (pointers) varies depending on that and whether you have compressed oops turned on. I've found that for smaller data sets the overhead of objects and pointers is significant.

In the latest version of Java 1.7 (I'm looking at 1.7.0_55) HashMap actually lazily instantiates its internal table. It's only instantiated when put() is called - see the private method "inflateTable()". So your HashMap, before you add anything to it at least, will occupy only the handful of bytes of object overhead and instance fields.

You should be able to use VisualVM (comes with JDK 6 or can be downloaded) to create a memory snapshot and inspect the allocated objects for their size.

Related

How precisely do Java arrays use memory in HotSpot (i.e. how much slop)?

C malloc implementations typically don't allocate the precise amount of memory requested but instead consume fixed-size runs of memory, e.g. with power-of-two sizes, so that an allocation of 1025 bytes actually takes a 2048-byte segment with 1023 bytes lost as slop.
Does HotSpot use a similar allocation mechanism for Java arrays? If so, what's the right way to allocate a Java array such that there's no slop? (E.g. should the array length be a power of two or maybe power of two minus some fixed amount of overhead?)
If you're asking about the language, the answer is: it's not specified (same as for C).
If you're asking about a specific implementation, check out that implementation. I believe for HotSpot it's 8-byte granularity; that is, object sizes are rounded up to the next granularity boundary. If the question is about the heap size increase when there isn't enough free heap, then it depends on implementation, GC settings, heap size params and so on, making it impractical to answer precisely.
EDIT: Using a small reflection hack, accessing the sun.misc.Unsafe class (Oracle JRE only), object references can be converted to memory addresses; output the addresses of two consecutively allocated arrays to check for yourself.
And I basically asked the same question here: Determine the optimal size for array with respect to the JVM's memory granularity (Answers include an example of using the Unsafe class to check for object size)
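The rounding described in the answer can be illustrated with a short sketch. The 8-byte granularity is the answer's belief about HotSpot, and the 16-byte array header used in the example is purely an assumed value; the names are hypothetical.

```java
// Illustrates rounding to an allocation granularity; header size is an assumption.
final class ObjectAlignmentSketch {

    /** Rounds size up to the next multiple of granularity. */
    static long alignUp(long size, int granularity) {
        return (size + granularity - 1) / granularity * granularity;
    }

    /** Estimated array footprint: header plus elements, rounded up. */
    static long arrayBytes(int headerBytes, int length, int elementBytes, int granularity) {
        return alignUp(headerBytes + (long) length * elementBytes, granularity);
    }

    public static void main(String[] args) {
        // A byte[1025] with an assumed 16-byte header and 8-byte granularity.
        long used = arrayBytes(16, 1025, 1, 8);
        long slop = used - (16 + 1025);
        System.out.println("estimated footprint: " + used + " bytes, slop: " + slop + " bytes");
    }
}
```

If the granularity really is 8 bytes, the slop per array is at most 7 bytes, so there would be little to gain from choosing power-of-two lengths.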

Memory allocation declaring a field

How much memory does Java allocate for declaring fields like private char letter; and private int size; at the moment of constructing the object containing these fields?
This depends on the implementation of the virtual machine. The spec specifies that a char primitive type has a value range of 16 bits, but it does not specify how a virtual machine has to store an object on the heap.
There's no need for such a detailed spec, because VM's don't have to be able to exchange or serialize raw objects from the heap.
To respond to your clarification in a comment: Again, it depends on the implementation, but there are a couple of good reasons to allocate the memory for all class attributes once at the time the object is "created". If we decided for lazy allocation, then we'd have to add mechanics to dynamically resize objects on the heap at runtime, which is pretty expensive.
If we reserve all space right away at the beginning, then we never have to resize or relocate data on the heap, because the datastructures can never grow or shrink in size.
In the Oracle/Sun JVM, each object is allocated on an 8-byte boundary. So adding a field may not increase the amount of memory used. However as a guide here are the sizes of primitives
type            typical size
byte, boolean   1 byte
char, short     2 bytes
int, float      4 bytes
long, double    8 bytes
Whether the JVM is 32-bit or 64-bit makes no differences to the size of a primitive but it does change the default size of a reference.
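The value sizes in the table can be read from the BYTES constants on the wrapper classes (available since Java 8). Note that these describe the width of the value, not necessarily how a particular JVM lays out the field in an object, and that Boolean has no such constant. The class below is a hypothetical illustration.

```java
// Prints the value widths of the primitive types; in-heap layout may differ.
final class PrimitiveFieldSizes {

    /** Payload of the two fields from the question: char letter; int size; */
    static int questionPayloadBytes() {
        return Character.BYTES + Integer.BYTES;
    }

    public static void main(String[] args) {
        System.out.println("byte   " + Byte.BYTES);
        System.out.println("short  " + Short.BYTES);
        System.out.println("char   " + Character.BYTES);
        System.out.println("int    " + Integer.BYTES);
        System.out.println("float  " + Float.BYTES);
        System.out.println("long   " + Long.BYTES);
        System.out.println("double " + Double.BYTES);
        System.out.println("char + int payload: " + questionPayloadBytes());
    }
}
```

The object header and the rounding to an 8-byte boundary come on top of the field payload.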
I don't know the specifics of the JVM, but if it helps, the char primitive type uses 16 bits (a Unicode character) to store its data, and int uses 32 bits.
http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
I guess you could test it by creating a very simple Java application and a very simple object.
Run the application without declaring fields and check how much memory it uses (Ctrl+Shift+Escape in Windows), and then re-run and check the difference when you do allocate these fields.
Fields in Java classes that store primitive types are initialised with default values when the object is created, so I would imagine the memory would be allocated then.
This is implementation dependent.
Early JVM implementations were closer to the class file format. In that case byte, short, char, int, float and references take up one slot; long and double two slots. So, effectively round size up to four bytes and that's how much memory it takes up in the object. Then the total for the object, including header, is often rounded up to 8 bytes for better memory alignment. For "compressed oops" (32 bit references on 64 bit platforms, where the bottom bits of the 64-bit address are always zero, allowing the reference to be shifted and more than 4 GB used whilst keeping references down to four bytes), there is strong pressure to align to bigger sizes.
But for the best part of a decade we have had 64-bit JVMs. That means more waste, including waste in terms of processor-memory bandwidth. So in modern implementations the object layout is compacted such that an object uses as much memory as you would expect (plus header and alignment rounding).

Allocated Memory for Hashtable.put()

So I was reading Peter Norvig's IAQ (infrequently asked questions - link) and stumbled upon this:
You might be surprised to find that an Object takes 16 bytes, or 4 words, in the Sun JDK VM. This breaks down as follows: There is a two-word header, where one word is a pointer to the object's class, and the other points to the instance variables. Even though Object has no instance variables, Java still allocates one word for the variables. Finally, there is a "handle", which is another pointer to the two-word header. Sun says that this extra level of indirection makes garbage collection simpler. (There have been high performance Lisp and Smalltalk garbage collectors that do not use the extra level for at least 15 years. I have heard but have not confirmed that the Microsoft JVM does not have the extra level of indirection.)
An empty new String() takes 40 bytes, or 10 words: 3 words of pointer overhead, 3 words for the instance variables (the start index, end index, and character array), and 4 words for the empty char array. Creating a substring of an existing string takes "only" 6 words, because the char array is shared. Putting an Integer key and Integer value into a Hashtable takes 64 bytes (in addition to the four bytes that were pre-allocated in the Hashtable array): I'll let you work out why.
So well I obviously tried, but I can't figure it out. In the following I only count words:
A Hashtable put creates one Hashtable$Entry: 3 (overhead) + 4 variables (3 references which I assume are 1 word + 1 int). I further assume that he means that the Integers are newly allocated (so not cached by the Integer class or already exist) which comes to 2* (3 [overhead] + 1 [1 int value]).
So in the end we end up with.. 15 words or 60 bytes. So what I first thought was that the Entry as an inner class needs a reference to its outer object, but alas it's static so that doesn't make much sense (sure we have to store a pointer to the parent class, but I'd think that information is stored in the class header by the VM).
Just idle curiosity and I'm well aware that all this depends to a good bit on the actual JVM implementation (and on a 64bit version the results would be different), but still I don't like questions I can't answer :)
Edit: Just to make this a bit clearer: While I'm well aware that more compact structures can get us some performance benefits, I agree that in general worrying about a few bytes here or there is a waste of time. I surely wouldn't stop using a Hashtable just because of a few bytes overhead here or there just like I wouldn't use plain char arrays instead of Strings (or start using C). This is purely of academic interest to learn a bit more about the insides of Java/the JVM :)
The author appears to assume there are 3 objects with 16 bytes of overhead each, 2 32-bit references in the Map.Entry, and 2 x 1 32-bit int values. This would total 64 bytes.
This is flawed in that Sun/Oracle's JVM only allocates on 8-byte boundaries, so that while technically an Integer occupies 20 bytes of memory, 24 bytes are used (the next multiple of 8).
Additionally many JVMs now use 64-bit references so the Map.Entry would use another 16 bytes.
This is all very inefficient, which is why you might use a class like TIntIntHashMap instead, which uses primitives.
However, usually it doesn't matter as memory is surprisingly cheap when you compare it to the cost of your time. If you work on server applications and you cost your company about $40/hour, you need to be saving about 10 MB every minute to save as much memory as you are costing. (Ideally you need to be saving much more than this.) Saving 10 MB each and every minute is hard.
Memory is reusable, but your time isn't.
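The arithmetic behind the 64-byte reading in this answer, and the effect of 8-byte rounding, can be sketched as follows. The constants are the ones the answer assumes (16 bytes of overhead per object, 4-byte references and ints); they are not measurements, and the class name is hypothetical.

```java
// Reproduces the answer's arithmetic; all sizes are the answer's assumptions.
final class HashtablePutEstimate {
    static final int OBJECT_OVERHEAD = 16; // per object, as assumed in the quote
    static final int REFERENCE = 4;        // 32-bit reference
    static final int INT = 4;              // 32-bit int value

    /** Rounds size up to the next multiple of 8. */
    static int alignUp(int size) {
        return (size + 7) / 8 * 8;
    }

    /** 3 objects, 2 references in the entry, 2 int values. */
    static int quotedEstimate() {
        return 3 * OBJECT_OVERHEAD + 2 * REFERENCE + 2 * INT;
    }

    /** Same reading, with each object rounded up to an 8-byte boundary. */
    static int alignedEstimate() {
        int entry = alignUp(OBJECT_OVERHEAD + 2 * REFERENCE);
        int boxedInt = alignUp(OBJECT_OVERHEAD + INT);
        return entry + 2 * boxedInt;
    }

    public static void main(String[] args) {
        System.out.println("quoted estimate:  " + quotedEstimate() + " bytes");
        System.out.println("aligned estimate: " + alignedEstimate() + " bytes");
    }
}
```

With rounding applied, each Integer grows from 20 to 24 bytes, which is the flaw in the 64-byte figure that the answer points out.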

Why does creating a big Java array consume so much memory?

Why does the following line
Object[] objects = new Object[10000000];
result in a lot of memory (~40M) being used by the JVM? Is there any way to know the internal workings of the VM when allocating arrays?
Well, that allocates enough space for 10000000 references, as well as a small amount of overhead for the array object itself.
The actual size will depend on the VM - but it's surely not surprising that it's taking up a fair amount of memory... I'd expect at least 40MB, and probably 80MB on a 64-bit VM, unless it's using compressed oops for arrays.
Of course, if you populate the array with that many distinct objects, that will take much, much more memory... but the array itself still needs space just for the references.
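As a rough check of these figures, the sketch below computes the space needed for the reference slots alone and then takes a crude before/after reading from Runtime. The measured number is only indicative, since garbage collection and heap growth can distort it; the class and method names are made up for the example.

```java
// Estimate plus a crude measurement; the measured value varies between runs and JVMs.
final class ReferenceArrayEstimate {

    /** Bytes needed for the reference slots alone (array header ignored). */
    static long referenceBytes(int length, int bytesPerReference) {
        return (long) length * bytesPerReference;
    }

    public static void main(String[] args) {
        int length = 10_000_000;
        System.out.println("4-byte references: " + referenceBytes(length, 4) + " bytes");
        System.out.println("8-byte references: " + referenceBytes(length, 8) + " bytes");

        Runtime rt = Runtime.getRuntime();
        long before = rt.totalMemory() - rt.freeMemory();
        Object[] objects = new Object[length];
        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("measured delta for " + objects.length + " slots: "
                + (after - before) + " bytes");
    }
}
```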
What do you mean by "a lot of memory"? You are allocating 10000000 pointers, each taking 4 bytes (on a 32-bit machine); this is about 40 MB of memory.
You are creating ten million references to an object. A reference is at least 4 bytes; IIRC in Java it might be 8, but I'm unsure of that.
So with that one line you're creating 40 or 80 megabytes of data.
You are reserving space for ten million references. That is quite a bit.
It results in a lot of memory being used because it needs to allocate heap space for 10 million object references and the array's associated overhead.
To look into the internal workings of the JVM, you can check out its source code, as it is open source.
Your array has to hold 10 million object references, which on modern platforms are 64 bit (8 byte) pointers. Since it is allocated as a contiguous chunk of storage, it should take 80 million bytes. That's big in one sense, small compared to the likely amount of memory you have. Why does it bother you?
It creates an array with 10.000.000 reference pointers, all initialized with null.
What did you expect, saying this is "a lot"?
Further reading
Size of object references in Java
One of the principal reasons arrays are used so widely is that their elements can be accessed in constant time. This means that the time taken to access a[i] is the same for each index i. This is because the address of a[i] can be determined arithmetically by adding a suitable offset to the address of the head of the array. The reason is that space for the contents of an array is allocated as a contiguous block of memory.
According to this site, the memory usage for arrays is a 12-byte header + 4 bytes per element. If you declare an empty array of Object holding 10M elements, then you have just about 40MB of memory used from the start. If you start filling that array with 10M actual objects, then the size increases quite rapidly.
From this site, and I just tested it on my 64-bit machine, the size of a plain Object is about 31 bytes, so an array of 10M Objects is just about 12 bytes + (4 + 31 bytes) * 10M = 350 000 012 bytes (about 350 MB, or 333.8 MiB).
If your array is holding other type of objects, then the size will be even larger.
I would suggest you use some kind of random access file(s) to hold you data if you have to keep so much data inside your program. Or even use a database such as Apache Derby, which will also enable you to sort and filter your data, etc.
I may be behind the times but I understood from the book Practical Java that Vectors are more efficient and faster than Arrays. Is it possible to use a Vector instead of an array?

TreeMap memory usage

How can one calculate how much memory a Java TreeMap needs to handle each mapping?
I am doing an experiment with 128 threads, each dumping 2^17 longs in its own array.
All these 2^24 longs are then mapped to ints (TreeMap<Long,Integer>), each array reference is nulled before moving to the next.
That should amount to 128+64 MB for the keys+values. I am surprised to get OutOfMemoryError during the mapping with 512MB assigned to this VM.
You seem to assume that a Long, Integer key/value pair in a map occupies only 12 bytes of memory. That is wrong.
Even if you copy from a primitive long array, autoboxing will automatically create Long and Integer object instances as wrappers for the primitive values when you use them as map keys and values. The memory requirements for the object instances is VM implementation specific, but I think that Sun's VM lies in the range 32-48 bytes for these objects, with instances in a 64 bit VM being slightly larger. In addition, the map need additional object instances for each key/value pair to manage the internal data structures.
Each Long is at least 16 bytes, and each Integer is at least 12 bytes of header and data, which 8-byte alignment rounds up to 16. On a 32-bit machine, each node is at least 24 bytes, more likely 32 (object header, key, value, two children, and flags for balancing). Using the unrounded figures, that means at least 2^24 * (12+24+16) = 832MB.
Edit: It appears objects are 8-byte aligned, so bump the Integer to 16 bytes. Also, a tree node probably has another field for balancing the tree, so count 32 bytes for it. That brings us to a minimum of 1024MB.
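The figures in this answer can be reproduced with a few lines of arithmetic. The per-object sizes are the answer's own estimates for a 32-bit VM, not measured values, and the class name is hypothetical.

```java
// Reproduces the answer's estimates for 2^24 Long -> Integer mappings.
final class TreeMapEstimate {

    /** Total size in MB (2^20 bytes) for the given per-mapping byte counts. */
    static long megabytes(long mappings, int keyBytes, int nodeBytes, int valueBytes) {
        return mappings * (keyBytes + nodeBytes + valueBytes) / (1024L * 1024L);
    }

    public static void main(String[] args) {
        long mappings = 1L << 24;
        // Raw primitives only: 8-byte long keys and 4-byte int values.
        System.out.println("raw data:         " + megabytes(mappings, 8, 0, 4) + " MB");
        // Boxed Long, tree node and Integer before rounding.
        System.out.println("first estimate:   " + megabytes(mappings, 16, 24, 12) + " MB");
        // After 8-byte alignment and an extra node field.
        System.out.println("revised estimate: " + megabytes(mappings, 16, 32, 16) + " MB");
    }
}
```

The raw data is the 128+64 MB the question expected, while either estimate for the boxed and linked representation exceeds the 512MB heap.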
Apparently, from this Tree Map 4 Docs, the default size of the tree is 64M. It will report an OutOfMemoryError if you exceed that. If you don't specify a maximum size it will default to that.
Hope that helps
Bob
EDIT: Ignore this. It's all wrong.
