TreeMap memory usage - java

How can one calculate how much memory a Java TreeMap needs to handle each mapping?
I am doing an experiment with 128 threads, each dumping 2^17 longs in its own array.
All these 2^24 longs are then mapped to ints (TreeMap<Long,Integer>); each array reference is nulled before moving on to the next.
That should amount to 128+64 MB for the keys+values. I am surprised to get an OutOfMemoryError during the mapping with 512MB assigned to this VM.

You seem to assume that a Long, Integer key/value pair in a map occupies only 12 bytes of memory. That is wrong.
Even if you copy from a primitive long array, autoboxing will automatically create Long and Integer object instances as wrappers for the primitive values when you use them as map keys and values. The memory requirements for these object instances are VM-implementation specific, but I think that in Sun's VM they lie in the range of 32-48 bytes for these objects, with instances in a 64-bit VM being slightly larger. In addition, the map needs additional object instances for each key/value pair to manage its internal data structures.

Each Long is at least 16 bytes, and each Integer is at least 12 bytes, due to the 8-byte object overhead. On a 32-bit machine, each tree node is at least 24 bytes (object header, key, value, and two children). That means at least 2^24 * (12+24+16) = 832MB.
Edit: It appears objects are 8-byte aligned, so bump the Integer to 16 bytes. Also, a tree node probably has another field for balancing the tree, so count 32 bytes for it. That brings us to a minimum of 2^24 * (16+32+16) = 1024MB.
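For a rough sanity check, here is that arithmetic spelled out in code; the per-object sizes (16, 16 and 32 bytes) are the estimates from above for a 32-bit Sun VM, not guarantees of the spec:
class TreeMapEstimate {
    public static void main(String[] args) {
        long entries = 1L << 24;             // 2^24 mappings, as in the question
        long bytesPerEntry = 16 + 16 + 32;   // assumed sizes: Long + Integer + tree node
        long totalMb = entries * bytesPerEntry / (1024 * 1024);
        System.out.println(totalMb + " MB"); // prints 1024, well above the 512MB heap
    }
}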

Apparently, from this Tree Map 4 Docs, the default size of the tree is 64M. It will report an OutOfMemoryError if you exceed that. If you don't specify a maximum size, it will default to that.
Hope that helps
Bob
EDIT: Ignore this. It's all wrong.

Related

Java - How can I create a variable with the required size in Java?

Consider this situation. I have a thousand int variables in my program. We know that an int variable occupies 2 bytes of memory space in Java. So, the total amount of space occupied by my variables would be 2000 bytes. But the problem is that the value in every variable occupies only half of the space it has. So, the actual required space used would be 1000 bytes, which means that 1000 bytes go to waste. How can I address this problem? Is there a way to address this problem in Java?
Your data fits in a byte, so just use byte instead of int (which is 4 bytes).
Checking the space required by values every time they are stored would mean creating additional, useless overhead.
If you know what size your values will be, it is up to you to choose their type properly while developing. If you don't know how much space your values might occupy, you have to choose the largest possible type.
At runtime, if a value is an int, for example, this means it can possibly be anywhere between -2^31 and (2^31)-1. If you know the value will never be that large, you can use smaller primitives such as byte or short.
We could check and allocate the exact amount of memory required for each element, but it would take more time. Conversely, using a fixed amount of memory evidently takes up more space but is faster; this is how Java works. Unfortunately, we can't do without a compromise.
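For example, a small sketch of what that choice looks like in practice (the element count is taken from the question, the values are made up):
// Values known to fit in -128..127 can live in a byte[] instead of an int[].
byte[] small = new byte[1000];  // roughly 1000 bytes of payload
int[] large = new int[1000];    // roughly 4000 bytes of payload for the same data
small[0] = 100;                 // fine: 100 fits in a byte
// small[1] = 200;              // would not compile: 200 is outside the byte range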
An int in Java is 32 bit (4 bytes) long
The value held in the int does not affect its size in memory; e.g. an int holding the value 0x55 still occupies 4 bytes.
So, in order to occupy 2000 bytes in memory, you would need 500 ints. If you create an int array, there will also be the overhead of the array object itself, so it will take up slightly more memory.
The int data type is a 32-bit signed two's complement integer and hence occupies 4 bytes of memory. For int:
Minimum value is -2,147,483,648 (-2^31)
Maximum value is 2,147,483,647 (inclusive) (2^31 - 1)
If your values are not going to be so large then you can use either byte or short as per your requirement.
The byte data type is an 8-bit signed two's complement integer.
Minimum value is -128 (-2^7)
Maximum value is 127 (inclusive) (2^7 - 1)
The short data type is a 16-bit signed two's complement integer.
Minimum value is -32,768 (-2^15)
Maximum value is 32,767 (inclusive) (2^15 - 1)
You could also use a linked list if you really are not sure how many elements are going to be in a large list.
The principal benefit of a linked list over a conventional array is that list elements can easily be inserted or removed without reallocating or reorganizing the entire structure, because the data items need not be stored contiguously in memory or on disk, while an array has to be declared in the source code before compiling and running the program. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal.
Bear in mind, though, that linked lists incur a per-element overhead of at least one pointer, so they only pay off for heavier element types; if your elements are small, you're actually worse off.
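As a rough illustration of that per-element overhead (a sketch, not a measurement):
// Each element of a LinkedList<Integer> needs a node object (with next/prev
// references) plus a boxed Integer, so a small value costs far more than 4 bytes.
java.util.LinkedList<Integer> list = new java.util.LinkedList<Integer>();
list.add(42);                 // one node object + one Integer object per element
int[] array = new int[1000];  // 4 bytes per element plus a single array header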
See the Oracle docs:
int: By default, the int data type is a 32-bit signed two's complement
integer
So it means that int occupies 4 bytes in memory.
And
byte: The byte data type is an 8-bit signed two's complement integer
So in your case you can change your datatype to byte instead of int.

Memory efficiency: HashMap versus Array

I was thinking about the following situation: I want to count the occurrence of characters in a string (for example for a permutation check).
One way to do it would be to allocate an array with 256 integers (I assume that the characters are UTF-8), to fill it with zeros and then to go through the string and increment the integers on the array positions corresponding to the int value of the chars.
However, for this approach, you would have to allocate a 256-element array each time, even when the analyzed string is very short (and consequently uses only a small part of the array).
Another approach would be to use a Character to Integer HashTable and store a number for each encountered char. This way, you would only have keys for chars that actually are in the string.
As my understanding of the HashTable is rather theoretical and I do not really know how it is implemented in Java, my question is: which of the two approaches would be more memory efficient?
Edit:
During the discussion of this question (thank you for your answers everyone) I did realize that I had a very fuzzy understanding of the nature of UTF-8. After some searching, I have found this great video that I want to share, in case someone has the same problem.
I wonder why you chose 256 as the length of your array when you assume that your String is UTF-8. In UTF-8 a character can be composed of up to 4 bytes, which allows for far more than just 256 characters.
Anyway: using a HashTable/HashMap comes with a huge memory overhead. First, all your characters and integers need to be wrapped in objects (Character/Integer), and an Integer consumes about 3x as much memory as an int. For arrays the difference can be even larger due to the optimizations Java performs on arrays (e.g. the Java stack works only in multiples of 4 bytes, while in an array Java allows smaller types such as a char to consume only 2 bytes).
Then the HashTable itself creates a memory overhead because it needs to maintain an array (which is usually not fully used) and linked lists to maintain all objects which generate the same hash.
Additionally, access times will be dramatically faster for arrays. You save multiple method invocations (add, hashCode, iterator, ...) and there exist a number of opcodes in Java bytecode that make working with arrays more efficient.
Anyway, your question was:
Which of the two approaches would be more memory efficient?
And it is safe to say that arrays will be more memory efficient.
However you should make absolutely sure what your requirements are. Do you need more memory efficiency? (Could be true if you process large amounts of data or you are on a slow device (mobile devices?)) How important is readability of code? How about size of code? Reuseability?
And is 256 really the correct size?
Without looking in the code I know that a HashMap requires, at minimum, a base object, a hashtable array, and individual objects for each hash entry. Generally an int value would have to be stored as an Integer object so that's more objects. Let's assume you have 30 unique characters:
32 bytes for the base object
256 bytes for a minimum-size hashtable array
32 bytes for each of the 30 table entries
16 bytes (if highly optimized) for each of 30 Integers
32 + 256 + 960 + 480 = 1728 bytes. That's for a minimal, non-fancy implementation.
The array of 256 ints would be about 1056 bytes.
I would use the array. From a performance aspect, you have guaranteed constant access, better than what a hash table can get you.
As it also only uses a constant amount of memory, I see no downside. The HashMap will most likely need more memory, even if you only store a few elements.
By the way, the memory footprint should not be a concern, as you will only need the data structure as long as you need it for counting. Then it will be garbage collected, anyway.
Well here are the facts.
HashMap uses an array for its table behind the scenes.
So if you were actually limited by finding a contiguous space in memory, HashMap's benefit is only that the array may be smaller.
HashMap is generic and therefore uses objects.
Objects take up extra space. As I remember, it's typically 8 or 16 bytes minimum depending on whether it's a 32- or 64-bit system. This means the HashMap may very well not be smaller, even if the number of characters in the String is small. HashMap will require 3 extra objects for each entry: an Entry, a Character and an Integer. HashMap also needs to store the int for the index locally whereas the array does not.
Beyond that, there will be some extra computation when using the HashMap.
I would also say space optimization is not something you should worry about here. Either way, the memory footprint is actually very small.
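For comparison, a minimal sketch of the HashMap approach; every char and every count ends up boxed (Character/Integer), which is where the extra objects come from:
import java.util.HashMap;
import java.util.Map;

public class CharCountMap {
    public static void main(String[] args) {
        Map<Character, Integer> counts = new HashMap<Character, Integer>();
        for (char c : "permutation".toCharArray()) {
            Integer old = counts.get(c);              // autoboxes c into a Character
            counts.put(c, old == null ? 1 : old + 1); // and the count into an Integer
        }
        System.out.println(counts);
    }
}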
Initialize an array of integers indexed by the int value of a char; for example, the int value of 'f' is 102, which is its ASCII value.
http://www.asciitable.com/
char c = 'f';
int x = (int)c;
If you know the range of chars you're dealing with then it is easier.
For each occurrence of a char, increment the index of that char in the array by one. This approach would be slow if you have to iterate, and complicated if you are to sort, but it won't be memory intensive.
Just be aware that when you sort, you lose the indexes.
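Putting that together, a small sketch of the array-based counter (assuming plain ASCII input, so 256 slots are enough):
public class CharCountArray {
    public static void main(String[] args) {
        String s = "permutation";
        int[] counts = new int[256];   // one slot per possible ASCII value
        for (char c : s.toCharArray()) {
            counts[c]++;               // the char's int value is the array index
        }
        System.out.println("'t' occurs " + counts['t'] + " times");
    }
}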

Memory allocation declaring a field

How much memory does Java allocate for fields like private char letter; and private int size; at the moment of constructing the object containing these fields?
This depends on the implementation of the virtual machine. The spec specifies that a char primitive type has a value range of 16 bits, but it does not specify how a virtual machine has to store an object on the heap.
There's no need for such a detailed spec, because VMs don't have to be able to exchange or serialize raw objects from the heap.
To respond to your clarification in a comment: again, it depends on the implementation, but there are a couple of good reasons to allocate the memory for all class attributes once at the time the object is "created". If we decided on lazy allocation, then we'd have to add mechanics to dynamically resize objects on the heap at runtime, which is pretty expensive.
If we reserve all space right away at the beginning, then we never have to resize or relocate data on the heap, because the datastructures can never grow or shrink in size.
In the Oracle/Sun JVM, each object is allocated on an 8-byte boundary. So adding a field may not increase the amount of memory used. However as a guide here are the sizes of primitives
type typical size
byte, boolean 1 byte
char, short 2 bytes
int, float 4 bytes
long, double 8 bytes
Whether the JVM is 32-bit or 64-bit makes no difference to the size of a primitive, but it does change the default size of a reference.
I don't know the specifics of the JVM, but if it helps, the char primitive type uses 16 bits (a Unicode character) to store the data, and int uses 32 bits.
http://download.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html
I guess you could test it by creating a very simple Java application and a very simple object.
Run the application without declaring fields and check how much memory it uses (Ctrl+Shift+Escape in Windows), and then re-run and check the difference when you do allocate these fields.
Fields in Java classes that store primitive types are initialised with default values when the object is created, so I would imagine the memory would be allocated then.
This is implementation dependent.
Early JVM implementations were closer to the class file format. In that case byte, short, char, int, float and references take up one slot; long and double take two slots. So, effectively, round the size up to four bytes and that's how much memory the field takes up in the object. Then the total for the object, including header, is often rounded up to a multiple of 8 bytes for better memory alignment. For "compressed oops" (32-bit references on 64-bit platforms, where the bottom bits of the 64-bit address are always zero, allowing the reference to be shifted so that more than 4 GB can be addressed whilst keeping references down to four bytes), there is strong pressure to align to bigger sizes.
But for the best part of a decade we have had 64-bit JVMs. That means more waste, including waste in terms of processor-memory bandwidth. So in modern implementations the object layout is compacted such that object uses as much memory as you would expect (plus header and alignment rounding).
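As a concrete illustration for the fields from the question (the per-field sizes are typical HotSpot values, not spec guarantees):
// Typical layout on a 32-bit HotSpot VM:
//   object header        ~8 bytes
//   private char letter   2 bytes
//   private int size      4 bytes
// Raw total: 14 bytes, rounded up to 16 by 8-byte alignment.
class Example {
    private char letter;
    private int size;
}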

In Java, empty HashMap space allocation

How can I tell how much space a pre-sized HashMap takes up before any elements are added? For example, how do I determine how much memory the following takes up?
HashMap<String, Object> map = new HashMap<String, Object>(1000000);
In principle, you can:
- calculate it by theory: look at the implementation of HashMap to figure out what this constructor does, and look at the implementation of the VM to know how much space the individual created objects take; or
- measure it somehow.
Most of the other answers are about the second way, so I'll look at the first one (in OpenJDK source, 1.6.0_20).
The constructor uses a capacity that is the next power of two >= your initialCapacity parameter, thus 1048576 = 2^20 in our case.
It then creates a new Entry[capacity] and assigns it to the table variable. (Additionally it assigns some primitive variables.)
So, we now have one quite small HashMap object (it contains only 3 ints, one float and one reference variable), and one quite big Entry[] object. This array needs space for its array elements (which are normal reference variables) and some metadata (size, class).
So, it comes down to how big a reference variable is. This depends on VM implementation - usually in 32-bit VMs it is 32 bit (= 4 bytes), in 64-bit VMs 64 bit (= 8 bytes).
So, basically on 32-bit VMs your array takes 4 MB, on 64-bit VMs it takes 8 MB, plus some tiny administration data.
If you then fill your HashMap with mappings, each mapping corresponds to an Entry object. This Entry object consists of one int and three references, taking about 24 bytes on 32-bit VMs, maybe double that on 64-bit VMs. Thus your 1000000-mapping HashMap (assuming a load factor > 1) would take ~28 MB on 32-bit VMs and ~56 MB on 64-bit VMs.
Additionally to the key and value objects themselves, of course.
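That walkthrough, written out as arithmetic (assuming 4-byte references on a 32-bit VM and 8-byte references on a 64-bit VM, as above):
int initialCapacity = 1000000;
int tableLength = 1;
while (tableLength < initialCapacity) {
    tableLength <<= 1;                        // ends at 1048576 = 2^20
}
long bytesOn32Bit = (long) tableLength * 4L;  // ~4 MB of null references
long bytesOn64Bit = (long) tableLength * 8L;  // ~8 MB of null references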
You could check memory usage before and after creation of the variable. For example:
long preMemUsage = Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();
HashMap<String, Object> map = new HashMap<String, Object>(1000000);
long postMemUsage = Runtime.getRuntime().totalMemory() -
Runtime.getRuntime().freeMemory();
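To get an actual number out of that, subtract the two readings; a System.gc() call before each reading makes the figure somewhat less noisy, though it remains only an approximation:
System.out.println("approximate size: " + (postMemUsage - preMemUsage) + " bytes");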
The exact answer will depend on the version of Java you are using, the JVM vendor and the target platform, and is best determined by direct measurement, as described in other answers.
But as a simple estimate, the size is likely to be either ~4 * 2^20 or ~8 * 2^20 bytes, for a 32-bit or 64-bit JVM respectively.
Reasoning:
The Sun Java 1.6 implementation of HashMap has a fixed-size top-level object and a table field that points to the array of references to hash chains.
In a newly created (empty) HashMap the references are all null and the array size is the next power of two larger than the supplied initialCapacity. (Yes ... I checked the source code.)
A reference occupies 4 bytes on a typical 32bit JVM and 8 bytes on a typical 64 bit JVM. Some 64 bit JVMs support compact references ("compressed oops"), but you need to set JVM options to enable this.
The top object has 5 fields including the table array reference, but this is a relatively small constant overhead.
The top object and the array have object header overheads, but these are constant and relatively small.
Thus the size of the table array dominates, and it is 2^20 (the next power of 2 greater than 1,000,000) multiplied by the size of a reference.
So, this tells you that setting a large initial capacity really does use a lot of memory. On the other hand, if the initial capacity is a good estimate of the map's capacity when fully populated, you will save significant amounts of time by setting it. (This avoids a number of cycles of reallocating the array and rebuilding of the hash chains.)
You could probably use a profiler like VisualVM and track memory use.
Have a look at this too: http://www.velocityreviews.com/forums/t148009-java-hashmap-size.html
I'd have a look at this article: http://www.javaworld.com/javaworld/javatips/jw-javatip130.html
In short, java does not have a C-style sizeof operator. You could use profiling tools, but IMO the above link gives the simplest solution.
Another piece of info that may be helpful: an empty java String consumes 40 bytes. One million of them would probably be at least 40MB...
I agree that a profiler is really the only way to tell. The other bit of relevant information is whether you're using a 32-bit or 64-bit JVM. The amount of overhead due to memory references (pointers) varies depending on that and whether you have compressed oops turned on. I've found that for smaller data sets the overhead of objects and pointers is significant.
In the latest version of Java 1.7 (I'm looking at 1.7.0_55) HashMap actually lazily instantiates its internal table. It's only instantiated when put() is called - see the private method "inflateTable()". So your HashMap, before you add anything to it at least, will occupy only the handful of bytes of object overhead and instance fields.
You should be able to use VisualVM (comes with JDK 6 or can be downloaded) to create a memory snapshot and inspect the allocated objects for their size.

How much memory does a Hashtable use?

In Java, if I create a Hashtable<K, V> and put N elements in it, how much memory will it occupy? If it's implementation dependent, what would be a good "guess"?
Edit: Oh geez, I'm an idiot, I gave info for HashMap, not Hashtable. However, after checking, the implementations are identical for memory purposes.
This is dependent on your VM's internal memory setup (packing of items, 32 bit or 64 bit pointers, and word alignment/size) and is not specified by java.
Basic info on estimating memory use can be found here.
You can estimate it like so:
On 32-bit VMs, a pointer is 4 bytes, on 64-bit VMs, it is 8 bytes.
Object overhead is 8 bytes of memory (for an empty object, containing nothing)
Objects are padded to a size that is a multiple of 8 bytes (ugh).
There is a small, constant overhead for each hashmap: one float, 3 ints, plus object overhead.
There is an array of slots, some of which will have entries, some of which will be reserved for new ones. The ratio of filled slots to total slots is NO MORE THAN the specified load factor in the constructor.
The slot array requires one object overhead, plus one int for size, plus one pointer for every slot, to indicate the object stored.
The number of slots is generally 1.3 to 2 times more than the number of stored mappings, at default load factor of 0.75, but may be less than this, depending on hash collisions.
Every stored mapping requires an entry object. This requires one object overhead, 3 pointers, plus the stored key and value objects, plus an integer.
So, putting it together (for 32/64 bit Sun HotSpot JVM):
HashMap needs 24 bytes (itself, primitive fields) + 12 bytes (slot array constant) + 4 or 8 bytes per slot + 24/40 bytes per entry + key object size + value object size + padding of each object to a multiple of 8 bytes
OR, roughly (at most default settings, not guaranteed to be precise):
On 32-bit JVM: 36 bytes + 32 bytes/mapping + keys & values
On 64-bit JVM: 36 bytes + 56 bytes/mapping + keys & values
Note: this needs more checking, it might need 12 bytes for object overhead on 64-bit VM.
I'm not sure about nulls -- pointers for nulls may be compressed somehow.
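To turn that estimate into a number for N mappings, using the rough per-mapping figures above (assumptions, not guarantees):
long n = 1000000;           // example number of mappings
long est32 = 36 + 32L * n;  // ~32 MB plus keys and values on a 32-bit JVM
long est64 = 36 + 56L * n;  // ~56 MB plus keys and values on a 64-bit JVM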
It is hard to estimate. I would read this first:
http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html
Just use the Sun JDK tools to figure out the size of K and V:
jmap -histo [pid]
num #instances #bytes class name
1: 126170 19671768 MyKClass
2: 126170 14392544 MyVClass
3: 1 200000 MyHashtable
Also you may want to use HashMap instead of Hashtable if you do not need synchronization.
