In Java, if I create a Hashtable<K, V> and put N elements in it, how much memory will it occupy? If it's implementation dependent, what would be a good "guess"?
Edit: Oh geez, I'm an idiot, I gave info for HashMap, not Hashtable. However, after checking, the implementations are identical for memory purposes.
This is dependent on your VM's internal memory setup (packing of items, 32-bit or 64-bit pointers, and word alignment/size) and is not specified by Java.
Basic info on estimating memory use can be found here.
You can estimate it like so:
On 32-bit VMs, a pointer is 4 bytes, on 64-bit VMs, it is 8 bytes.
Object overhead is 8 bytes of memory (for an empty object, containing nothing)
Objects are padded to a size that is a multiple of 8 bytes (ugh).
There is a small, constant overhead for each HashMap: one float, 3 ints, plus object overhead.
There is an array of slots, some of which will have entries, some of which will be reserved for new ones. The ratio of filled slots to total slots is NO MORE THAN the specified load factor in the constructor.
The slot array requires one object overhead, plus one int for size, plus one pointer for every slot, to indicate the object stored.
The number of slots is generally 1.3 to 2 times more than the number of stored mappings, at default load factor of 0.75, but may be less than this, depending on hash collisions.
Every stored mapping requires an entry object. This requires one object overhead, 3 pointers, plus the stored key and value objects, plus an integer.
So, putting it together (for 32/64 bit Sun HotSpot JVM):
HashMap needs 24 bytes (itself, primitive fields) + 12 bytes (slot array constant) + 4 or 8 bytes per slot + 24/40 bytes per entry + key object size + value object size + padding each object to a multiple of 8 bytes
OR, roughly (with mostly default settings, not guaranteed to be precise):
On 32-bit JVM: 36 bytes + 32 bytes/mapping + keys & values
On 64-bit JVM: 36 bytes + 56 bytes/mapping + keys & values
Note: this needs more checking, it might need 12 bytes for object overhead on 64-bit VM.
I'm not sure about nulls -- pointers for nulls may be compressed somehow.
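To put that in code, here is a tiny estimator (my own arithmetic, just restating the rough per-mapping figures above; none of this is guaranteed by the JVM):

public class HashMapOverheadEstimate {

    // Rough HashMap overhead, excluding the key and value objects themselves.
    // The constants come from the back-of-envelope figures above for Sun HotSpot.
    static long estimateOverheadBytes(long mappings, boolean is64bit) {
        long perMapping = is64bit ? 56 : 32;   // entry object plus its share of the slot array
        return 36 + mappings * perMapping;
    }

    public static void main(String[] args) {
        System.out.println(estimateOverheadBytes(1000000, false) / 1000000 + " MB on a 32-bit JVM");
        System.out.println(estimateOverheadBytes(1000000, true) / 1000000 + " MB on a 64-bit JVM");
    }
}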
It is hard to estimate. I would read this first:
http://www.codeinstructions.com/2008/12/java-objects-memory-structure.html
Just use the Sun JDK tools to figure out the sizes of K and V, e.g.:
jmap -histo [pid]

 num     #instances         #bytes  class name
   1:        126170       19671768  MyKClass
   2:        126170       14392544  MyVClass
   3:             1         200000  MyHashtable
Also you may want to use HashMap instead of Hashtable if you do not need synchronization.
Related
An app (Spring, JPA Hibernate, Sybase 12, Webapp) when run locally on startup consumes 40MB of the 256MB heap space based on VisualVM. When I trigger a search that returns 70,000+ rows (text data no blobs) the heap space graph shoots up to 256MB and throws out of memory. I have resolved this by using setMaxResults(limit). However, when I queried the same data, copy-pasted to a text file and saved to the filesystem, I can see that the size is only 26MB worth of text.
So in effect, 216MB (256 minus 40) is consumed by loading 26MB worth of text from the database. What is consuming the 190MB by the time the out-of-memory occurs? Perhaps it's the frameworks, but I don't see how they could consume more than the actual data being loaded...
Note again that I resolved this with setMaxResults(limit); my question is NOT what to do but rather why, for educational purposes.
Some things to consider:
Your operating system probably uses an 8-bit-per-character encoding to store the text file. Java strings are internally encoded at 16 bits per character, so that doubles the space right there.
Numbers with only a few digits are smaller encoded as text than as binary values. e.g., '1' is a one-byte character in your text file, but a long with the value 1 is eight times that size in memory.
There will be duplication from Hibernate taking values out of the SQL result set and mapping them onto your Java objects. It may need to wrap/translate the contents of the result set into the types you defined in your mapping.
If your data-per-entity is actually small with a large number of entities, then the ratio of object overhead size to data size will obviously be high.
If you have small pieces of data in collections, the size of the collection can add up quickly relative to the data. In an extreme example, if you have a LinkedList of one- or two-character strings, that's 192 bits consumed just by pointers for every 16-32 bits of actual data. In an ArrayList it would still be 64 bits for the pointer pointing to 16-32 bits of data (assuming a 64-bit JVM, of course).
Every object you load in Hibernate is "tracked" for dirty checking in what's called the L1 cache. There can indeed be quite a bit of overhead in the internal data structures and instrumentation used to do this, relative to the data size, for large numbers of entities with small amounts of data.
--
So the 26MB of data is already 52MB of data in memory in Java, assuming it is all strings; with numbers or dates mixed in it will be bigger.
And then if it's split into many small pieces, 700,000 small strings rather than 1,000 really long ones, it is totally reasonable for the size of data structure overhead to be triple the size of the actual data, pushing you over 200MB easily.
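To make that concrete, here is a rough sketch (my own, not part of the original answer) that compares the raw character data against the measured heap growth for a large number of small strings, using the same GC-plus-Runtime measurement trick that appears in later answers in this thread; treat the numbers as approximate:

import java.util.ArrayList;
import java.util.List;

public class SmallStringOverhead {

    public static void main(String[] args) {
        long before = usedHeap();
        List<String> rows = new ArrayList<String>(700000);
        for (int i = 0; i < 700000; i++) {
            rows.add("row" + (i % 10));   // concatenation builds a new small String each time
        }
        long after = usedHeap();
        System.out.println("Raw character data: ~" + (700000L * 4 * 2 / 1000000) + " MB");
        System.out.println("Measured heap growth: ~" + ((after - before) / 1000000) + " MB");
        System.out.println(rows.size() + " strings retained");   // keep the list reachable
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}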
All sorts of things.
Let's consider for example that your rows have 10 text columns, which are represented as a simple Java Bean with 10 String fields.
A String has 4 fields: a char[], and 3 ints.
A String descends from Object, which has 1 int and a reference to its class.
On a 64-bit JVM, those references could well be 8 bytes (not necessarily, but we'll stick with that for the sake of argument).
A 10 character string will have a char[10], and the 3 ints, which are 4 bytes each.
The char[10] is a pointer to an array. An array has to track its length, which is likely another 4 bytes, and it is also an Object (thus the class pointer and another int) plus the data. But characters in Java are represented as UTF-16 internally, 2 bytes per character. So, the actual array for 10 characters takes 24 bytes. And the reference to that array is a pointer.
So, a single String instance is: 8 + 4 for the Object, 8 + 4 + 4 + 4 for the String itself, and 8 + 4 + 20 for the actual data, or 64 bytes.
Your bean has 10 String fields, plus extends Object, so 8 + 4 + (10 * 8).
So, a single row from your database, for 100 chars of text, is 8 + 4 + (10 * 8) + (10 * 64), which equals 732 bytes.
These are not perfect numbers; I can't speak specifically to how arrays are stored, and the object references may well not be 8 bytes on a 64-bit JVM.
But it gives you some idea of the overhead involved. And this is just for your raw data. If you have those rows stored in an ArrayList, well, there's 70,000 * 8 just to point to your objects -- 560K for just the structure.
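If you want to play with those numbers, here is the same arithmetic as a tiny program (my own; it just encodes the assumed sizes from this answer: 8-byte references, 4-byte ints, 2 bytes per char, no padding):

public class RowCostEstimate {

    public static void main(String[] args) {
        int objectHeader = 8 + 4;                 // class reference + int, as assumed above
        int stringFields = 8 + 4 + 4 + 4;         // char[] reference + 3 ints
        int charArray    = 8 + 4 + 10 * 2;        // array header + length + 10 UTF-16 chars
        int perString    = objectHeader + stringFields + charArray;   // 64

        int bean   = objectHeader + 10 * 8;       // 10 references to the String fields
        int perRow = bean + 10 * perString;       // 732 for 100 chars of text

        System.out.println("Per String: " + perString + " bytes");
        System.out.println("Per row:    " + perRow + " bytes");
        System.out.println("70,000 rows: ~" + (70000L * perRow / 1000000) + " MB before any collection overhead");
    }
}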
If I create 10 integers and an integer array of 10, will there be any difference in total space occupied?
I have to create a boolean array of millions of records, so I want to understand how much space will be taken by array itself.
An array of integers is represented as a block of memory to hold the integers, plus an object header. The object header typically takes three 32-bit words on a 32-bit JVM, but this is platform dependent. (The header contains some flag bits, a reference to a class descriptor, space for primitive lock information, and the length of the actual array. Plus padding.)
So an array of 10 ints probably takes in the region of 13 * 4 bytes.
In the case of an Integer[], each Integer object has a 2-word header and a 1-word field containing the actual value. You also need to add in padding, and 1 word (or 1 to 2 words on a 64-bit JVM) for the reference. That is typically 5 words, or 20 bytes, per element of the array ... unless some Integer objects appear in multiple places in the array.
Notes:
The number of words actually used for a reference on a 64 bit JVM depends on whether "compressed oops" are used.
On some JVMs, heap nodes are allocated in multiples of 16 bytes ... which inflates space usage (e.g. the padding mentioned above).
If you take the identity hashcode of an object and it survives the next garbage collection, its size gets inflated by at least 4 bytes to cache the hashcode value.
These numbers are all version and vendor specific, in addition to the sources of variability enumerated above.
Some rough lower bounds calculations:
Each int takes up four bytes. = 40 bytes for ten
An int array takes up four bytes for each component plus four bytes to store the length plus another four bytes to store the reference to it. = 48 bytes (+ maybe some padding to align all objects at 8 byte boundaries)
An Integer takes up at least 8 bytes, plus another four bytes to store the reference to it. = at least 120 for ten
An Integer array takes up at least the 120 bytes for the ten Integers plus four bytes for the length, and then maybe some padding for alignment. Plus four bytes to store the reference to it. (@Marko reports that he even measured about 28 bytes per slot, so that would be 280 bytes for an array of ten.) A rough way to measure both cases is sketched below.
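Here is that measurement sketch (my own; it reuses the GC-plus-Runtime measurement idea from the next answer, so treat the numbers as approximate and JVM-dependent):

public class IntVsIntegerArray {

    public static void main(String[] args) {
        int n = 1000000;

        long before = usedHeap();
        int[] primitives = new int[n];
        long primitiveCost = usedHeap() - before;

        before = usedHeap();
        Integer[] boxed = new Integer[n];
        for (int i = 0; i < n; i++) {
            boxed[i] = new Integer(i);   // force distinct instances, bypassing the small-value cache
        }
        long boxedCost = usedHeap() - before;

        System.out.println("int[]     ~" + (primitiveCost / n) + " bytes per element");
        System.out.println("Integer[] ~" + (boxedCost / n) + " bytes per element");
        System.out.println(primitives.length + " " + boxed.length);   // keep both arrays reachable
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}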
In Java you have both Integer and int. Supposing you are referring to int, an array of ints is itself an object, and objects carry metadata, so an array of 10 ints will occupy a little more memory than 10 plain int variables.
What you can do is measure:
public static void main(String[] args) {
    final long startMem = measure();
    final boolean[] bs = new boolean[1000000];
    System.out.println(measure() - startMem);
    bs.hashCode();   // keep the array reachable so it isn't collected before the measurement
}

private static long measure() {
    final Runtime rt = Runtime.getRuntime();
    rt.gc();
    try { Thread.sleep(20); } catch (InterruptedException e) {}
    rt.gc();
    return rt.totalMemory() - rt.freeMemory();
}
Of course, this goes with the standard disclaimer: gc() has no particular guarantees, so repeat several times to see if you are getting consistent results. On my machine the answer is one byte per boolean.
In light of your comment, it will not make much difference if you use an array. The array itself uses a negligible amount of memory for its own bookkeeping; all the rest is used by the stored objects.
EDIT: What you need to understand is the difference between the Boolean wrapper and the boolean primitive type. Wrapper types usually take up more space than the primitives, so for millions of records try to go with the primitives.
Another thing to keep in mind when dealing with millions of records, as you said, is Java autoboxing. The performance hit can be significant if you unintentionally use it in a function that traverses the whole array.
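For example, here is the kind of accidental autoboxing to watch out for (my own illustration; the boxed accumulator allocates a wrapper object on nearly every iteration):

public class AutoboxingPitfall {

    public static void main(String[] args) {
        long[] values = new long[5000000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i;                 // mostly outside the Long small-value cache
        }

        long fast = 0L;                    // primitive accumulator: no boxing, no allocation
        for (long v : values) {
            fast += v;
        }

        Long slow = 0L;                    // boxed accumulator: each += unboxes, adds, and re-boxes,
        for (long v : values) {            // allocating a new Long wrapper on nearly every iteration
            slow += v;
        }

        System.out.println(fast + " " + slow);
    }
}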
It needn't reflect poorly on the teacher / interviewer.
How much you care about the size and alignment of variables in memory depends on how performant you need your code to be. It matters a lot if your software processes transactions (EFT / stock market) for example.
The size, alignment, and packing of your variables in memory can influence CPU cache hits/misses, which can influence the performance of your code by up to a factor of 100.
It's not a bad thing to know what's happening at a low level, as long as you use performance boosting tricks responsibly.
For example, I came to this thread because I needed the answer to exactly this question: I want to size my arrays of primitives to fill an integer multiple of CPU cache lines, because the code performing calculations over those arrays has a finite window in which its results must be ready for the consumer.
In terms of RAM space, there is no real difference
If you use an array you have 11 objects: the 10 Integers and the array itself, and arrays carry extra metadata. So using an array will take more memory space.
Now for real: this kind of question actually comes up in job interviews and exams, and that shows you what kind of interviewer or teacher you have... with so many layers of abstraction working down there in the VM and in the OS itself, what is the point of thinking about this stuff? Micro-optimizing memory...!
I mean if I create 10 integers and an integer array of 10, will there be any difference in total space occupied?
(integer array of 10) = (10 integers) + 1 integer
The last "+1 integer" is for index of array ( arrays can hold 2,147,483,647 amount of data, which is an integer). That means when you declare an array, say:
int[] nums = new int[10];
you actually reserve space for 11 ints in memory: 10 for the array elements and 1 for the array's length.
So I was reading Peter Norvig's IAQ (infrequently asked questions - link) and stumbled upon this:
You might be surprised to find that an Object takes 16 bytes, or 4 words, in the Sun JDK VM. This breaks down as follows: There is a two-word header, where one word is a pointer to the object's class, and the other points to the instance variables. Even though Object has no instance variables, Java still allocates one word for the variables. Finally, there is a "handle", which is another pointer to the two-word header. Sun says that this extra level of indirection makes garbage collection simpler. (There have been high performance Lisp and Smalltalk garbage collectors that do not use the extra level for at least 15 years. I have heard but have not confirmed that the Microsoft JVM does not have the extra level of indirection.)
An empty new String() takes 40 bytes, or 10 words: 3 words of pointer overhead, 3 words for the instance variables (the start index, end index, and character array), and 4 words for the empty char array. Creating a substring of an existing string takes "only" 6 words, because the char array is shared. Putting an Integer key and Integer value into a Hashtable takes 64 bytes (in addition to the four bytes that were pre-allocated in the Hashtable array): I'll let you work out why.
So well I obviously tried, but I can't figure it out. In the following I only count words:
A Hashtable put creates one Hashtable$Entry: 3 (overhead) + 4 variables (3 references, which I assume are 1 word each, + 1 int). I further assume that the Integers are newly allocated (so not cached by the Integer class or already existing), which comes to 2 * (3 [overhead] + 1 [int value]).
So in the end we end up with... 15 words, or 60 bytes. What I first thought was that the Entry as an inner class needs a reference to its outer object, but alas it's static, so that doesn't make much sense (sure, we have to store a pointer to the parent class, but I'd think that information is stored in the class header by the VM).
Just idle curiosity, and I'm well aware that all this depends to a good bit on the actual JVM implementation (and on a 64-bit version the results would be different), but still I don't like questions I can't answer :)
Edit: Just to make this a bit clearer: While I'm well aware that more compact structures can get us some performance benefits, I agree that in general worrying about a few bytes here or there is a waste of time. I surely wouldn't stop using a Hashtable just because of a few bytes overhead here or there just like I wouldn't use plain char arrays instead of Strings (or start using C). This is purely of academic interest to learn a bit more about the insides of Java/the JVM :)
The author appears to assume there are 3 objects with 16 bytes of overhead each, plus two 32-bit references in the Map.Entry and two 32-bit int values. That totals 64 bytes.
This is flawed in that Sun/Oracle's JVM only allocates on 8-byte boundaries, so while technically an Integer occupies 20 bytes of memory, 24 bytes are used (the next multiple of 8).
Additionally many JVMs now use 64-bit references so the Map.Entry would use another 16 bytes.
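For reference, here is that accounting spelled out, using the same assumptions as above (16 bytes of object overhead, 32-bit references and ints, and no alignment padding):

public class EntryCostBreakdown {

    public static void main(String[] args) {
        int objectOverhead = 16;                     // assumed per-object overhead

        int entry      = objectOverhead + 2 * 4;     // Map.Entry: key + value references = 24
        int keyInteger = objectOverhead + 4;         // boxed key's int value             = 20
        int valInteger = objectOverhead + 4;         // boxed value's int value           = 20

        System.out.println(entry + keyInteger + valInteger + " bytes");   // 64 bytes
    }
}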
This is all very inefficient, which is why you might use a class like TIntIntHashMap instead, which uses primitives.
However, usually it doesn't matter, as memory is surprisingly cheap when you compare it to the cost of your time. If you work on server applications and you cost your company about $40/hour, you need to be saving about 10 MB every minute to save as much memory as you are costing. (Ideally you need to be saving much more than this.) Saving 10 MB each and every minute is hard.
Memory is reusable, but your time isn't.
How can I tell how much space a pre-sized HashMap takes up before any elements are added? For example, how do I determine how much memory the following takes up?
HashMap<String, Object> map = new HashMap<String, Object>(1000000);
In principle, you can:
calculate it by theory:
look at the implementation of HashMap to figure out what this constructor does.
look at the implementation of the VM to know how much space the individual created objects take.
measure it somehow.
Most of the other answers are about the second way, so I'll look at the first one (in OpenJDK source, 1.6.0_20).
The constructor uses a capacity that is the next power of two >= your initialCapacity parameter, thus 1048576 = 2^20 in our case.
It then creates a new Entry[capacity] and assigns it to the table variable. (Additionally, it assigns some primitive variables.)
So, we now have one quite small HashMap object (it contains only 3 ints, one float and one reference variable), and one quite big Entry[] object. This array needs space for its array elements (which are normal reference variables) and some metadata (size, class).
So, it comes down to how big a reference variable is. This depends on VM implementation - usually in 32-bit VMs it is 32 bit (= 4 bytes), in 64-bit VMs 64 bit (= 8 bytes).
So, basically on 32-bit VMs your array takes 4 MB, on 64-bit VMs it takes 8 MB, plus some tiny administration data.
If you then fill your HashMap with mappings, each mapping corresponds to an Entry object. This Entry object consists of one int and three references, taking about 24 bytes on 32-bit VMs, maybe double that on 64-bit VMs. Thus your 1,000,000-mapping HashMap (assuming a load factor > 1) would take ~28 MB on 32-bit VMs and ~56 MB on 64-bit VMs.
Additionally to the key and value objects themselves, of course.
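Putting that reasoning into a small sketch (my own; the reference size and per-entry cost are the VM-dependent assumptions discussed above):

public class PresizedMapEstimate {

    public static void main(String[] args) {
        int initialCapacity = 1000000;
        int capacity = Integer.highestOneBit(initialCapacity - 1) << 1;   // next power of two: 2^20

        long refBytes = 4;                              // 4 on 32-bit VMs, 8 on 64-bit VMs
        long emptyTable = (long) capacity * refBytes;   // the slot array of an empty, pre-sized map

        long perEntry = 24;                             // Entry: 1 int + 3 references, roughly, on 32-bit VMs
        long filled = emptyTable + 1000000L * perEntry; // after 1,000,000 puts, before keys and values

        System.out.println(capacity + " slots, empty table ~" + (emptyTable / 1000000)
                + " MB, filled ~" + (filled / 1000000) + " MB");
    }
}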
You could check memory usage before and after creation of the variable. For example:
long preMemUsage = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
HashMap<String, Object> map = new HashMap<String, Object>(1000000);
long postMemUsage = Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
The exact answer will depend on the version of Java you are using, the JVM vendor and the target platform, and is best determined by direct measurement, as described in other answers.
But as a simple estimate, the size is likely to be either ~4 * 2^20 or ~8 * 2^20 bytes, for a 32-bit or 64-bit JVM respectively.
Reasoning:
The Sun Java 1.6 implementation of HashMap has a fixed-size top-level object and a table field that points to the array of references to hash chains.
In a newly created (empty) HashMap the references are all null and the array size is the next power of two larger than the supplied initialCapacity. (Yes ... I checked the source code.)
A reference occupies 4 bytes on a typical 32-bit JVM and 8 bytes on a typical 64-bit JVM. Some 64-bit JVMs support compact references ("compressed oops"), but you need to set JVM options to enable this.
The top object has 5 fields including the table array reference, but this is a relatively small constant overhead.
The top object and the array have object header overheads, but these are constant and relatively small.
Thus the size of the table array dominates, and it is 2^20 (the next power of 2 greater than 1,000,000) multiplied by the size of a reference.
So, this tells you that setting a large initial capacity really does use a lot of memory. On the other hand, if the initial capacity is a good estimate of the map's capacity when fully populated, you will save significant amounts of time by setting it. (This avoids a number of cycles of reallocating the array and rebuilding of the hash chains.)
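If you do decide to pre-size, a common rule of thumb for picking the initial capacity so the map never rehashes for an expected number of mappings is to divide by the load factor (a sketch under that assumption; the HashMap API does not do this for you):

import java.util.HashMap;
import java.util.Map;

public class PresizedMap {

    public static void main(String[] args) {
        int expectedMappings = 1000000;
        float loadFactor = 0.75f;
        int initialCapacity = (int) (expectedMappings / loadFactor) + 1;   // rounds up to 2^21 slots internally

        Map<String, Object> map = new HashMap<String, Object>(initialCapacity, loadFactor);
        // ... populate with up to ~1,000,000 entries without ever triggering a resize,
        // at the cost of a larger slot array (the memory trade-off described above)
        System.out.println(map.size());
    }
}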
You could probably use a profiler like VisualVM and track memory use.
Have a look at this too: http://www.velocityreviews.com/forums/t148009-java-hashmap-size.html
I'd have a look at this article: http://www.javaworld.com/javaworld/javatips/jw-javatip130.html
In short, Java does not have a C-style sizeof operator. You could use profiling tools, but IMO the above link gives the simplest solution.
Another piece of info that may be helpful: an empty Java String consumes 40 bytes. One million of them would probably be at least 40MB...
I agree that a profiler is really the only way to tell. The other bit of relevant information is whether you're using a 32-bit or 64-bit JVM. The amount of overhead due to memory references (pointers) varies depending on that and whether you have compressed oops turned on. I've found that for smaller data sets the overhead of objects and pointers is significant.
In the latest version of Java 1.7 (I'm looking at 1.7.0_55) HashMap actually lazily instantiates its internal table. It's only instantiated when put() is called - see the private method "inflateTable()". So your HashMap, before you add anything to it at least, will occupy only the handful of bytes of object overhead and instance fields.
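You can see the lazy allocation with the same GC-plus-Runtime measurement used elsewhere in this thread (a sketch; on JDKs that allocate the table eagerly, the first number will already include the full array):

import java.util.HashMap;

public class LazyTableDemo {

    public static void main(String[] args) {
        long before = usedHeap();
        HashMap<String, Object> map = new HashMap<String, Object>(1000000);
        System.out.println("After construction: " + (usedHeap() - before) + " bytes");

        map.put("first", new Object());   // triggers inflateTable() on lazy implementations
        System.out.println("After first put:    " + (usedHeap() - before) + " bytes");
        System.out.println(map.size());   // keep the map reachable past the measurements
    }

    private static long usedHeap() {
        Runtime rt = Runtime.getRuntime();
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}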
You should be able to use VisualVM (comes with JDK 6 or can be downloaded) to create a memory snapshot and inspect the allocated objects for their size.
How can one calculate how much memory a Java TreeMap needs to handle each mapping?
I am doing an experiment with 128 threads, each dumping 2^17 longs in its own array.
All these 2^24 longs are then mapped to ints (TreeMap<Long,Integer>), each array reference is nulled before moving to the next.
That should amount to 128+64 MB for the keys+values. I am surprised to get OutOfMemoryError during the mapping with 512MB assigned to this VM.
You seem to assume that a Long, Integer key/value pair in a map occupies only 12 bytes of memory. That is wrong.
Even if you copy from a primitive long array, autoboxing will automatically create Long and Integer object instances as wrappers for the primitive values when you use them as map keys and values. The memory requirements for the object instances are VM-implementation specific, but I think Sun's VM lies in the range of 32-48 bytes for these objects, with instances in a 64-bit VM being slightly larger. In addition, the map needs additional object instances for each key/value pair to manage its internal data structures.
Each Long is at least 16 bytes, and each Integer is at least 12 bytes, due to the 8-byte object overhead. On a 32-bit machine, each tree node is at least 24 bytes (object header, key, value, and two children). That means at least 2^24 * (12+24+16) = 832MB.
Edit: It appears objects are 8-byte aligned, so bump the Integer to 16 bytes. Also, a tree node probably has another field for balancing the tree, so count 32 bytes for it. That brings us to a minimum of 1024MB.
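Spelling out that lower bound with the sizes assumed above (16 bytes per Long, 16 per padded Integer, 32 per tree node):

public class TreeMapEstimate {

    public static void main(String[] args) {
        long entries  = 1L << 24;   // 128 threads * 2^17 longs
        long longKey  = 16;         // 8-byte header + 8-byte value
        long intValue = 16;         // 8-byte header + 4-byte value, padded to 16
        long treeNode = 32;         // header, key, value, left, right, plus the balancing field, padded

        long total = entries * (longKey + intValue + treeNode);
        System.out.println((total / (1024 * 1024)) + " MB minimum");   // 1024 MB, well over the 512 MB heap
    }
}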
Apparently, from the TreeMap docs, the default size of the tree is 64M. It will report an OutOfMemoryError if you exceed that. If you don't specify a maximum size it will default to that.
Hope that helps
Bob
EDIT: Ignore this. It's all wrong.