If I create 10 integers and an integer array of 10, will there be any difference in total space occupied?
I have to create a boolean array of millions of records, so I want to understand how much space will be taken by array itself.
An array of integers is represented as a block of memory to hold the integers, plus an object header. The object header typically takes three 32-bit words on a 32-bit JVM, but this is platform dependent. (The header contains some flag bits, a reference to a class descriptor, space for primitive lock information, and the length of the actual array. Plus padding.)
So an array of 10 ints probably takes in the region of 13 * 4 bytes.
In the case of an Integer[], each Integer object has a 2 word header and a 1 word field containing the actual value. You also need to add in padding, and 1 word (or 1 to 2 words on a 64-bit JVM) for the reference. That is typically 5 words or 20 bytes per element of the array ... unless some Integer objects appear in multiple places in the array.
Notes:
The number of words actually used for a reference on a 64 bit JVM depends on whether "compressed oops" are used.
On some JVMs, heap nodes are allocated in multiples of 16 bytes ... which inflates space usage (e.g. the padding mentioned above).
If you take the identity hashcode of an object and it survives the next garbage collection, its size gets inflated by at least 4 bytes to cache the hashcode value.
These numbers are all version and vendor specific, in addition to the sources of variability enumerated above.
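The estimates above can be turned into a small calculator. This is only a sketch: the class name ArrayFootprint and every constant in it (a 12-byte array header, an 8-byte object header, 4-byte references, 8-byte allocation granularity) are assumptions modelled on the 32-bit figures quoted in this answer, not values read from a real JVM.

```java
// Rough footprint estimator. All constants are ASSUMPTIONS for a 32-bit,
// HotSpot-style JVM as described in the text; real JVMs will differ.
final class ArrayFootprint {
    static final int ARRAY_HEADER_BYTES = 12;  // 3 words: flags/lock, class, length
    static final int OBJECT_HEADER_BYTES = 8;  // 2 words for a plain object
    static final int REFERENCE_BYTES = 4;      // one word per reference
    static final int ALIGNMENT = 8;            // assumed allocation granularity

    // Round a size up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // int[length]: header plus 4 bytes per element, padded.
    static long intArrayBytes(int length) {
        return align(ARRAY_HEADER_BYTES + 4L * length);
    }

    // Integer[length] with a distinct Integer object in every slot.
    static long integerArrayBytes(int length) {
        long slots = align(ARRAY_HEADER_BYTES + (long) REFERENCE_BYTES * length);
        long boxed = align(OBJECT_HEADER_BYTES + 4) * length;
        return slots + boxed;
    }

    public static void main(String[] args) {
        System.out.println("int[10]     ~ " + intArrayBytes(10) + " bytes");
        System.out.println("Integer[10] ~ " + integerArrayBytes(10) + " bytes");
    }
}
```

Under these assumptions an int[10] comes out at 56 bytes (52 plus padding) and an Integer[10] at 216 bytes, which matches the "about 20 bytes per element" figure plus the array header.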
Some rough lower bounds calculations:
Each int takes up four bytes. = 40 bytes for ten
An int array takes up four bytes for each component plus four bytes to store the length plus another four bytes to store the reference to it. = 48 bytes (+ maybe some padding to align all objects at 8 byte boundaries)
An Integer takes up at least 8 bytes, plus another four bytes to store the reference to it. = at least 120 for ten
An Integer array takes up at least the 120 bytes for the ten Integers plus four bytes for the length, and then maybe some padding for alignment. Plus four bytes to store the reference to it. (@Marko reports that he even measured about 28 bytes per slot, so that would be 280 bytes for an array of ten).
In java you have both Integer and int. Supposing you are referring to int , an array of ints is considered an object and objects have metadata so an array of 10 ints will occupy more than 10 int variables
What you can do is measure:
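public class MeasureArray {
    public static void main(String[] args) {
        final long startMem = measure();
        final boolean[] bs = new boolean[1000000];
        System.out.println(measure() - startMem);
        // Touch the array afterwards so it is still reachable during the second measurement.
        bs.hashCode();
    }

    // Used heap after requesting GC twice; gc() is only a hint, so results are approximate.
    private static long measure() {
        final Runtime rt = Runtime.getRuntime();
        rt.gc();
        try {
            Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        }
        rt.gc();
        return rt.totalMemory() - rt.freeMemory();
    }
}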
Of course, this goes with the standard disclaimer: gc() has no particular guarantees, so repeat several times to see if you are getting consistent results. On my machine the answer is one byte per boolean.
In light of your comment, it will not make much difference if you use an array. The array itself uses a negligible amount of memory for its own bookkeeping. All other memory will be used by the stored objects.
EDIT: What you need to understand is the difference between the Boolean wrapper and the boolean primitive type. Wrapper types will usually take up more space than the primitives. So for millions of records, try to go with the primitives.
Another thing to keep in mind when dealing with millions of records, as you said, is Java autoboxing. The performance hit can be significant if you unintentionally use this in a function that traverses the whole array.
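To illustrate the autoboxing point, here is a minimal sketch with two hypothetical methods (BoxingDemo, countTrueBoxed and countTruePrimitive are names made up for this example). Both return the same result; the first one boxes on every iteration without saying so, the second one stays on primitives.

```java
final class BoxingDemo {
    // Accidental boxing: the loop variable and the counter are wrapper types,
    // so each iteration boxes the element and each ++ unboxes and re-boxes.
    static int countTrueBoxed(boolean[] flags) {
        Integer count = 0;
        for (Boolean flag : flags) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    // Same logic on primitives only: no wrapper objects involved.
    static int countTruePrimitive(boolean[] flags) {
        int count = 0;
        for (boolean flag : flags) {
            if (flag) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        boolean[] flags = new boolean[1_000_000];
        flags[3] = true;
        flags[999_999] = true;
        System.out.println(countTrueBoxed(flags) + " " + countTruePrimitive(flags));
    }
}
```

How much slower the boxed version is depends on the JVM and on what the JIT compiler can optimize away, so measure before drawing conclusions.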
It needn't reflect poorly on the teacher / interviewer.
How much you care about the size and alignment of variables in memory depends on how performant you need your code to be. It matters a lot if your software processes transactions (EFT / stock market) for example.
The size, alignment, and packing of your variables in memory can influence CPU cache hits/misses, which can influence the performance of your code by up to a factor of 100.
It's not a bad thing to know what's happening at a low level, as long as you use performance boosting tricks responsibly.
For example, I came to this thread because I needed to know the answer to exactly this question, so that I can size my arrays of primitives to fill an integer multiple of CPU cache lines because I need the code that is performing calculations over those arrays of primitives to execute quickly because I have a finite window in which I need my calculations to be ready for the consumer of the result.
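As a rough sketch of that sizing calculation: the helper below rounds an element count up so that the array payload fills a whole number of cache lines. The 64-byte line size and the names CacheLineSizing and roundUpToCacheLines are assumptions for illustration. The JVM does not promise that array data starts on a cache-line boundary, so this only controls the payload length, not its alignment.

```java
final class CacheLineSizing {
    // ASSUMPTION: 64-byte cache lines, common on current x86-64 CPUs but not universal.
    static final int ASSUMED_CACHE_LINE_BYTES = 64;

    // Smallest element count >= elements whose payload is a whole number of cache lines.
    // bytesPerElement is expected to divide the line size (1, 2, 4 or 8 for primitives).
    static int roundUpToCacheLines(int elements, int bytesPerElement) {
        int perLine = ASSUMED_CACHE_LINE_BYTES / bytesPerElement;
        int lines = (elements + perLine - 1) / perLine;
        return lines * perLine;
    }

    public static void main(String[] args) {
        int wanted = 1000;
        int[] data = new int[roundUpToCacheLines(wanted, Integer.BYTES)];
        System.out.println("asked for " + wanted + ", allocated " + data.length);
    }
}
```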
In terms of RAM space, there is no real difference
If you use an array you have 11 Objects, 10 integers and the array, plus Arrays have other metadata inside. So using an array will take more memory space.
Now for real. This kind of question actually comes up in job interviews and exams, and that shows you what kind of interviewer or teacher you have... with so many layers of abstraction working down there in the VM and in the OS itself, what is the point of thinking about this stuff? Micro-optimizing memory...!
I mean if i create 10 integers and integer array of 10, will there be
any difference in total space occupied.
(integer array of 10) = (10 integers) + 1 integer
The last "+1 integer" is for index of array ( arrays can hold 2,147,483,647 amount of data, which is an integer). That means when you declare an array, say:
int[] nums = new int[10];
you actually reserve 11 int space from memory. 10 for array elements and +1 for array itself.
I am studying the Java 8 source for ArrayList. I see that the maximum array size is defined as Integer.MAX_VALUE - 8, which is 2^31 - 1 - 8 = 2,147,483,639. What I want to understand is why 8 is subtracted, rather than some smaller or larger number.
/**
* The maximum size of array to allocate.
* Some VMs reserve some header words in an array.
* Attempts to allocate larger arrays may result in
* OutOfMemoryError: Requested array size exceeds VM limit
*/
private static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
I found some related answers, but they did not satisfy my curiosity.
Do Java arrays have a maximum size?
How many data a list can hold at the maximum
Why I can't create an array with large size?
Some people have argued that, as the comment says, "Some VMs reserve some header words in an array", so 8 is subtracted to leave room for the header words. But in that case, what happens if the header words need more than 8?
Please clarify this for me. Thanks in advance for your help.
Read the above article about Java Memory management, which clearly states
I think this applies to ArrayList, as it is the resizable-array implementation.
Anatomy of a Java array object
The shape and structure of an array object, such as an array of int
values, is similar to that of a standard Java object. The primary
difference is that the array object has an additional piece of
metadata that denotes the array's size. An array object's metadata,
then, consists of:
Class : A pointer to the class information, which describes the
object type. In the case of an array of int fields, this is a
pointer to the int[] class.
Flags : A collection of flags that describe the state of the object,
including the hash code for the object if it has one, and the shape of
the object (that is, whether or not the object is an array).
Lock : The synchronization information for the object — that is,
whether the object is currently synchronized.
Size : The size of the array.
max size
2^31 = 2,147,483,648
since the array itself needs 8 bytes to store the size
2,147,483,648
so
2^31 -8 (for storing size ),
so maximum array size is defined as Integer.MAX_VALUE - 8
The size of object header can not exceed 8 byte.
For HotSpot:
The object header consists of a mark word and a klass pointer.
The mark word is one machine word (4 bytes on 32-bit architectures, 8 bytes on 64-bit architectures), and
the klass pointer is one machine word on 32-bit architectures. On 64-bit architectures the klass pointer is either word-sized or, if the heap addresses can be encoded in 4 bytes, only 4 bytes.
This optimization is called "compressed oops" and you can also control it with the option UseCompressedOops.
What is in java object header
The value is a worst-case scenario. Note the comment:
Attempts to allocate larger arrays may result in OutOfMemoryError
It doesn't say will, just may. If you stay below this value, you should have no issue (as long as memory is available, of course).
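To show how such a "soft" limit is typically used, here is a sketch of a growth policy that stays at or below MAX_ARRAY_SIZE whenever it can and only goes past it when the caller really needs more. This is an illustration written for this answer, not the JDK's actual code; the class GrowthSketch and the method newCapacity are hypothetical names, and the 1.5x growth factor is an assumption.

```java
final class GrowthSketch {
    // Same constant as in the quoted source comment.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    // Pick a new backing-array length that is at least minCapacity.
    static int newCapacity(int oldCapacity, int minCapacity) {
        if (minCapacity < 0) {
            // The caller's arithmetic overflowed int: no array can be that large.
            throw new OutOfMemoryError("required capacity overflowed int");
        }
        // Grow by about 1.5x, computed in long so it cannot overflow.
        long grown = (long) oldCapacity + (oldCapacity >> 1);
        long wanted = Math.max(grown, minCapacity);
        if (wanted <= MAX_ARRAY_SIZE) {
            return (int) wanted;
        }
        // Beyond the safe limit: only exceed it if the caller truly needs to.
        // Such an allocation MAY then fail, depending on the VM.
        return (minCapacity > MAX_ARRAY_SIZE) ? Integer.MAX_VALUE : MAX_ARRAY_SIZE;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(10, 11));
        System.out.println(newCapacity(2_000_000_000, 2_000_000_001));
    }
}
```

The design choice is that the constant is a conservative ceiling for ordinary growth, while the hard limit of the language remains Integer.MAX_VALUE.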
You may want to look at the answers to this question for more information:
Why I can't create an array with large size?
integer max size is :
2^31 - 1 = 2,147,483,648 - 1
Integer.java:
@Native public static final int MAX_VALUE = 0x7fffffff;
Consider this situation. I have a thousand int variables in my program. We know that an int variable occupies 2 bytes of memory space in Java. So, the total amount of space occupied by my variables would be 2000 bytes. But the problem is that the value in every variable occupies only half of the space it has. So, the space actually required would be 1000 bytes, which means that 1000 bytes go to waste. Is there a way to address this problem in Java?
Your data fits in a byte, so just use byte instead of int (which is 4 bytes).
Checking the space required by values every time they are stored would mean creating an additional useless overhead.
If you know what size your values will be, it is up to you to choose their type properly while developing. If you don't know how much space your value might occupy, you have to choose the largest possible one.
During runtime, if a value is an int for example, this means it can possibly be between -2^31 and (2^31)-1. If you know this value will never be this size, you can use smaller sized primitives such as byte or short.
We could check and allocate the exact amount of memory required for each element in memory but it would take more time. On the contrary, using a fixed amount of memory takes up more space evidently but is faster, this is how Java works. Unfortunately, we can't do without a compromise.
An int in Java is 32 bit (4 bytes) long
The value held in the int does not affect the int's size in memory; e.g. an int holding the value 0x55 still occupies 4 bytes.
So, in order to occupy 2000 bytes in memory, you will need 500 ints, if you create an int array, there will be overhead of the array itself, so it will take up more memory.
int data type is a 32-bit signed two's complement integer. And hence occupies 4 bytes of memory. For int:
Minimum value is - 2,147,483,648.(-2^31)
Maximum value is 2,147,483,647(inclusive).(2^31 -1)
If your values are not going to be so large then you can use either byte or short as per your requirement.
Byte data type is an 8-bit signed two's complement integer.
Minimum value is -128 (-2^7)
Maximum value is 127 (inclusive)(2^7 -1)
Short data type is a 16-bit signed two's complement integer.
Minimum value is -32,768 (-2^15)
Maximum value is 32,767 (inclusive) (2^15 -1)
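The ranges listed above are available as constants on the wrapper classes, so a range check can be written directly against them. The helper below is a hypothetical illustration (NarrowestType and narrowestFor are made-up names) that reports the smallest primitive integer type able to hold a given value.

```java
final class NarrowestType {
    // Name of the smallest primitive integer type whose range contains value.
    static String narrowestFor(long value) {
        if (value >= Byte.MIN_VALUE && value <= Byte.MAX_VALUE) {
            return "byte";   // -128 .. 127
        }
        if (value >= Short.MIN_VALUE && value <= Short.MAX_VALUE) {
            return "short";  // -32,768 .. 32,767
        }
        if (value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE) {
            return "int";    // -2^31 .. 2^31 - 1
        }
        return "long";
    }

    public static void main(String[] args) {
        long[] samples = { 0x55, 127, 128, -32_769, 2_147_483_648L };
        for (long v : samples) {
            System.out.println(v + " fits in " + narrowestFor(v));
        }
    }
}
```

Keep in mind that the saving mainly shows up in arrays and fields; a single local byte variable is not necessarily cheaper than a local int.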
You could also use a linked list if you really are not sure on how many elements are going to be in a large list.
The principal benefit of a linked list over a conventional array is that the list elements can easily be inserted or removed without reallocation or reorganization of the entire structure because the data items need not be stored continuously in memory or on disk, while an array has to be declared in the source code, before compiling and running the program. Linked lists allow insertion and removal of nodes at any point in the list, and can do so with a constant number of operations if the link previous to the link being added or removed is maintained during list traversal.
In general, use a linked list only for int or heavier data types: linked lists incur a per-element overhead of at least one pointer, so if your elements are small you are actually worse off.
See the Oracle docs:
int: By default, the int data type is a 32-bit signed two's complement
integer
So it means that int occupies 4 bytes in memory.
And
byte: The byte data type is an 8-bit signed two's complement integer
So in your case you can change your datatype to byte instead of int.
I was thinking about the following situation: I want to count the occurrence of characters in a string (for example for a permutation check).
One way to do it would be to allocate an array with 256 integers (I assume that the characters are UTF-8), to fill it with zeros and then to go through the string and increment the integers on the array positions corresponding to the int value of the chars.
However, for this approach, you would have to allocate a 256 array each time, even when the analyzed string is very short (and consequently uses only a small part of the array).
An other approach would be to use a Character to Integer HashTable and to store a number for each encountered char. This way, you only would have keys for chars that actually are in the string.
As my understanding of the HashTable is rather theoretic and I do not really know how it is implemented in Java my question is: Which of the two approaches would be more memory efficient?
Edit:
During the discussion of this question (thank you for your answers everyone) I did realize that I had a very fuzzy understanding of the nature of UTF-8. After some searching, I have found this great video that I want to share, in case someone has the same problem.
I wonder why you chose 256 as the length of your array when you assume that your String is UTF-8. In UTF-8 a character can be composed of up to 4 bytes, which means far more characters than just 256.
Anyway: using a HashTable/HashMap incurs a huge memory overhead. First, all your characters and integers need to be wrapped in objects (Integer/Character). An Integer consumes about 3x as much memory as an int. For arrays the difference can be even larger due to the optimizations Java performs on arrays (e.g. the Java stack works only in multiples of 4 bytes, while in an array Java allows smaller types such as a char to consume only 2 bytes).
Then the HashTable itself creates a memory overhead because it needs to maintain an array (which is usually not fully used) and linked lists to maintain all objects which generate the same hash.
Additionally access times will be dramatically faster for arrays. You save multiple method invocations (add, hashCode, iterator,...) and there exist a number of opcode in java byte code to make working with arrays more efficient.
Anyway. Your question was:
Which of the two approaches would be more memory efficient?
And it is safe to say that arrays will be more memory efficient.
However you should make absolutely sure what your requirements are. Do you need more memory efficiency? (Could be true if you process large amounts of data or you are on a slow device (mobile devices?)) How important is readability of code? How about size of code? Reuseability?
And is 256 really the correct size?
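For reference, a minimal sketch of the array-based permutation check the question describes. The table size of 256 is taken from the question and is an assumption that only holds if every char value is below 256; a Java char is a UTF-16 code unit and can go up to 65,535, so the sketch rejects anything larger instead of silently miscounting. The names PermutationCheck and isPermutation are made up for this example.

```java
final class PermutationCheck {
    // ASSUMPTION from the question: every character has a value below 256.
    static final int TABLE_SIZE = 256;

    // True if b contains exactly the same characters as a, in any order.
    static boolean isPermutation(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        int[] counts = new int[TABLE_SIZE]; // zero-initialized by the JVM
        for (int i = 0; i < a.length(); i++) {
            counts[index(a.charAt(i))]++;
        }
        for (int i = 0; i < b.length(); i++) {
            // With equal lengths, a mismatch always drives some count below zero.
            if (--counts[index(b.charAt(i))] < 0) {
                return false;
            }
        }
        return true;
    }

    private static int index(char c) {
        if (c >= TABLE_SIZE) {
            throw new IllegalArgumentException("character outside assumed range: " + (int) c);
        }
        return c;
    }

    public static void main(String[] args) {
        System.out.println(isPermutation("listen", "silent"));
        System.out.println(isPermutation("abc", "abd"));
    }
}
```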
Without looking in the code I know that a HashMap requires, at minimum, a base object, a hashtable array, and individual objects for each hash entry. Generally an int value would have to be stored as an Integer object so that's more objects. Let's assume you have 30 unique characters:
32 bytes for the base object
256 bytes for a minimum-size hashtable array
32 bytes for each of the 30 table entries
16 bytes (if highly optimized) for each of 30 Integers
32 + 256 + 960 + 480 = 1728 bytes. That's for a minimal, non-fancy implementation.
The array of 256 ints would be about 1056 bytes.
I would use the array. From a performance aspect, you have guaranteed constant-time access, which is better than what a hash table can give you.
As it also uses only a constant amount of memory, I see no downside. The HashMap will most likely need more memory, even if you only store a few elements.
By the way, the memory footprint should not be a concern, as you will only need the data structure as long as you need it for counting. Then it will be garbage collected, anyway.
Well here are the facts.
HashMap uses an array for its table behind the scenes.
So if you were actually limited by finding a contiguous space in memory, HashMap's benefit is only that the array may be smaller.
HashMap is generic and therefore uses objects.
Objects take up extra space. As I remember, it's typically 8 or 16 bytes minimum depending on whether it's a 32- or 64-bit system. This means the HashMap may very well not be smaller, even if the number of characters in the String is small. HashMap will require 3 extra objects for each entry: an Entry, a Character and an Integer. HashMap also needs to store the int for the index locally whereas the array does not.
On top of that, there will be some extra computation when using the HashMap.
I would also say space optimization is not something you should worry about here. Either way, the memory footprint is actually very small.
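For comparison, this is roughly what the HashMap variant looks like. It is a sketch with hypothetical names (CharCounts, count). Every key is boxed to a Character and every count to an Integer, which is where the per-entry object overhead discussed above comes from.

```java
import java.util.HashMap;
import java.util.Map;

final class CharCounts {
    // Count occurrences of each char; only characters that occur get an entry.
    static Map<Character, Integer> count(String s) {
        Map<Character, Integer> counts = new HashMap<>();
        for (int i = 0; i < s.length(); i++) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(s.charAt(i), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("banana"));
    }
}
```

The map version handles any char value without a fixed table size, which is its main advantage over the array; the price is the extra objects and hashing work.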
Initialize an array of integers that represent the int value of a char, for example the int value of f is 102 which is its ascii value
http://www.asciitable.com/
char c = 'f';
int x = (int)c;
If you know the range of chars you're dealing with, then it is easier.
For each occurrence of a char, increment the array element at that char's index by one. This approach would be slow if you have to iterate, and complicated if you need to sort, but it won't be memory intensive.
Just be aware that when you sort, you lose the indexes.
I see that the maximum size of an array can be only maximum size of an Int. Why does Java not allow an array of size long-Max ?
long no = 10000000000L;
int [] nums = new int[no];//error here
You'll have to address the "why" question to the Java designers. Anyone else can only speculate. My speculation is that they felt that a two-billion-element array ought to be enough for anybody (which, in fairness, it probably is).
An int-sized length allows arrays of 2^31-1 ("~2 billion") elements. In the gigantically overwhelming majority of arrays' uses, that's plenty.
An array of that many elements will take between 2 gigabytes and 16 gigabytes of memory, depending on the element type. When Java appeared in 1995, new PCs had only around 8 megabytes of RAM. And those 32-bit operating systems, even if they used virtual memory on disk, had a practical limit on the size of a contiguous chunk of memory they could allocate which was quite a bit less than 2 gigabytes, because other allocated things are scattered around in a process's address space. Thus the limits of an int-sized array length were untestable, unforeseeable, and just very far away.
On 32-bit CPUs, arithmetic with ints is much faster than with longs.
Arrays are a basic internal type and they are used very widely. A long-sized length would take an extra 4 bytes per array to store, which in turn could affect packing together of arrays in memory, potentially wasting more bytes between them. (Even though the longer length would almost never be useful.)
If you ever do need in-RAM storage for more than ~2 billion items, you can use an array of arrays.
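A minimal sketch of the array-of-arrays idea: a long-indexed container that splits its storage into fixed-size chunks. The class BigLongArray and its chunkSize parameter are hypothetical, chosen for this illustration. In real use the chunk size would be large (for example 1 << 27); the small value in main just keeps the demo cheap.

```java
final class BigLongArray {
    private final int chunkSize;
    private final long[][] chunks;
    private final long length;

    BigLongArray(long length, int chunkSize) {
        if (length < 0 || chunkSize <= 0) {
            throw new IllegalArgumentException("length must be >= 0 and chunkSize > 0");
        }
        this.length = length;
        this.chunkSize = chunkSize;
        // Number of chunks, rounding up; toIntExact fails loudly if that exceeds int.
        int chunkCount = Math.toIntExact((length + chunkSize - 1) / chunkSize);
        chunks = new long[chunkCount][];
        for (int i = 0; i < chunkCount; i++) {
            long remaining = length - (long) i * chunkSize;
            chunks[i] = new long[(int) Math.min(chunkSize, remaining)];
        }
    }

    long length() {
        return length;
    }

    long get(long index) {
        checkIndex(index);
        return chunks[(int) (index / chunkSize)][(int) (index % chunkSize)];
    }

    void set(long index, long value) {
        checkIndex(index);
        chunks[(int) (index / chunkSize)][(int) (index % chunkSize)] = value;
    }

    private void checkIndex(long index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException("index " + index + ", length " + length);
        }
    }

    public static void main(String[] args) {
        BigLongArray a = new BigLongArray(10, 4); // chunks of 4, 4 and 2 elements
        a.set(9, 42L);
        System.out.println(a.get(9) + " of " + a.length());
    }
}
```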
Unfortunately Java does not support arrays with more than 2^31 elements.
i.e. 16 GiB of space for a long[] array.
try creating this...
Object[] array = new Object[Integer.MAX_VALUE - 4];
You should get an OutOfMemoryError, so the maximum size will be Integer.MAX_VALUE - 5.
I try to create an array of Integers (i tried with own object but the same happened with int) , with size of 30 million. i keep getting "OutOfMemoryError: Java heap space"
Integer [] index = new Integer[30000000];
for (int i = 0 ; i < 30000000 ; i++){
index[i] = i;
}
i checked the total heap space, using "Runtime.getRuntime().totalMemory()" and "maxMemory()"
and saw that i start with 64 MB and the max is 900+ MB, and during the run i get to 900+ on the heap and crash.
now i know that Integer takes 4 bytes, so even if i multiply 30*4*1000000 i should still only get about 150-100 mega.
if i try with a primitive type, like int, it works.
how could i fix it ?
Java's int primitive will take up 4 bytes but if you use a ValueObject like Integer it's going to take up much more space. Depending on your machine a reference alone could take up 32 or 64 bits + the size of the primitive it is wrapping.
You should probably just use primitive ints if space is an issue. Here is a very good SO answer that explains this topic in more detail.
Lets assume that we are talking about a 32bit OpenJDK-based JVM.
Each Integer object has 1 int field - occupying 4 bytes.
Each Integer object has 2 header words - occupying 8 bytes.
The granularity of allocation is (I believe) 2 words - 4 bytes of padding.
The Integer[] has 1 reference for each array element / position - 4 bytes.
So the total is 20 bytes per array element. 20 x 30 x 1,000,000 = 600,000,000 bytes, or about 600 Mbytes. Now add the fact that the generational collector will allocate at least 3 object spaces of various sizes, and that could easily add up to 900+ Mbytes.
how could i fix it ?
Use int[] instead of Integer[].
If the Integer values mostly represent numbers in the range -128 to + 127, allocate them with Integer.valueOf(int). The JLS guarantees that Integer objects created that way will be shared. (Note that when an Integer is created by auto-boxing, then JLS stipulates that valueOf is used. So, in fact, this "fix" has already been applied in your example.)
If your Integer values mostly come from a larger but still small domain, consider implementing your own cache for sharing Integer objects.
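A sketch of such a cache, assuming the values mostly fall in a known range [low, high]. The class SmallIntCache is a hypothetical illustration, not a library class. Values inside the range share one preallocated Integer each; anything outside falls back to Integer.valueOf.

```java
final class SmallIntCache {
    private final int low;
    private final Integer[] cache;

    // Preallocate one shared Integer for every value in [low, high].
    SmallIntCache(int low, int high) {
        if (high < low) {
            throw new IllegalArgumentException("high must be >= low");
        }
        this.low = low;
        this.cache = new Integer[Math.toIntExact((long) high - low + 1)];
        for (int i = 0; i < cache.length; i++) {
            cache[i] = low + i; // boxed once, reused afterwards
        }
    }

    // Shared instance for in-range values, Integer.valueOf for the rest.
    Integer get(int value) {
        long idx = (long) value - low; // long arithmetic avoids int overflow
        return (idx >= 0 && idx < cache.length) ? cache[(int) idx] : Integer.valueOf(value);
    }

    public static void main(String[] args) {
        SmallIntCache cache = new SmallIntCache(0, 10_000);
        Integer[] index = new Integer[30];
        for (int i = 0; i < index.length; i++) {
            index[i] = cache.get(5_000 + i % 3); // only 3 distinct objects are referenced
        }
        System.out.println(index[0] == index[3]); // same shared instance
    }
}
```

This only saves memory when many array slots hold the same values; for 30 million distinct values an int[] remains the better fix.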
My question was about Integer as an example, in my program i use my own object that only holds an array of bytes (max size of 4). when i create it, it takes a lot more then 4 bytes on the memory.
Yes, it will do.
Let's assume your class is defined like this:
public class MyInt {
private byte[] bytes = new byte[4];
}
Each MyInt will occupy:
MyInt header words - 8 bytes
MyInt.bytes field - 4 bytes
Padding - 4 bytes
Header words for the byte array - 12 bytes
Array content - 4 bytes
Now add the space taken by the MyInt reference:
Reference to each MyInt - 4 bytes
Grand total - 36 bytes per MyInt element of a MyInt[].
Compare that with 20 bytes per Integer element of an Integer[] or 4 bytes per int element of an int[].
How to fix that case?
Well an array of 4 bytes contains 32 bits of data. That can be encoded as int. So the fix is the same as before. Use an int[] instead of a MyInt[], or (possibly) adapt one of the other ideas discussed above.
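The encoding step can be sketched as follows. BytePacking, pack and unpack are hypothetical helper names, and the big-endian order (first byte in the highest bits) is an arbitrary choice made for this example.

```java
final class BytePacking {
    // Pack four bytes into one int, first byte in the most significant position.
    // The & 0xFF masks undo sign extension when a byte is widened to int.
    static int pack(byte b0, byte b1, byte b2, byte b3) {
        return ((b0 & 0xFF) << 24) | ((b1 & 0xFF) << 16) | ((b2 & 0xFF) << 8) | (b3 & 0xFF);
    }

    // Extract byte number position (0 = most significant, 3 = least significant).
    static byte unpack(int packed, int position) {
        if (position < 0 || position > 3) {
            throw new IllegalArgumentException("position must be 0..3");
        }
        return (byte) (packed >>> (24 - 8 * position));
    }

    public static void main(String[] args) {
        int[] store = new int[3]; // stands in for a MyInt[] of the same length
        store[0] = pack((byte) 0x12, (byte) 0x34, (byte) 0xAB, (byte) 0xCD);
        System.out.println(Integer.toHexString(store[0]));
    }
}
```

With this layout each element costs 4 bytes in an int[] instead of the roughly 36 bytes estimated above for a MyInt in a MyInt[].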
Alternatively, make the heap larger, or use a database or something like that so that the data doesn't need to be held in RAM.
Integer is an object which will take more than 4 bytes. How much more is implementation dependent. Do you really need Integer? The only benefit is that it can be null. Perhaps you could use a "sentinel value" instead; say, -1, or Integer.MIN_VALUE.
Perhaps you should be using a database rather than a huge array, but if you must use a huge array of objects, have you tried increasing the maximum Java heap size with the -Xmx command line argument when running the Java application launcher? (-Xms only sets the initial heap size.)
This is not what you are looking for but the optimal solution is to use a function instead of an array in this simple example.
static int index(int num) {
return num;
}
If you have a more realistic example, there may be other optimisations you can use.