I have to store millions of X/Y double pairs for reference in my Java program. I'd like to keep memory consumption, and the number of object references, as low as possible. After some thinking, I decided that holding the two coordinates in a tiny double array might be a good idea. Its setup looks like this:
double[] node = new double[2];
node[0] = x;
node[1] = y;
I figured using the array would avoid the link between an object and its X and Y fields that I assumed a class like the following would have:
class Node {
public double x, y;
}
However, after reading about how public fields in classes are stored, it dawned on me that fields may not actually be pointer-like structures. Perhaps the JVM simply stores these values in contiguous memory and knows how to find them without an address, which would make the class representation of my point smaller than the array.
So the question is, which has a smaller memory footprint? And why?
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
The latter has the smaller footprint.
Primitive types are stored inline in the containing object. So your Node requires one object header and two 64-bit slots. The array you specify uses one array header (>= an object header) plus two 64-bit slots.
If you're going to allocate 100 variables this way, then it doesn't matter so much, as it is just the header sizes which are different.
Caveat: all of this is somewhat speculative as you did not specify the JVM - some of these details may vary by JVM.
I don't think your biggest problem is going to be storing the data, I think it's going to be retrieving, indexing, and manipulating it.
However, an array, fundamentally, is the way to go. If you want to save on pointers, use a one dimensional array. (Someone has already said that).
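As a rough sketch of the one-dimensional approach, all pairs can share a single double[], with pair i stored at positions 2*i and 2*i + 1. The names PointStore, set, x, y and capacity are made up for illustration, and the capacity is fixed at construction here.

```java
// Hypothetical helper: stores (x, y) pairs in one flat double[].
// There is one array header in total instead of one header per pair.
final class PointStore {
    private final double[] coords; // layout: x0, y0, x1, y1, ...

    PointStore(int capacity) {
        coords = new double[2 * capacity];
    }

    void set(int i, double x, double y) {
        coords[2 * i] = x;
        coords[2 * i + 1] = y;
    }

    double x(int i) { return coords[2 * i]; }
    double y(int i) { return coords[2 * i + 1]; }

    int capacity() { return coords.length / 2; }
}

class PointStoreDemo {
    public static void main(String[] args) {
        PointStore points = new PointStore(1_000_000);
        points.set(42, 1.5, -2.5);
        System.out.println(points.x(42) + ", " + points.y(42)); // prints "1.5, -2.5"
    }
}
```

The price is the index arithmetic at every access, and the number of pairs is limited to about half the maximum array length.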
First, it must be stated that the actual space usage depends on the JVM you are using. It is strictly implementation specific. The following is for a typical mainstream JVM.
So the question is, which has a smaller memory footprint? And why?
The 2nd version is smaller. An array has the overhead of the 32 bit field in the object header that holds the array's length. In the case of a non-array object, the size is implicit in the class and does not need to be represented separately.
But note that this is a fixed overhead per array object. The larger the array is, the less important the overhead is in practical terms. The flip side of using a class rather than an array is that indexing won't work, and your code may be more complicated (and slower) as a result.
A Java 2D array is actually an array of 1D arrays (and so on), so you can apply the same analysis to arrays with higher dimensionality. The larger the size an array has in any dimension, the less impact the overhead has. The overhead in a 2x10 array will be less than in a 10x2 array. (Think it through ... 1 array of length 2 + 2 of length 10 versus 1 array of length 10 + 10 of length 2. The overhead is proportional to the number of arrays.)
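To make the 2x10 versus 10x2 comparison concrete, here is a small sketch that counts the array objects behind a 2D array. The class and method names are made up for illustration.

```java
class ArrayCount {
    // Counts the array objects that make up a 2D array:
    // one outer array plus one inner array per non-null row.
    static int arrayObjects(double[][] grid) {
        int count = 1; // the outer array holding the row references
        for (double[] row : grid) {
            if (row != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Both shapes hold 20 doubles, but the number of headers differs.
        System.out.println(arrayObjects(new double[2][10])); // prints 3
        System.out.println(arrayObjects(new double[10][2])); // prints 11
    }
}
```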
I'm particularly interested in whether or not class fields use a pointer, and thus have a 32-bit overhead, or not.
(You are actually talking about instance fields, not class fields. These fields are not static ...)
Fields whose type is a primitive type are stored directly in the heap node of the object without any references. There is no pointer overhead in this case.
However, if the field types were wrapper types (e.g. Double rather than double) then there could be the overhead of a reference AND the overheads of the object header for the Double object.
Sorry if this is a really stupid question, but hearing as "Java arrays are literally just Objects" it makes no sense to me that they need to have a pre-defined length?
I understand why primitive types do, for example int myInt = 15; allocates 32 bits of memory to store an integer and that makes sense to me. But if I had the following code:
class Integer{
int myValue;
public Integer(int myValue){
this.myValue = myValue;
}
}
Followed by Integer myInteger = new Integer(15); myInteger.myValue = 5; then there's no limit on the amount of data I can store in myInteger. It's not limited to 32 bits, but rather it's a pointer to an Object which can store any amount of ints, doubles, Strings, or really anything. It allocated 32 bits of memory to store the pointer, but the object itself can store any amount of data, and it doesn't need to be specified beforehand.
So why can't an array do that? Why do I need to tell an array how much memory to allocate beforehand? If an array is "literally just an object" then why can't I simply say String[] myStrings = new String[];myStrings[0] = "Something";?
I'm super new to Java so there's a 100% chance that this is a stupid question and that there's a very simple and clear answer, but I am curious.
Also, to give another example, I can say ArrayList<String> myStrings = new ArrayList<String>();myStrings.add("Something"); without any problem... So what makes an ArrayList different from an array? Why does an array NEED to be told how much memory to allocate when an ArrayList doesn't?
Thanks in advance to anybody who takes the time to fill me in. :)
EDIT: Okay, so far everybody in the comments has misunderstood my post and I feel like it's my fault for wording it wrong.
My question is not "how do I define an array?", or "does changing the value of a variable change its memory usage?", or "do pointers store the data of the object they point to?", or "are arrays objects?", nor is it "how do ArrayLists work?"
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)
I hope this makes more sense now... I'm not sure why everybody misunderstood? (Did I word something wrong? If so, let me know and I'll change it for others' convenience)
My question is why does a pointer to an array need to know how big the array is beforehand, when a pointer to any other object doesn't?
It doesn't. Here, this runs perfectly fine:
String[] x = new String[10];
x = new String[15];
The whole 'needs to know in advance how large it is' refers only to the ARRAY OBJECT. As in, new int[10] goes to the heap, which is like a giant beach, and creates a new treasure chest out of thin air, big enough to hold exactly 10 ints (Which, being primitives, are like coins in this example). It then buries it in the sand, lost forever. Hence why new int[10]; all by its lonesome is quite useless.
When you write int[] arr = new int[10];, you still do that, but you now also make a treasure map. X marks the spot. 'arr' is this map. It is NOT AN INT ARRAY. It is a map to an int array. In java, both [] and . are 'follow the map, dig down, and open er up'.
arr[5] = 10; means: Follow your arr map, dig down, open up the chest you find there, and you'll see it has room for precisely 10 little pouches, each pouch large enough to hold one coin. Take the 6th pouch. Remove whatever was there, put a 10ct coin in.
It's not the map that needs to know how large the chest is that the map leads to. It's the chest itself. And this is true for objects as well, it is not possible in java to make a treasure chest that can arbitrarily resize itself.
So how does ArrayList work?
Maps-in-boxes.
ArrayList has, internally, a field of type Object[]. That field doesn't hold an object array. It can't. It holds a map to an object array: It's a reference.
So, what happens when you make a new arraylist? It is a treasure chest, fixed size, with room for exactly 2 things:
A map to an 'object array' treasure chest (which it will also make, with room for 10 maps; it buries that chest in the sand and stores the map to this chest-of-maps inside itself).
A coinpouch. The coin inside represents how many objects the list actually contains. The map to the treasure it has leads to a treasure with room for 10 maps, but this coin (value: 0) says that so far, none of those maps go anywhere.
If you then run list.add("foo"), what that does is complicated:
"foo" is an object (i.e. treasure), so "foo" as an expression resolves to be a map to "foo". It then takes your list treasuremap, follows it, digs down, opens the box, and you yell 'oi! ADD THIS!', handing it a copy of your treasuremap to the "foo" treasure. What the box then does with this is opaque to you - that's the point of OO.
But let's dig into the sources of arraylist: What it will do, is query its treasuremap to the object array (which is private, you can't get to it, it's in a hidden compartment that only the djinn that lives in the treasure chest can open), follows it, digs down, and goes to the first slot (why? Because the 'size coin' in the coinpouch is currently at 0). It takes the map-to-nowhere that is there, tosses it out, makes a copy of your map to the "foo" treasure, and puts the copy in there. It then replaces its coin in the coin pouch with a penny, to indicate it is now size 1.
If you add an 11th element, the ArrayList djinn goes out to the other treasure, notices there is no room, and goes: Well, dang. Okay. It then conjures up an entirely new treasure chest that can hold 15 treasure maps, copies the 10 maps from the old treasure into the new treasure chest, adds the copy of the map of the thing you added as the 11th, then goes back to its own chest, rips out the map to the old treasure and replaces it with a map to the newly made treasure (with 15 slots), and puts an 11ct coin in the pouch.
The old treasure chest remains exactly where it is. If nobody has any maps to this (and nobody does), eventually, the beachcomber finds it, gets rid of it (that'd be the garbage collector).
Thus, ALL treasure chests are fixed size, but by replacing maps with new maps and conjuring up new treasure chests, you can nevertheless make it look like ArrayList is capable of shrinking and growing.
So why don't arrays allow it? Because that shrinking and growing stuff is complicated and arrays expose low-level functionality. Don't use arrays, use Lists.
You seem to misunderstand what "storage" means. You say "there's no limit on the amount of data I can store", but if you run myInteger.myValue = 5, you overwrite the value of 15 that you put there originally. You still can't store any more than 32 bits; it's simply that you can change which 32 bits you put in that variable.
If you want to see how ArrayList works, you can read the source code; it can expand because if it runs out of space it creates a new larger array and switches its single array variable elementData to it.
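Here is a stripped-down sketch of that mechanism. It is not the JDK source: GrowableList is a made-up class, and the starting capacity of 10 and growth factor of 1.5 are assumptions chosen for the example.

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for ArrayList; not the JDK implementation.
class GrowableList {
    private Object[] elementData = new Object[10]; // reference to a fixed-size array
    private int size = 0;                          // logical number of elements

    void add(Object item) {
        if (size == elementData.length) {
            // No room left: allocate a larger array (here 1.5x), copy the old
            // references across, and point the field at the new array.
            int newCapacity = elementData.length + (elementData.length >> 1);
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = item;
    }

    Object get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return elementData[index];
    }

    int size() { return size; }

    int capacity() { return elementData.length; }
}
```

No object ever changes size here. The list object always holds one reference and one int, and the old backing array is simply left for the garbage collector.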
Based on your update, it seems like you may be wondering about the ability to add lots of different fields to your object definition. In this case, those fields and their types are fixed when the class is compiled, and from that point on the class has a fixed size. You can't just pile in extra properties at runtime like you can in JavaScript. You are telling it up front about the scale it needs.
I'm going to ignore most of the details you've given, and answer the question in your edit.
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront?
It's worth starting by dealing with "when I make any other object it scales on its own", because this isn't true. If you create a class like this:
class MyInteger {
public int value;
public MyInteger(int value) {
this.value = value;
}
}
Then that class has a statically defined size. Once you've compiled this class, the amount of memory for an instance of MyInteger is already determined. In this case, it's the object header size (JVM dependent), and the size of an integer (at least 4 bytes).
Once an object has been allocated by the JVM, its size cannot change. It is treated as a block of bytes by the JVM (and importantly, the garbage collector) until it is reclaimed. Classes like ArrayList give the illusion of growing, but they actually work by allocating other objects, which they store references to.
class MyArrayList {
public int[] values;
public MyArrayList(int[] values) {
this.values = values;
}
}
In this case, the MyArrayList instance will always take the same amount of memory (object header size + reference size), but the array that is referenced may change. We could do something like this:
MyArrayList list = new MyArrayList(new int[50]);
This allocates a block of memory for list, and a block of memory for list.values. If we then do (as ArrayList effectively does internally):
list.values = new int[500];
then the memory allocated for list is still the same size, but we have allocated a new block which we then reference in list.values. This leaves our old int[50] with no references (so it can be garbage collected). Importantly, though, no allocation has changed size. We have reallocated a new, bigger, block for our list to use, and have referenced it from our MyArrayList instance.
Why do arrays in Java need to have a pre-defined length when Objects don't?
In order to understand this, we need to establish that "size" is a complicated concept in Java. There are a variety of meanings:
Each object is stored in the heap as one or more heap nodes, where one of these is the primary node, and the rest are component objects that can be reached from the primary node.
The primary heap node is represented by a fixed and unchanging number of bytes of heap memory. I will call this the native size of the object [1].
An array has an explicit length field. This field is not declared. It has type int and cannot be assigned to. There is actually a 32 bit field in the header of each array instance that holds the length.
The length of an array directly maps to its native size. The JVM can compute the native size from the length.
An object that is not an array instance also has a native size. This is determined by the number and types of the object's fields. Since fields cannot be added or removed at runtime, the native size does not change. But it doesn't need to be stored since it can be determined (when needed) at runtime from the object's class.
Some objects support a class specific size concept. For example, a String has a size returned by its length() method, and an ArrayList has a size returned by its size() method.
NB:
The meaning of the class specific size is ... class specific.
The class specific size does not correlate to the native size of an instance. (Except in degenerate cases ...)
In fact, all objects have a fixed native size.
1 - This term is solely for the purposes of this answer. I claim no authority for this term ...
Examples:
A String[] has a native size that depends on its length. On a typical JVM it will be 12 + length * (<reference size>) rounded up to a multiple of 16 bytes.
Your Integer class has a fixed native size. On a typical JVM each instance will be 16 bytes long.
An ArrayList object has 2 private int fields and a private Object[] field. That gives it a fixed native size of either 16 or 24 bytes. One of the int fields is called size, and it contains the value returned by size().
The size of an ArrayList may change, but this is implemented by the code of the class. In order to do this, it may need to reallocate its internal Object[] to make it large enough to hold more elements. If you examine the source code for the ArrayList class, you can see how this happens. (Look for the ensureCapacity and grow methods.)
So, the differences between the size(s) of regular object and the length of an array are:
The native size of a regular object is determined solely by the type of the object, and it never changes. It is rarely relevant to the application and it is not exposed via a field.
The length of an array depends on the value supplied when you instantiate it. It never changes. The native size can be determined from the length.
The class specific size of an object (if relevant) is managed by the class.
To your revised question:
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)
The point is that at the JVM level, NOTHING scales automatically. The native size of a Java object CANNOT change.
Why? Because increasing the size of the object's heap node would entail moving the heap node, and a heap node cannot be moved without updating all references for the object. That cannot be done efficiently.
(It has been pointed out that the GC can efficiently move heap nodes. However, that is not a viable solution. Running the GC is expensive. It would be highly inefficient to perform a GC in order to (say) grow a single Java array. If Java had been specified so that arrays could "grow", it would need to be implemented using an underlying non-growable array type.)
The ArrayList case is being handled by the ArrayList class itself, and it does it by (if necessary) creating a new, larger backing array, copying the elements from the old to the new, and then discarding the old backing array. It also adjusts the size field that holds the logical size of the list.
Object arrays allocate space for object pointers, and not entire objects in memory.
So new String[10] doesn't allocate space for 10 strings, but for 10 object references that will point to whatever strings are stored in the array.
An array data structure requires all its members to have the same type. Java throws ArrayStoreException when an attempt is made to store the wrong type of object into an array of objects. Don't remember what C++ does.
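A short sketch of when that exception shows up. The class and method names are made up for illustration; the key point is that the array remembers its element type (String here) even when it is viewed through an Object[] reference.

```java
class ArrayStoreDemo {
    // Returns true if storing `value` into `array` is rejected at run time.
    static boolean storeIsRejected(Object[] array, Object value) {
        try {
            array[0] = value;
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object[] strings = new String[1]; // legal: String[] is a subtype of Object[]
        System.out.println(storeIsRejected(strings, "ok"));               // prints false
        System.out.println(storeIsRejected(strings, Integer.valueOf(1))); // prints true
    }
}
```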
Is my understanding correct that it's important to have all objects of the same type in array because it guarantees constant time element access through the following two operations:
1) element size * element index = offset
2) array pointer address + offset
If objects are of different types and consequently different size, the above mentioned formula won't work.
Because: we want it like this.
What I mean is: people using the Java language (probably the same for C++) are using a statically typed language for a purpose.
And when such people start thinking in plurals, they typically think in plurals of "similar" things.
Caveat: in Java, every reference type is an Object, so you can always declare an Object[] and stuff anything into that. Strings, Numbers, whatever.
And that also leads to the other important aspect: in C++, your array represents an area in memory. And you better have same sized elements in that area; to avoid data corruption.
In Java on the other hand, an array is not pointing to raw memory.
Long story short: there are real differences between Java and C++ in this context (that one has to understand to make an informed decision); and then there is the "language" thing itself. In other words: this is not Ruby land, where you just put ducks, numbers, plants and quack sounds in the same "list" without further thinking.
Final thought, based on that joke in the last paragraph: in my eyes, an array is an implementation of the list concepts, thus it is about a collection of things of the same nature. If you want a collection of unrelated things, I would rather call that a tuple.
Yes, you are right. All that is required for constant time random access.
Also, you can have an array of void pointers if you want to store different data types in a single array. For instance in c++, do
void *a[N];
a[i] = (void *)(&yourObject); // address of an instance, not of the class itself
Similarly, use Object[] in java.
The C++ language (and compiler) requires the type of the elements stored in an array to be known for various reasons, like pointer arithmetic and array subscripting (e.g. x[i]), default initialisation of the elements, dealing with alignment restrictions, and so on.
int x[3] = { 1,2,3 }; // array of 3 int values, each being properly aligned concerning processor architecture;
myObjectType objs[10]; // array of 10 objects of type myObjectType, each being default initialised (probably the default constructor), each being properly aligned
myObjectType *objs[10]; // array of 10 pointers to objects of type myObjectType (including subclasses of myObjectType; allowing dynamic binding and polymorphism). Note: all pointers have the same size, the objects to which they point may differ in size.
int *intptr = x;
bool isEqual= (intptr[2] == x[2]); // gives true
intptr += 2; // increases the pointer by "2*sizeof(int)" bytes.
So, yes, you are right: one reason is because of calculating offsets; But there are other reasons as well, and other issues like alignment, array to pointer decay, default initialisation logic are probably more subtle but essential, too.
Since day one of learning Java I've been told by various websites and many teachers that arrays are consecutive memory locations which can store the specified number of data all of the same type.
Since an array is an object and object references are stored on the stack, and actual objects live in the heap, object references point to actual objects.
But when I came across examples of how arrays are created in memory, they always show something like this:
(In which a reference to an array object is stored on the stack and that reference points to the actual object in the heap, where there are also explicit indexes pointing to specific memory locations)
But recently I came across online notes of Java in which they stated that arrays' explicit indexes are not specified in the memory. The compiler just knows where to go by looking at the provided array index number during runtime.
Just like this:
After reading the notes, I also searched on Google regarding this matter, but the contents on this issue were either quite ambiguous or non-existent.
I need more clarification on this matter. Are array object indexes explicitly shown in memory or not? If not, then how does Java manage the commands to go to a particular location in an array during runtime?
Does an array object explicitly contain the indexes?
Short answer: No.
Longer answer: Typically not, but it theoretically could do.
Full answer:
Neither the Java Language Specification nor the Java Virtual Machine Specification makes any guarantees about how arrays are implemented internally. All it requires is that array elements are accessed by an int index number having a value from 0 to length-1. How an implementation actually fetches or stores the values of those indexed elements is a detail private to the implementation.
A perfectly conformant JVM could use a hash table to implement arrays. In that case, the elements would be non-consecutive, scattered around memory, and it would need to record the indexes of elements, to know what they are. Or it could send messages to a man on the moon who writes the array values down on labeled pieces of paper and stores them in lots of little filing cabinets. I can't see why a JVM would want to do these things, but it could.
What will happen in practice? A typical JVM will allocate the storage for array elements as a flat, contiguous chunk of memory. Locating a particular element is trivial: multiply the fixed memory size of each element by the index of the wanted element and add that to the memory address of the start of the array: (index * elementSize) + startOfArray. This means that the array storage consists of nothing but raw element values, consecutively, ordered by index. There is no purpose to also storing the index value with each element, because the element's address in memory implies its index, and vice-versa. However, I don't think the diagram you show was trying to say that it explicitly stored the indexes. The diagram is simply labeling the elements on the diagram so you know what they are.
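To illustrate the formula, here is a sketch that emulates an int array on top of a plain block of bytes. FlatIntArray is a made-up class; the ByteBuffer stands in for raw memory, with the start of the array at offset 0. A real JVM does the equivalent in native code.

```java
import java.nio.ByteBuffer;

// Illustration only: an "int array" built on raw bytes to show the offset formula.
class FlatIntArray {
    private static final int ELEMENT_SIZE = Integer.BYTES; // 4 bytes per int
    private final ByteBuffer memory;
    private final int length;

    FlatIntArray(int length) {
        this.length = length;
        this.memory = ByteBuffer.allocate(length * ELEMENT_SIZE);
    }

    // Byte offset of an element: index * elementSize (the array starts at offset 0).
    static int offsetOf(int index) {
        return index * ELEMENT_SIZE;
    }

    int get(int index) {
        checkIndex(index);
        return memory.getInt(offsetOf(index));
    }

    void set(int index, int value) {
        checkIndex(index);
        memory.putInt(offsetOf(index), value);
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```

Nothing in the byte block records which element is which. The index exists only as an input to offsetOf.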
The technique of using contiguous storage and calculating the address of an element by formula is simple and extremely quick. It also has very little memory overhead, assuming programs allocate their arrays only as big as they really need. Programs depend on and expect the particular performance characteristics of arrays, so a JVM that did something weird with array storage would probably perform poorly and be unpopular. So practical JVMs will be constrained to implement contiguous storage, or something that performs similarly.
I can think of only a couple of variations on that scheme that would ever be useful:
Stack-allocated or register-allocated arrays: During optimization, a JVM might determine through escape analysis that an array is only used within one method, and if the array is also a smallish fixed size, it would then be an ideal candidate object for being allocated directly on the stack, calculating the address of elements relative to the stack pointer. If the array is extremely small (fixed size of maybe up to 4 elements), a JVM could go even further and store the elements directly in CPU registers, with all element accesses unrolled & hardcoded.
Packed boolean arrays: The smallest directly addressable unit of memory on a computer is typically an 8-bit byte. That means if a JVM uses a byte for each boolean element, then boolean arrays waste 7 out of every 8 bits. It would use only 1 bit per element if booleans were packed together in memory. This packing isn't done typically because extracting individual bits of bytes is slower, and it needs special consideration to be safe with multithreading. However, packed boolean arrays might make perfect sense in some memory-constrained embedded devices.
Still, neither of those variations requires every element to store its own index.
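For the packed boolean case, a minimal sketch of the idea at the Java level could look like the following. PackedBooleans is a made-up class (the JDK's java.util.BitSet offers a similar facility), and a JVM doing this internally would work differently. Even here the position of each bit is computed from the index rather than stored.

```java
// Hypothetical packed boolean array: 1 bit per element, 64 elements per long.
// Not thread-safe: concurrent writes to bits in the same long can be lost.
class PackedBooleans {
    private final long[] words;
    private final int length;

    PackedBooleans(int length) {
        this.length = length;
        this.words = new long[(length + 63) / 64]; // round up to whole longs
    }

    boolean get(int index) {
        checkIndex(index);
        // index >>> 6 selects the long, index & 63 selects the bit inside it.
        return (words[index >>> 6] & (1L << (index & 63))) != 0;
    }

    void set(int index, boolean value) {
        checkIndex(index);
        long mask = 1L << (index & 63);
        if (value) {
            words[index >>> 6] |= mask;
        } else {
            words[index >>> 6] &= ~mask;
        }
    }

    int length() { return length; }

    private void checkIndex(int index) {
        if (index < 0 || index >= length) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
    }
}
```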
I want to address a few other details you mentioned:
arrays store the specified number of data all of the same type
Correct.
The fact that all an array's elements are the same type is important because it means all the elements are the same size in memory. That's what allows for elements to be located by simply multiplying by their common size.
This is still technically true if the array element type is a reference type. Although in that case, the value of each element is not the object itself (which could be of varying size) but only an address which refers to an object. Also, in that case, the actual runtime type of objects referred to by each element of the array could be any subclass of the element type. E.g.,
Object[] a = new Object[4]; // array whose element type is Object
// element 0 is a reference to a String (which is a subclass of Object)
a[0] = "foo";
// element 1 is a reference to a Double (which is a subclass of Object)
a[1] = 123.45;
// element 2 is the value null (no object! although null is still assignable to Object type)
a[2] = null;
// element 3 is a reference to another array (all arrays classes are subclasses of Object)
a[3] = new int[] { 2, 3, 5, 7, 11 };
arrays are consecutive memory locations
As discussed above, this doesn't have to be true, although it is almost surely true in practice.
To go further, note that although the JVM might allocate a contiguous chunk of memory from the operating system, that doesn't mean it ends up being contiguous in physical RAM. The OS can give programs a virtual address space that behaves as if contiguous, but with individual pages of memory scattered in various places, including physical RAM, swap files on disk, or regenerated as needed if their contents are known to be currently blank. Even to the extent that pages of the virtual memory space are resident in physical RAM, they could be arranged in physical RAM in an arbitrary order, with complex page tables that define the mapping from virtual to physical addresses. And even if the OS thinks it is dealing with "physical RAM", it still could be running in an emulator. There can be layers upon layers upon layers, is my point, and getting to the bottom of them all to find out what's really going on takes a while!
Part of the purpose of programming language specifications is to separate the apparent behavior from the implementation details. When programming you can often program to the specification alone, free from worrying about how it happens internally. The implementation details become relevant, however, when you need to deal with the real-world constraints of limited speed and memory.
Since an array is an object and object references are stored on the stack, and actual objects live in the heap, object references point to actual objects
This is correct, except what you said about the stack. Object references can be stored on the stack (as local variables), but they can also be stored as static fields or instance fields, or as array elements as seen in the example above.
Also, as I mentioned earlier, clever implementations can sometimes allocate objects directly on the stack or in CPU registers as an optimization, although this has zero effect on your program's apparent behavior, only its performance.
The compiler just knows where to go by looking at the provided array index number during runtime.
In Java, it's not the compiler that does this, but the virtual machine. Arrays are a feature of the JVM itself, so the compiler can translate your source code that uses arrays simply to bytecode that uses arrays. Then it's the JVM's job to decide how to implement arrays, and the compiler neither knows nor cares how they work.
In Java, arrays are objects. See the JLS - Chapter 10. Arrays:
In the Java programming language, arrays are objects (§4.3.1), are dynamically created, and may be assigned to variables of type Object (§4.3.2). All methods of class Object may be invoked on an array.
If you look at 10.7. Array Members chapter, you'll see that the index is not part of the array member:
The members of an array type are all of the following:
The public final field length, which contains the number of components
of the array. length may be positive or zero.
The public method clone, which overrides the method of the same name
in class Object and throws no checked exceptions. The return type of
the clone method of an array type T[] is T[].
All the members inherited from class Object; the only method of Object
that is not inherited is its clone method.
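A quick sketch exercising the members listed above. The class name and the copyOf helper are made up for illustration.

```java
class ArrayMembersDemo {
    // clone() on an int[] is declared to return int[], so no cast is needed.
    static int[] copyOf(int[] source) {
        return source.clone();
    }

    public static void main(String[] args) {
        int[] primes = { 2, 3, 5, 7, 11 };

        Object asObject = primes;                          // arrays are assignable to Object
        System.out.println(asObject.getClass().isArray()); // prints true
        System.out.println(primes.length);                 // prints 5

        int[] copy = copyOf(primes);
        copy[0] = 99;
        System.out.println(primes[0]);                     // prints 2 (the copy is independent)
    }
}
```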
Since the size of each type is known, you can easily determine the location of each component of the array, given the first one.
The complexity of accessing an element is O(1) since it only needs to calculate the address offset. It's worth mentioning that this behavior is not assumed for all programming languages.
The array, as you say, will only store objects of the same type. Each type will have a corresponding size, in bytes. For example in an int[] each element will occupy 4 bytes, each byte in a byte[] will occupy 1 byte, each Object in an Object[] will occupy 1 word (because it's really a pointer to the heap), etc.
The important thing is that each type has a size and every array has a type.
Then, we get to the problem of mapping an index to a memory position at runtime. It's actually very easy because you know where the array starts and, given the array's type, you know the size of each element.
If your array starts at some memory position N, you can use the given index I and element size S to compute that the memory you're looking for will be at memory address N + (S * I).
This is how Java finds memory positions for indexes at runtime without storing them.
Your two diagrams are, apart from labels that are strictly for human consumption, equivalent and identical.
That is to say that in the first diagram, the labels arr[0], arr[1], etc., are not part of the array. They are simply there for illustrative purposes, indicating how the array elements are laid out in memory.
What you were told, namely that arrays are stored in contiguous locations in memory (at least insofar as virtual addresses are concerned; on modern hardware architectures, these need not map into contiguous physical addresses) and array elements are located based on their size and index, is correct. (At least in... well, it is definitely correct in C/C++. It is almost certainly correct in most, if not all, Java implementations. But it is likely incorrect in languages that allow for sparse arrays or arrays that can grow or shrink dynamically.)
The fact that the array reference is created in the stack whereas the array data are placed on the heap is an implementation-specific detail. Compilers that compile Java directly to machine code may implement array storage differently, taking into account the specific characteristics of the target hardware platform. In fact, a clever compiler may place, e.g., small arrays in the stack in their entirety, and use the heap only for larger arrays, to minimize the need for garbage collection, which can impact performance.
On your first picture arr[0] to arr[4] are not references to the array elements. They are just illustrative labels for the location.
The critical piece to understand is that memory allocated for an array is contiguous. So given the address of the initial element of an array, i.e., arr[0], this contiguous memory allocation scheme lets the runtime determine the address of any array element from its index.
Say we have declared int[] arr = new int[5], and its initial array element, arr[0], is at address 100. To reach the third element in the array, all the runtime needs to do is compute 100 + ((3-1)*4) = 108 (assuming an int occupies 4 bytes, i.e. 32 bits, and addresses are counted in bytes). So all that the runtime needs is the address of the initial element of that array. It can derive all other addresses of array elements based on the index and the size of the datatype the array stores.
Just an off-topic note: Although the array occupies a contiguous memory location, the addresses are contiguous only in the virtual address space and not in the physical address space. A huge array could span multiple physical pages that may not be contiguous, but the virtual address used by the array will be contiguous. And mapping of a virtual address to a physical address is done by OS page tables.
The reference of an array is not always on the stack. It could also be stored on the heap if it's a member of a class.
The array itself can hold either primitive values or references to an object. In any case, the data of an array are always of the same kind. Then the compiler can deal with their location without explicit pointers, only with the value/reference size and an index.
See:
* The Java Language Specification, Java SE 8 Edition - Arrays
* The Java Virtual Machine Specification, Java SE 8 Edition - Reference Types and Values
The "consecutive memory locations" is an implementation detail and may be wrong. For example, Objective-C mutable arrays do not use consecutive memory locations.
To you, it mostly doesn't matter. All you need to know is that you can access an array element by supplying the array and an index, and some mechanism unknown to you uses the array and the index to produce the array element.
There is obviously no need for the array to store indexes, since for example every array in the world with five array elements has the indexes 0, 1, 2, 3, and 4. We know these are the indexes, no need to store them.
An array is a contiguous memory allocation, which means that if you know the address of the first element, you can reach the next index by stepping forward by the size of one element.
The array reference is not the array address itself, but the way to reach that address (handled internally), as with normal objects. So you have the position where the array starts, and you can move through memory by changing the index. This is why indexes are not stored in memory; the compiler just knows where to go.
Out of interest: Recently, I encountered a situation in one of my Java projects where I could store some data either in a two-dimensional array or make a dedicated class for it whose instances I would put into a one-dimensional array. So I wonder whether there exists some canonical design advice on this topic in terms of performance (runtime, memory consumption)?
Without regard to design patterns (extremely simplified situation), let's say I could store data like
class MyContainer {
public double a;
public double b;
...
}
and then
MyContainer[] myArray = new MyContainer[10000];
for(int i = myArray.length; (--i) >= 0;) {
myArray[i] = new MyContainer();
}
...
versus
double[][] myData = new double[10000][2];
...
I somehow think that the array-based approach should be more compact (memory) and faster (access). Then again, maybe it is not, arrays are objects too and array access needs to check indexes while object member access does not.(?) The allocation of the object array would probably(?) take longer, as I need to iteratively create the instances and my code would be bigger due to the additional class.
Thus, I wonder whether the designs of the common JVMs provide advantages for one approach over the other, in terms of access speed and memory consumption?
Many thanks.
Then again, maybe it is not, arrays are objects too
That's right. So I think this approach will not buy you anything.
If you want to go down that route, you could flatten this out into a one-dimensional array (each of your "objects" then takes two slots). That would give you immediate access to all fields in all objects, without having to follow pointers, and the whole thing is just one big memory allocation: since your component type is primitive, there is just one object as far as memory allocation is concerned (the container array itself).
This is one of the motivations for people wanting to have structs and value types in Java, and similar considerations drive the development of specialized high-performance data structure libraries (that get rid of unnecessary object wrappers).
I would not worry about it until you really have a huge data structure, though. Only then will the overhead of the object-oriented way matter.
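A minimal sketch of the flattening described above, assuming each logical "object" has exactly two double fields stored side by side. The class name `FlatPairs` and its methods are hypothetical names chosen for illustration.

```java
class FlatPairs {
    // Layout: a0, b0, a1, b1, ... in one primitive array (a single heap object).
    private final double[] data;

    FlatPairs(int count) {
        data = new double[2 * count];
    }

    double getA(int i) { return data[2 * i]; }
    double getB(int i) { return data[2 * i + 1]; }

    void set(int i, double a, double b) {
        data[2 * i] = a;
        data[2 * i + 1] = b;
    }

    int size() { return data.length / 2; }

    public static void main(String[] args) {
        FlatPairs pairs = new FlatPairs(10000);
        pairs.set(42, 1.5, -2.5);
        System.out.println(pairs.getA(42) + ", " + pairs.getB(42)); // 1.5, -2.5
    }
}
```

The accessors hide the index arithmetic, so calling code still reads like field access while the storage stays one allocation.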
I somehow think that the array-based approach should be more compact (memory) and faster (access)
It won't. You can easily confirm this by using Java Management interfaces:
import java.lang.management.ManagementFactory;
com.sun.management.ThreadMXBean b = (com.sun.management.ThreadMXBean) ManagementFactory.getThreadMXBean();
long selfId = Thread.currentThread().getId();
long memoryBefore = b.getThreadAllocatedBytes(selfId);
// <-- Put measured code here
long memoryAfter = b.getThreadAllocatedBytes(selfId);
System.out.println(memoryAfter - memoryBefore);
In place of the measured code, put new double[0] and new Object(), and you will see that both allocations require exactly the same amount of memory.
It might be that the JVM/JIT treats arrays in a special way which could make them faster to access in one way or another.
The JIT does some vectorization of array operations in for-loops. But that's more about the speed of arithmetic operations than the speed of access. Besides that, I can't think of any.
The canonical advice that I've seen in this situation is that premature optimisation is the root of all evil. Following that means that you should stick with the code that is easiest to write / maintain / get past your code quality regime, and then look at optimisation if you have a measurable performance issue.
In your examples the memory consumption is similar because in the object case you have 10,000 references plus two doubles per reference, and in the 2D array case you have 10,000 references (the first dimension) to little arrays containing two doubles each. So both are one base reference plus 10,000 references plus 20,000 doubles.
A more efficient representation would be two arrays, where you'd have two base references plus 20,000 doubles.
double[] a = new double[10000];
double[] b = new double[10000];
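As a rough check on the counts above, here is a back-of-the-envelope estimate that also includes object headers. The header, reference, and alignment sizes are assumptions (values commonly seen on a 64-bit HotSpot JVM with compressed references); other JVMs and settings will differ, and the class `FootprintEstimate` is a hypothetical helper, not a measurement tool.

```java
class FootprintEstimate {
    // ASSUMED sizes in bytes; they are not guaranteed by any specification.
    static final long OBJECT_HEADER = 12;
    static final long ARRAY_HEADER = 16;
    static final long REFERENCE = 4;
    static final long ALIGNMENT = 8;

    // Round up to the next multiple of ALIGNMENT.
    static long align(long bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    static long doubleArray(long length) {
        return align(ARRAY_HEADER + Double.BYTES * length);
    }

    static long referenceArray(long length) {
        return align(ARRAY_HEADER + REFERENCE * length);
    }

    // An object holding two double fields, with the fields starting on an aligned offset.
    static long objectWithTwoDoubles() {
        return align(align(OBJECT_HEADER) + 2L * Double.BYTES);
    }

    static long arrayOfObjects(long n) {
        return referenceArray(n) + n * objectWithTwoDoubles();
    }

    static long twoDimensional(long n) {
        return referenceArray(n) + n * doubleArray(2);
    }

    static long twoParallelArrays(long n) {
        return 2 * doubleArray(n);
    }

    public static void main(String[] args) {
        System.out.println(arrayOfObjects(10000));    // 360016
        System.out.println(twoDimensional(10000));    // 360016
        System.out.println(twoParallelArrays(10000)); // 160032
    }
}
```

Under these assumed sizes the object array and the two-dimensional array come out equal, while the two parallel arrays need less than half as much, because the per-element headers and references disappear.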
I am currently writing an API for reading an OBJ file. In this API, I have a List of Vectors, and a class describing a Face (3 vectors).
I want to think about memory usage, so I wonder whether it is smartest for the face to remember the index of its vectors in the vector array, or whether it should just hold a pointer/instance of the vectors.
Also, would the same apply in C#?
An integer (which your index would be) is 32 bits, a reference is either 32 or 64 bits, and an object is a minimum of 64 bits plus its internals. So an integer is either the same size as or slightly smaller than a reference. But a copy of the object will be much larger.
Index of array or reference
But seriously, you shouldn't be worrying about an index into an array of references unless there is an insane number of these and you are memory-starved. And of course there will be a small performance penalty for this indirection. Do whichever makes the most sense conceptually, but it's likely you want to stick with references rather than indexes into an array if both make conceptual sense: the memory footprint is very similar and the structure is simpler.
Copy of vector
Of course, the third option is to keep a reference to a copy of the vector. This would cost the full memory of the object for each copy and is worth avoiding unless you need independent objects.
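The first two options can be sketched side by side as follows. All class names here (`FaceStorage`, `Vec3`, `IndexedFace`, `ReferencingFace`) are hypothetical and only illustrate the shapes being compared; they are not taken from the asker's API.

```java
import java.util.Arrays;
import java.util.List;

class FaceStorage {
    static final class Vec3 {
        final double x, y, z;
        Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // Option 1: the face stores three int indexes into a shared vertex list.
    static final class IndexedFace {
        final int a, b, c;
        IndexedFace(int a, int b, int c) { this.a = a; this.b = b; this.c = c; }
        // Every access needs the list and one extra lookup.
        Vec3 first(List<Vec3> vertices) { return vertices.get(a); }
    }

    // Option 2: the face stores three references to the shared Vec3 objects.
    static final class ReferencingFace {
        final Vec3 a, b, c;
        ReferencingFace(Vec3 a, Vec3 b, Vec3 c) { this.a = a; this.b = b; this.c = c; }
    }

    public static void main(String[] args) {
        List<Vec3> vertices = Arrays.asList(
                new Vec3(0, 0, 0), new Vec3(1, 0, 0), new Vec3(0, 1, 0));
        IndexedFace byIndex = new IndexedFace(0, 1, 2);
        ReferencingFace byRef = new ReferencingFace(
                vertices.get(0), vertices.get(1), vertices.get(2));
        // Both faces reach the very same Vec3 instance; no vertex is copied.
        System.out.println(byIndex.first(vertices) == byRef.a); // true
    }
}
```

Neither variant duplicates vertex data; the difference is only whether the face carries three ints plus a dependency on the list, or three references it can follow directly.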