Reference to element in ArrayList which is later moved - java

Assume the following code:
ArrayList<A> aList = new ArrayList<>();
for (int i = 0; i < 1000; ++i)
    aList.add(new A());
A anElement = aList.get(500);
for (int i = 0; i < 100000; ++i)
    aList.add(new A());
Afterwards anElement still correctly references aList[500], even though the ArrayList presumably reallocated its data multiple times during the second for loop. Is this assumption incorrect, and if not, how does Java manage to have anElement still point at the correct data in memory?
My theories are that either instead of freeing the memory anElement references, that memory now points to the current aList data, or alternatively the reference anElement has is updated when growing the array. Both of these theories however have really bad Space/Time Performance implication, so I consider them unlikely.
Edit:
I misunderstood how arrays store elements, I assumed they store them directly, but in reality they store references, meaning that anElement and aList[500] both point to some object on the heap, solving the problem I failed to understand!

When the array that internally stores the elements of an ArrayList becomes full, a new, larger array is created and all elements from the previous array are copied to the new one at the same indexes, which leaves room for new elements. The garbage collector will eventually get rid of the previous array, which is no longer needed.
You might want to have a look at the source code of ArrayList to see the details of how it works "under the hood".
The second loop in your code appended another 100000 elements after the 1000th element, so now you have 101000 elements in aList. The objects themselves were not moved anywhere, and the first 1000 keep their indexes. The get() method only reads an element; nothing is moved or deleted from the ArrayList.
Note that ArrayList doesn't really work like an array (e.g. an array of As is A[]) and it's not a fixed-size collection: an ArrayList changes its size when elements are added or removed. For example, if a list holds 1000 elements and you remove the element at index 0 (aList.remove(0);), then the element that was stored at index 1 is now stored at index 0 (every later element shifts down by one) and the size of the ArrayList changes from 1000 to 999.
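The two points above can be checked with a small sketch. The class A here is a hypothetical empty stand-in for the one in the question, and the names ReferenceDemo, sameObjectAfterGrowth and removalShiftsIndexes are made up for illustration:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ReferenceDemo {
    // Hypothetical stand-in for the question's class A.
    static class A {}

    // Returns true if the reference taken before the list grew
    // still refers to the same object as aList.get(500).
    static boolean sameObjectAfterGrowth() {
        List<A> aList = new ArrayList<>();
        for (int i = 0; i < 1000; ++i) {
            aList.add(new A());
        }
        A anElement = aList.get(500); // copies the reference, not the object
        for (int i = 0; i < 100000; ++i) {
            aList.add(new A());       // the backing array is replaced several times
        }
        // == compares references: both refer to the same heap object.
        return anElement == aList.get(500);
    }

    // Returns true if removing index 0 shifts the later elements down by one.
    static boolean removalShiftsIndexes() {
        List<String> list = new ArrayList<>(Arrays.asList("a", "b", "c"));
        String second = list.get(1);
        list.remove(0);
        return list.get(0) == second && list.size() == 2;
    }

    public static void main(String[] args) {
        System.out.println(sameObjectAfterGrowth());
        System.out.println(removalShiftsIndexes());
    }
}
```

Only the references stored in the backing array are copied when the list grows; the A objects stay where they are on the heap.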

If you want to know how an ArrayList works internally just look at the sources you can find online, for example here:
http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/ArrayList.java
Here you can clearly see that an Object[] is used internally, which is resized to int newCapacity = (oldCapacity * 3)/2 + 1; when deemed necessary.
The indexes stay the same as long as you only add to the back. Should you insert something in the middle, the indexes of all elements behind it are incremented by one.
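To get a feeling for how rarely the resize happens, here is a rough sketch that applies the quoted JDK 6 formula repeatedly. It only simulates the capacity arithmetic, it does not inspect a real ArrayList, and the starting capacity of 10 is an assumption (the default of that JDK version); GrowthDemo and its method names are invented for this example:

```java
class GrowthDemo {
    // Growth rule quoted from the JDK 6 source linked above.
    static int nextCapacity(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Counts how many times the backing array would have to be replaced
    // to grow from initialCapacity to at least targetSize elements.
    static int reallocations(int initialCapacity, int targetSize) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < targetSize) {
            capacity = nextCapacity(capacity);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 101000 is the final size of aList in the question.
        System.out.println(reallocations(10, 101000));
    }
}
```

Because the capacity grows geometrically, the number of reallocations grows only logarithmically with the number of elements.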

That is not defined by Java itself; it is an implementation detail of the given JVM. You can read about it at https://www.artima.com/insidejvm/ed2/jvm6.html, for example, and there are complete books about JVM internals.
A simple approach is to keep a map of objects: a reference is first looked up in that map to find the current location of the actual object. Moving the actual object around is then simple, because its address is stored in a single place, in that map.
One could say this is slow and store the address directly, skipping the extra step of looking up the object in a map. Moving an object around then requires more work, though it is still possible: all pointers to it have to be updated. Because raw pointers do not appear at the language level, tricks such as temporarily casting a pointer to a number or hiding it inside some arbitrary array cannot do harm here, so the JVM can always tell whether something is a pointer to the object being moved.

Related

Why do arrays in Java need to have a pre-defined length when Objects don't?

Sorry if this is a really stupid question, but hearing as "Java arrays are literally just Objects" it makes no sense to me that they need to have a pre-defined length?
I understand why primitive types do, for example int myInt = 15; allocates 32 bits of memory to store an integer and that makes sense to me. But if I had the following code:
class Integer{
int myValue;
public Integer(int myValue){
this.myValue = myValue;
}
}
Followed by a Integer myInteger = new Integer(15);myInteger.myValue = 5; then there's no limit on the amount of data I can store in myInteger. It's not limited to 32 bits, but rather it's a pointer to an Object which can store any amount of ints, doubles, Strings, or really anything. It allocated 32 bits of memory to store the pointer, but the object itself can store any amount of data, and it doesn't need to be specified beforehand.
So why can't an array do that? Why do I need to tell an array how much memory to allocate beforehand? If an array is "literally just an object" then why can't I simply say String[] myStrings = new String[];myStrings[0] = "Something";?
I'm super new to Java so there's a 100% chance that this is a stupid question and that there's a very simple and clear answer, but I am curious.
Also, to give another example, I can say ArrayList<String> myStrings = new ArrayList<String>();myStrings.add("Something"); without any problem... So what makes an ArrayList different from an array? Why does an array NEED to be told how much memory to allocate when an ArrayList doesn't?
Thanks in advance to anybody who takes the time to fill me in. :)
EDIT: Okay, so far everybody in the comments have misunderstood my post and I feel like it's my fault for wording it wrong.
My question is not "how do I define an array?", or "does changing the value of a variable change its memory usage?", or "do pointers store the data of the object they point to?", or "are arrays objects?", nor is it "how do ArrayLists work?"
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)
I hope this makes more sense now... I'm not sure why everybody misunderstood? (Did I word something wrong? If so, let me know and I'll change it for others' convenience)
My question is why does a pointer to an array need to know how big the array is beforehand, when a pointer to any other object doesn't?
It doesn't. Here, this runs perfectly fine:
String[] x = new String[10];
x = new String[15];
The whole 'needs to know in advance how large it is' refers only to the ARRAY OBJECT. As in, new int[10] goes to the heap, which is like a giant beach, and creates a new treasure chest out of thin air, big enough to hold exactly 10 ints (Which, being primitives, are like coins in this example). It then buries it in the sand, lost forever. Hence why new int[10]; all by its lonesome is quite useless.
When you write int[] arr = new int[10];, you still do that, but you now also make a treasure map. X marks the spot. 'arr' is this map. It is NOT AN INT ARRAY. It is a map to an int array. In java, both [] and . are 'follow the map, dig down, and open er up'.
arr[5] = 10; means: Follow your arr map, dig down, open up the chest you find there, and you'll see it has room for precisely 10 little pouches, each pouch large enough to hold one coin. Take the 6th pouch. Remove whatever was there, put a 10ct coin in.
It's not the map that needs to know how large the chest is that the map leads to. It's the chest itself. And this is true for objects as well, it is not possible in java to make a treasure chest that can arbitrarily resize itself.
So how does ArrayList work?
Maps-in-boxes.
ArrayList has, internally, a field of type Object[]. That field doesn't hold an object array. It can't. It holds a map to an object array: It's a reference.
So, what happens when you make a new arraylist? It is a treasure chest, fixed size, with room for exactly 2 things:
A map to an 'object array' treasure chest (which it will also make, with room for 10 maps, bury in the sand, and then store the map to this chest-of-maps inside itself).
A coinpouch. The coin inside represents how many objects the list actually contains. The map to the treasure it has leads to a treasure with room for 10 maps, but this coin (value: 0) says that so far, none of those maps go anywhere.
If you then run list.add("foo"), what that does is complicated:
"foo" is an object (i.e. treasure), so "foo" as an expression resolves to be a map to "foo". It then takes your list treasuremap, follows it, digs down, opens the box, and you yell 'oi! ADD THIS!', handing it a copy of your treasuremap to the "foo" treasure. What the box then does with this is opaque to you - that's the point of OO.
But let's dig into the sources of arraylist: What it will do, is query its treasuremap to the object array (which is private, you can't get to it, it's in a hidden compartment that only the djinn that lives in the treasure chest can open), follows it, digs down, and goes to the first slot (why? Because the 'size coin' in the coinpouch is currently at 0). It takes the map-to-nowhere that is there, tosses it out, makes a copy of your map to the "foo" treasure, and puts the copy in there. It then replaces its coin in the coin pouch with a penny, to indicate it is now size 1.
If you add an 11th element, the ArrayList djinn goes out to the other treasure, notices there is no room, and goes: Well, dang. Okay. It then conjures up an entirely new treasure chest that can hold 15 treasure maps, copies the 10 maps in the old treasure over to the new treasure chest, adds the copy of the map of the thing you added as 11th, then goes back to its own chest, rips out the map to the old treasure and replaces it with a map to the newly made treasure (with 15 slots), and puts an 11ct coin in the pouch.
The old treasure chest remains exactly where it is. If nobody has any maps to this (and nobody does), eventually, the beachcomber finds it, gets rid of it (that'd be the garbage collector).
Thus, ALL treasure chests are fixed size, but by replacing maps with new maps and conjuring up new treasure chests, you can nevertheless make it look like ArrayList is capable of shrinking and growing.
So why don't arrays allow it? Because that shrinking and growing stuff is complicated and arrays expose low-level functionality. Don't use arrays, use Lists.
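The djinn's routine can be sketched in a few lines. This is a simplified illustration, not the real java.util.ArrayList source: the class name SimpleList and the capacity() accessor are invented here, and the 10 and 15 slot sizes simply follow the story above.

```java
import java.util.Arrays;

class SimpleList<E> {
    // The "map" to the current chest of maps; the chest itself never resizes.
    private Object[] elementData = new Object[10];
    // The "coin": how many slots are actually in use.
    private int size = 0;

    void add(E element) {
        if (size == elementData.length) {
            // Chest is full: conjure a bigger one (10 -> 15 -> 22 ...),
            // copy the maps over, and swap our own map to point at it.
            elementData = Arrays.copyOf(elementData, size + (size >> 1));
        }
        elementData[size++] = element;
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (E) elementData[index];
    }

    int size() {
        return size;
    }

    // Exposed only so the growth can be observed in this sketch.
    int capacity() {
        return elementData.length;
    }

    public static void main(String[] args) {
        SimpleList<String> list = new SimpleList<>();
        for (int i = 0; i < 11; i++) {
            list.add("item" + i);
        }
        System.out.println(list.size() + " elements, capacity " + list.capacity());
    }
}
```

The SimpleList object itself always has the same two fields; only the array it refers to is swapped for a larger one.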
You seem to misunderstand what "storage" means. You say "there's no limit on the amount of data I can store", but if you run myInteger.myValue = 5, you overwrite the value of 15 that you put there originally. You still can't store any more than 32 bits; it's simply that you can change which 32 bits you put in that variable.
If you want to see how ArrayList works, you can read the source code; it can expand because if it runs out of space it creates a new larger array and switches its single array variable elementData to it.
Based on your update, it seems like you may be wondering about the ability to add lots of different fields to your object definition. In this case, those fields and their types are fixed when the class is compiled, and from that point on the class has a fixed size. You can't just pile in extra properties at runtime like you can in JavaScript. You are telling it up front about the scale it needs.
I'm going to ignore most of the details you've given, and answer the question in your edit.
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront?
It's worth starting by dealing with "when I make any other object it scales on its own", because this isn't true. If you create a class like this:
class MyInteger {
    public int value;
    public MyInteger(int value) {
        this.value = value;
    }
}
Then that class has a statically defined size. Once you've compiled this class, the amount of memory for an instance of MyInteger is already determined. In this case, it's the object header size (JVM dependent), and the size of an integer (at least 4 bytes).
Once an object has been allocated by the JVM, its size cannot change. It is treated as a block of bytes by the JVM (and importantly, the garbage collector) until it is reclaimed. Classes like ArrayList give the illusion of growing, but they actually work by allocating other objects, which they store references to.
class MyArrayList {
public int[] values;
public MyArrayList(int[] values) {
this.values = values;
}
}
In this case, the MyArrayList instance will always take the same amount of memory (object header size + reference size), but the array that is referenced may change. We could do something like this:
MyArrayList list = new MyArrayList(new int[50]);
This allocates a block of memory for list, and a block of memory for list.values. If we then do (as ArrayList effectively does internally):
list.values = new int[500];
then the memory allocated for list is still the same size, but we have allocated a new block which we then reference in list.values. This leaves our old int[50] with no references (so it can be garbage collected). Importantly, though, no allocation has changed size. We have reallocated a new, bigger, block for our list to use, and have referenced it from our MyArrayList instance.
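A short sketch can make the "no allocation has changed size" point observable. It keeps a second reference to the old array so that its length can still be inspected after the swap; ReallocationDemo and lengthsAfterGrow are names made up for this example, and Arrays.copyOf stands in for the copy step that a real list performs:

```java
import java.util.Arrays;

class ReallocationDemo {
    // Same shape as the MyArrayList class above.
    static class MyArrayList {
        int[] values;

        MyArrayList(int[] values) {
            this.values = values;
        }
    }

    // Returns { length of old block, length of new block, first copied value }.
    static int[] lengthsAfterGrow() {
        MyArrayList list = new MyArrayList(new int[50]);
        int[] old = list.values;               // second reference to the first block
        old[0] = 42;
        list.values = Arrays.copyOf(old, 500); // new, larger block with the contents copied
        return new int[] { old.length, list.values.length, list.values[0] };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(lengthsAfterGrow()));
    }
}
```

The old array still has length 50; the list merely refers to a different, larger array afterwards.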
Why do arrays in Java need to have a pre-defined length when Objects don't?
In order to understand this, we need to establish that "size" is a complicated concept in Java. There are a variety of meanings:
Each object is stored in the heap as one or more heap nodes, where one of these is the primary node, and the rest are component objects that can be reached from the primary node.
The primary heap node is represented by a fixed and unchanging number of bytes of heap memory. I will call this the native size of the object (see note 1).
An array has an explicit length field. This field is not declared. It has a type int and cannot be assigned to. There is actually a 32 bit field in the header of each array instance that holds the length.
The length of an array directly maps to its native size. The JVM can compute the native size from the length.
An object that is not an array instance also has a native size. This is determined by the number and types of the object's fields. Since fields cannot be added or removed at runtime, the native size does not change. But it doesn't need to be stored since it can be determined (when needed) at runtime from the object's class.
Some objects support a class specific size concept. For example, a String has a size returned by its length() method, and an ArrayList has a size returned by its size() method.
NB:
The meaning of the class specific size is ... class specific.
The class specific size does not correlate to the native size of an instance. (Except in degenerate cases ...)
In fact, all objects have a fixed native size.
1 - This term is solely for the purposes of this answer. I claim no authority for this term ...
Examples:
A String[] has a native size that depends on its length. On a typical JVM it will be 12 + length * (<reference size>) rounded up to a multiple of 16 bytes.
Your Integer class has a fixed native size. On a typical JVM each instance will be 16 bytes long.
An ArrayList object has 2 private int fields and a private Object[] field. That gives it a fixed native size of either 16 or 24 bytes. One of the int fields is called size, and it contains the value returned by size().
The size of an ArrayList may change, but this is implemented by the code of the class. In order to do this, it may need to reallocate its internal Object[] to make it large enough to hold more elements. If you examine the source code for the ArrayList class, you can see how this happens. (Look for the ensureCapacity and grow methods.)
So, the differences between the size(s) of regular object and the length of an array are:
The native size of a regular object is determined solely by the type of the object, and it never changes. It is rarely relevant to the application and it is not exposed via a field.
The length of an array depends on the value supplied when you instantiate it. It never changes. The native size can be determined from the length.
The class specific size of an object (if relevant) is managed by the class.
To your revised question:
My question is, how come when I make an array I need to tell it how big the object it points to is, but when I make any other object it scales on its own without me telling it anything upfront? (With ArrayLists being an example of the difference)
The point is that at the JVM level, NOTHING scales automatically. The native size of a Java object CANNOT change.
Why? Because increasing the size of the object's heap node would entail moving the heap node, and a heap node cannot be moved without updating all references for the object. That cannot be done efficiently.
(It has been pointed out that the GC can efficiently move heap nodes. However, that is not a viable solution. Running the GC is expensive. It would be highly inefficient to perform a GC in order to (say) grow a single Java array. If Java had been specified so that arrays could "grow", it would need to be implemented using an underlying non-growable array type.)
The ArrayList case is handled by the ArrayList class itself, and it does it by (if necessary) creating a new, larger backing array, copying the elements from the old to the new, and then discarding the old backing array. It also adjusts the size field that holds the logical size of the list.
Object arrays allocate space for object pointers, not for entire objects in memory.
So new String[10] doesn't allocate space for 10 strings, but for 10 object references that will point to whatever strings are later stored in the array.
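A minimal sketch of that, with invented names (ReferenceSlotsDemo and its two methods): a fresh String[10] holds ten null references, and storing a string puts a reference to the existing object into a slot rather than a copy of it.

```java
class ReferenceSlotsDemo {
    // A new object array contains only null references, no String objects.
    static boolean slotsStartNull() {
        String[] strings = new String[10];
        for (String s : strings) {
            if (s != null) {
                return false;
            }
        }
        return true;
    }

    // Assigning to a slot stores a reference to the same object.
    static boolean slotSharesObject() {
        String[] strings = new String[10];
        String text = new String("Something");
        strings[0] = text;
        return strings[0] == text; // same object, not a copy
    }

    public static void main(String[] args) {
        System.out.println(slotsStartNull());
        System.out.println(slotSharesObject());
    }
}
```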

Intermediate between LinkedList and ArrayList

In the following piece of code, I have the result of a query, but I have no clue about the total number of records. I have to store it into a container. When I read each record of the container, it will be a simple loop so that index-based access will not be used.
List<MyObject> list;
while ( source.hasNext() ) {
MyObject ob = new MyObject();
convertObject(ob, source.next());
list.add(ob);
}
...
//Another method
for (MyObject ob : objects){
showThings(ob);
}
LinkedList is poor because it creates many small objects with pointers to the next one. It uses more memory, makes the memory more fragmented and causes more cache misses.
ArrayList is poor because I don't know the number of records that I will insert. Whenever I insert a new item and the inner array is full, it will allocate a bigger block of memory and copy everything to the new block.
I didn't find any solution in java.util. So I consider writing a custom list. It will be like a LinkedList, but each cell is an array. In other words, the first node will be like an ArrayList, but when it is full and I insert a new object, it will create another node with an array to insert the new items instead of copying everything. However, I may be reinventing the wheel somehow.
Whenever I insert a new item and the inner array is full, it will allocate a bigger block of memory and copy everything to the new block
This is not quite true: you can preallocate a backing array of any size you want, and it is only reallocated when it runs out of space, growing by 50% each time. So if you have a reasonable guess of the expected size, or even just guess a large number, the reallocation cost will be zero or minimal.
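As a sketch of that advice, the loading loop from the question could look like the following. The expectedSize guess, the PreallocationDemo class and the drain method are assumptions for illustration; an Iterator stands in for the question's source object:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class PreallocationDemo {
    // Copies everything from source into a list whose backing array
    // is sized up front, so no reallocation happens while the guess holds.
    static <T> List<T> drain(Iterator<T> source, int expectedSize) {
        ArrayList<T> list = new ArrayList<>(expectedSize);
        while (source.hasNext()) {
            list.add(source.next());
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> records = drain(Arrays.asList("a", "b", "c").iterator(), 1000);
        System.out.println(records);
    }
}
```

If the guess turns out too small, the list still works and simply grows as usual; if it is far too large, ArrayList.trimToSize() can release the unused capacity afterwards.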
Some googling unveils the Brownies Collections' GapList.
It is organizing its contents in blocks and manages them by arranging them in a tree. It also does block merging in case after element removal they become sparse.
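For comparison, the structure described in the question (a list whose cells are arrays) can be sketched roughly as below. This is not how GapList is implemented; ChunkedList, CHUNK_SIZE and the append-only API are invented for this illustration, and the chunk size of 1024 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class ChunkedList<E> {
    private static final int CHUNK_SIZE = 1024; // arbitrary, for illustration
    private final List<Object[]> chunks = new ArrayList<>();
    private int size = 0;

    // Appends an element; a full chunk is never copied, a new one is started.
    void add(E element) {
        int offset = size % CHUNK_SIZE;
        if (offset == 0) {
            chunks.add(new Object[CHUNK_SIZE]);
        }
        chunks.get(chunks.size() - 1)[offset] = element;
        size++;
    }

    int size() {
        return size;
    }

    // Visits all elements in insertion order without index-based access.
    @SuppressWarnings("unchecked")
    void forEach(Consumer<? super E> action) {
        int remaining = size;
        for (Object[] chunk : chunks) {
            int used = Math.min(remaining, CHUNK_SIZE);
            for (int i = 0; i < used; i++) {
                action.accept((E) chunk[i]);
            }
            remaining -= used;
        }
    }

    public static void main(String[] args) {
        ChunkedList<Integer> list = new ChunkedList<>();
        for (int i = 0; i < 3000; i++) {
            list.add(i);
        }
        long[] sum = {0};
        list.forEach(v -> sum[0] += v);
        System.out.println(list.size() + " elements, sum " + sum[0]);
    }
}
```

Note that the outer list of chunks is itself an ArrayList and still grows by copying, but it only copies one reference per chunk rather than one per element.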

Iterating over array shared between threads

I have an array whose size does not change during the execution of my program. Let's say there are multiple threads which are changing the content of this array, something like
array[validIndex] = new Entity();
Is it safe to iterate through such array at any point in time? Let's say that I don't care about the objects which are "inside" the array.
Array sizes in Java don't change...ever.
Iterating through an array is essentially looping through array indices and getting the element at each index -- whether you do it explicitly, or you use the shiny for( Entity e: array ) ... syntax -- so there's no way the iteration itself will go wrong, even with changing array contents.
The objects you're going to see through the iteration may not constitute a valid "snapshot" of the contents of the array at any given point in time, but as far as I understand, this is not an issue in your case.
You can get a thread-safe list (rather than an array) as follows:
List<Entity> arrayList = new ArrayList<>();
List<Entity> arrayListNew = Collections.synchronizedList(arrayList);
// Here arrayListNew is a thread-safe list; always access the data through it, not through arrayList.
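One caveat from the Collections.synchronizedList documentation: individual calls such as add and get are synchronized, but iteration is not, so the caller has to hold the list's lock while iterating. A minimal sketch, with invented names (SynchronizedListDemo, countNonNull, demo) and String elements instead of the question's Entity:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListDemo {
    // Iterates the synchronized list while holding its lock.
    static int countNonNull(List<String> syncList) {
        int count = 0;
        synchronized (syncList) {
            for (String s : syncList) {
                if (s != null) {
                    count++;
                }
            }
        }
        return count;
    }

    // One writer thread adds 1000 elements while the caller iterates.
    static int demo() {
        List<String> syncList = Collections.synchronizedList(new ArrayList<String>());
        Thread writer = new Thread(() -> {
            for (int i = 0; i < 1000; i++) {
                syncList.add("e" + i);
            }
        });
        writer.start();
        countNonNull(syncList); // safe even while the writer is running
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return countNonNull(syncList);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Without the synchronized block, a concurrent add during the for-each loop can throw ConcurrentModificationException.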

Non runtime allocation solution - ArrayList

I'm making a game in Java. I need some solution for my current runtime allocation, caused by my ArrayList. Every 30 seconds to a minute the garbage collector starts to run, because I am calling the draw and update methods through this collection.
How should I be able to do a non runtime allocation solution?
Thanks in advance and if needed, my code is posted below from my Manager class which contains the ArrayList of objects.:
Some code:
@Override
public void draw(GL10 gl) {
final int size = objects.size();
for(int x = 0; x < size; x++) {
Object object = objects.get(x);
object.draw(gl);
}
}
public void add(Object parent) {
objects.add(parent);
}
//Get collection, and later we call the draw function from these objects
public ArrayList<Object> getObjects() {
return objects;
}
public int getNumberOfObjects() {
return objects.size();
}
More explanation: The reason I mix with this is because (1) I see that the ArrayList implementation is slow and causing lags and (2) that I want to merge the objects/components together. When firing an update call from my Thread-class, it goes through my collection, send things down the tree/graph using the Manager's update function.
When looking at an Open Source project, Replica Island, I found that he used an alternative class FixedSizeArray that he wrote on his own. Since I'm not that good at Java, I wanted to make things easier and now I'm looking for another solution. And at last, he explained WHY he made the special class:
FixedSizeArray is an alternative to a standard Java collection like ArrayList. It is designed to provide a contiguous array of fixed length which can be accessed, sorted, and searched without requiring any runtime allocation. This implementation makes a distinction between the "capacity" of an array (the maximum number of objects it can contain) and the "count" of an array (the current number of objects inserted into the array). Operations such as set() and remove() can only operate on objects that have been explicitly add()-ed to the array; that is, indexes larger than getCount() but smaller than getCapacity() can't be used on their own.
I see that the ArrayList implementation is slow and causing lags ...
If you see that, you are misinterpreting the evidence and jumping to unjustifiable conclusions. ArrayList is NOT slow, and it does NOT cause lags ... unless you use the class in a particularly suboptimal way.
The only times that an array list allocates memory are when you create the list, add more elements, copy the list, or call iterator().
When you create the array list, 2 java objects are created; one for the ArrayList and one for its backing array. If you use the initialCapacity argument and give an appropriate value, you can arrange that subsequent updates will not allocate memory.
When you add or insert an element, the array list may allocate one new object. But this only happens when the backing array is too small to hold all of the elements, and when it does happen the new backing array is larger than the old one by a constant factor (about 1.5 times in the OpenJDK implementation; the exact factor is an implementation detail). So inserting N elements will result in only O(log N) allocations. Besides, if you create the array list with an appropriate initialCapacity, you can guarantee that there are zero allocations on add or insert.
When you copy a list to another list or array (using toArray or a copy constructor) you will get 1 or 2 allocations.
The iterator() method creates a new object each time you call it. But you can avoid this by iterating using an explicit index variable, List.size() and List.get(int). (Be aware that for (E e : someList) { ... } implicitly calls List.iterator().)
(External operations like Collections.sort do entail extra allocations, but that is not the fault of the array list. It will happen with any list type.)
In short, the only way you can get lots of allocations using an array list is if you create lots of array lists, or use them unintelligently.
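Put together, a draw loop that avoids allocation after start-up might look like the sketch below. The Drawable interface, the MAX_OBJECTS bound and the DrawLoopDemo class are assumptions standing in for the question's Manager and its objects:

```java
import java.util.ArrayList;

class DrawLoopDemo {
    // Hypothetical stand-in for the game objects in the question.
    interface Drawable {
        void draw();
    }

    // Assumed upper bound on the number of objects the game ever adds.
    static final int MAX_OBJECTS = 256;

    // Backing array allocated once, large enough for every later add().
    private final ArrayList<Drawable> objects = new ArrayList<>(MAX_OBJECTS);

    void add(Drawable drawable) {
        objects.add(drawable);
    }

    // Index-based loop: no Iterator object is created per frame.
    int drawAll() {
        final int size = objects.size();
        for (int i = 0; i < size; i++) {
            objects.get(i).draw();
        }
        return size;
    }

    public static void main(String[] args) {
        DrawLoopDemo manager = new DrawLoopDemo();
        manager.add(() -> System.out.println("drawing"));
        manager.drawAll();
    }
}
```

As long as no more than MAX_OBJECTS elements are added, neither add() nor drawAll() allocates anything inside the list.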
The FixedSizeArray class you have found sounds like a waste of time. It sounds like it is equivalent to creating an ArrayList with an initial capacity ... with the restriction that it will break if you get the initial capacity wrong. Whoever wrote it probably doesn't understand Java collections very well.
It's not quite clear what you are asking, but:
If you know at compile time what objects should be in the collection, make it an array not an ArrayList and set the contents in an initialisation block.
Object[] objects = new Object[]{obj1,obj2,obj3};
What makes you think you know what the GC is reclaiming? Have you profiled your application?
What do you mean by "non-runtime allocation"? I'm really not even sure what you mean by "allocation" in this context... allocation of memory? That's done at runtime, obviously. You clearly aren't referring to any kind of fixed pool of objects that are known at compile time either, since your code allows adding objects to your list several different ways (not that you'd be able to allocate anything for them at compile time even if you were).
Beyond that, nothing in the code you've posted is going to cause garbage collection by itself. Objects can only be garbage collected when nothing in the program has a strong reference to them, and your posted code only allows adding objects to the ArrayList (though they can be removed by calling getObjects() and removing from that, of course). As long as you aren't removing objects from the objects list, you aren't reassigning objects to point to a different list, and the object containing it isn't itself becoming eligible for garbage collection, none of the objects it contains will ever be available for garbage collection either.
So basically, there isn't any specific problem with the code you've posted and your question doesn't make sense as asked. Perhaps there are more details you can provide or there's a better explanation of what exactly your issue is and what you want. If so, please try to add that to your question.
Edit:
From the description of FixedSizeArray and the code I looked at in it, it seems largely equivalent to an ArrayList that is initialized with a specific array capacity (using the constructor that takes an int initialCapacity) except that it will fail at runtime if something tries to add to it when its array is full, where ArrayList will expand itself to hold more and continue working just fine. To be honest, it seems like a pointless class, possibly written because the author didn't actually understand ArrayList.
Note also that its statement about "not requiring any runtime allocation" is a bit misleading... it does of course have to allocate an array when it is created, but it just refuses to allocate a new array if its initial array fills up. You can achieve the same thing using ArrayList by simply giving it an initialCapacity that is at least large enough to hold the maximum number of objects you will ever add to it. If you do so, and you do in fact ensure you never add more than that number of objects to it, it will never allocate a new array after it is created.
However, none of this relates in any way to your stated issue about garbage collection, and your code still doesn't show anything that would cause huge numbers of objects to be garbage collected. If there is any issue at all, it may relate to the code that is actually calling the add and getObjects methods and what it's doing.

Converting to a column oriented array in Java

Although I have Java in the title, this could be for any OO language.
I'd like to know a few new ideas to improve the performance of something I'm trying to do.
I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives.
Example:
List<List<Object>> columnOriented = new ArrayList<List<Object>>();
public void newObject(Object[] obj) {
    for (int i = 0; i < obj.length; i++) {
        columnOriented.get(i).add(obj[i]);
    }
}
Note: For simplicity I've omitted the initialization of objects and stuff.
The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas.
How would you do this knowing it's very performance sensitive?
EDIT:
I've tested a few things and found that:
Instead of using ArrayList (or any other Collection), I wrapped an Object[] array in another object to store individual columns. If this array reaches its capacity, I create another array with twice the size and copy the contents from one to the other using System.arraycopy. Surprisingly (at least for me) this is faster than using ArrayList to store the inner columns...
The answer depends on the data and usage profile. How much data do you have in such collections? What is proportions of reads/writes (adding objects array)? This affects what structure for inner list is better and many other possible optimizations.
The fastest way to copy data is to avoid copying at all. If you know that the obj array is not modified further by the calling code (this is an important condition), one possible trick is to implement your own custom List class to use as the inner list. Internally you store a shared List<Object[]>, and each call just adds the new array to that list. The custom inner list class knows which column it represents (let it be n), and when it is asked for the item at position m, it swaps the roles of m and n and queries the internal structure to get internalArray.get(m)[n]. This implementation is unsafe, because the restriction on the caller is easy to forget, but it might be faster under some conditions (and slower under others).
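A minimal sketch of such a view, assuming read-only access is enough. The class name ColumnView is invented, and java.util.AbstractList is used so that only get and size need to be written:

```java
import java.util.AbstractList;
import java.util.ArrayList;
import java.util.List;

// Read-only view of one column over a shared list of rows; nothing is copied.
class ColumnView extends AbstractList<Object> {
    private final List<Object[]> rows; // shared with all other column views
    private final int column;          // the column this view represents (n)

    ColumnView(List<Object[]> rows, int column) {
        this.rows = rows;
        this.column = column;
    }

    // Item m of column n is element n of row m.
    @Override
    public Object get(int rowIndex) {
        return rows.get(rowIndex)[column];
    }

    @Override
    public int size() {
        return rows.size();
    }

    public static void main(String[] args) {
        List<Object[]> rows = new ArrayList<>();
        rows.add(new Object[] { "a", 1 });
        rows.add(new Object[] { "b", 2 });
        System.out.println(new ColumnView(rows, 0)); // column 0 of both rows
    }
}
```

Rows added to the shared list later show up in every view immediately. The sketch assumes every row has at least column + 1 elements and, as noted above, that callers never modify a row after handing it over.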
I would try using LinkedList for the inner list, because it should have better performance for insertions. Maybe wrapping the Object array in a collection and using addAll might help as well.
ArrayList may be slow, due to copying of arrays (It uses a similar approach as your self-written collection).
As an alternative solution you could try to simply store the rows at first and create the columns when necessary. This way, copying of the internal arrays of the lists is reduced to a minimum.
Example:
//Notice: You can use a LinkedList for rows, as no index based access is used.
List<Object[]> rows =...
List<List<Object>> columns;
public void processColumns() {
    columns = new ArrayList<List<Object>>();
    for (Object[] aRow : rows) {
        // aRow is an array, so its length is read with .length, not size()
        while (aRow.length > columns.size()) {
            //This ensures that the ArrayList is big enough, so no copying is necessary
            List<Object> newColumn = new ArrayList<Object>(rows.size());
            columns.add(newColumn);
        }
        for (int i = 0; i < aRow.length; i++) {
            columns.get(i).add(aRow[i]);
        }
    }
}
Depending on the number of columns, it's still possible that the outer list is copying arrays internally, but normal tables contains far more rows than columns, so it should be a small array only.
Use a LinkedList for implementing the column lists. It grows linearly with the data and appending is O(1). (If you use ArrayList it has to resize the internal array from time to time.)
After collecting the values you can convert the linked lists to arrays. If N is the number of rows, you will go from holding 3*N refs for each list (each LinkedList node has prevRef/nextRef/itemRef) to only N refs.
It would be nice to have an array for holding the different column lists, but of course, it's not a big improvement and you can do it only if you know the column count in advance.
Hope it helps!
Edit: tests and theory indicate that ArrayList is better in amortized cost, that is, the total cost divided by the number of items processed... so don't follow my 'advice' :)
