I understand that capacity is the number of elements or available spaces in an ArrayList that may or may not hold a value referencing an object. I am trying to understand more about the concept of capacity.
So I have three questions:
1) What are some good ways to define what capacity represents from a memory standpoint?
...the (contiguous?) memory allocated to the ArrayList?
...the ArrayList's memory footprint on the (heap?)?
2) Then if the above is true, changing capacity requires some manner of memory management overhead?
3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?
The class is called ArrayList because it's based on an array. The capacity is the size of the array, which requires a block of contiguous heap memory. However, note that the array itself contains only references to the elements, which are separate objects on the heap.
Increasing the capacity requires allocating a new, larger array and copying all the references from the old array to the new one, after which the old one becomes eligible for garbage collection.
You've cited the main case where performance could be a concern. In practice, I've never seen it actually become a problem, since the element objects usually take up much more memory (and possibly CPU time) than the list.
ArrayList is implemented like this:
class ArrayList {
private Object[] elements;
}
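The capacity is the size of that array.
Now, if your capacity is 10 and you're adding the 11th element, ArrayList will do roughly this:
// grow by about 50%; an array length must be an int, so no 1.5 multiplier
Object[] newElements = new Object[capacity + capacity / 2];
// copy every existing reference into the new array
System.arraycopy(this.elements, 0, newElements, 0, this.elements.length);
this.elements = newElements;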
So if you start off with a small capacity, ArrayList will end up creating a bunch of arrays and copying stuff around for you as you keep adding elements, which isn't good.
On the other hand, if you specify a capacity of 1,000,000 and add only 3 elements to ArrayList, that also is kinda bad.
Rule of thumb: if you know the capacity, specify it. If you aren't sure but know the upper bound, specify that. If you just aren't sure, use the defaults.
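As a minimal sketch of that rule of thumb, the example below passes a capacity hint to the ArrayList constructor when the element count is known, and calls ensureCapacity on an existing list before a bulk append. The class and method names (PresizeDemo, buildWithKnownSize, appendMany) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class PresizeDemo {
    // Builds a list of n integers. The capacity hint lets the backing array
    // be allocated once instead of being regrown as elements are added.
    static List<Integer> buildWithKnownSize(int n) {
        List<Integer> list = new ArrayList<>(n); // initial capacity = n
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return list;
    }

    // For a list that already exists, ensureCapacity makes room up front
    // so the appends below do not trigger repeated reallocation.
    static void appendMany(ArrayList<Integer> list, int extra) {
        list.ensureCapacity(list.size() + extra);
        for (int i = 0; i < extra; i++) {
            list.add(i);
        }
    }

    public static void main(String[] args) {
        List<Integer> known = buildWithKnownSize(1_000);
        ArrayList<Integer> grown = new ArrayList<>(); // default capacity
        appendMany(grown, 500);
        System.out.println(known.size() + " " + grown.size());
    }
}
```

The hint only affects how often the backing array is reallocated; the list behaves identically without it.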
Capacity is as you described it -- the contiguous memory allocated to an ArrayList for storage of values. ArrayList stores all values in an array, and automatically resizes the array for you. This incurs memory management overhead when resizing.
If I remember correctly, Java increases the size of an ArrayList's backing array from size N to size 2N + 2 when you try to add one more element than the capacity can take (the exact factor is an implementation detail; the OpenJDK 6 and 7 implementations grow by about 50%, so treat 2N + 2 as illustrative). I do not know what size it increases to when you use the insert method (or similar) to insert at a specific position beyond the end of the capacity, or even whether it allows this.
Here is an example to help you think about how it works. Picture each space between the |s as a cell in the backing array:
| | |
size = 0 (contains no elements), capacity = 2 (can contain 2 elements).
|1| |
size = 1 (contains 1 element), capacity = 2 (can contain 2 elements).
|1|2|
size = 2, capacity = 2. Adding another element:
|1|2|3| | | |
size increased by 1, capacity increased to 6 (2 * 2 + 2). This can be expensive with large arrays, as allocating a large contiguous memory region can require a bit of work (as opposed to a LinkedList, which allocates many small pieces of memory) because the JVM needs to search for an appropriate location, and may need to ask the OS for more memory. It is also expensive to copy a large number of values from one place to another, which would be done once such a region was found.
My rule of thumb is this: If you know the capacity you will require, use an ArrayList because there will only be one allocation and access is very fast. If you do not know your required capacity, use a LinkedList because adding a new value always takes the same amount of work, and there is no copying involved.
1) What are some good ways to define what capacity represents from a memory standpoint?
...the (contiguous?) memory allocated to the ArrayList?
Yes, an ArrayList is backed by an array, so the capacity represents the internal array's size.
...the ArrayList's memory footprint on the (heap?)?
Yes, the larger the array's capacity, the larger the ArrayList's footprint.
2) Then if the above is true, changing capacity requires some manner of memory management overhead?
It does. When the list outgrows its array, a larger array is allocated and the contents are copied. The previous array may then be discarded and become eligible for garbage collection.
3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?
Yes, if you create the ArrayList with an initial capacity of 1 (for instance) and your list grows way beyond that. If you know the number of elements up front, you had better request an initial capacity of that size.
However, I think this should be low on your list of priorities. While array copying may happen very often, it has been optimized since the early days of Java and should not be a concern. Choosing the right algorithm matters more, I think. Remember: premature optimization is the root of all evil.
See also: When to use LinkedList over ArrayList
Related
I am using an array to store a kind of object. I created an array of some fixed size:
int arr[] = new int[n];
Now, after processing this array, I want to free up to 75% of its memory (only n/4 elements are still useful). Since n is very large and I don't want to hold more memory than is useful, my question is: how can I reduce the size of the array at runtime without copying to a new array of size n/4? Is it even possible?
It's not possible. You cannot change the length of an array after you initialise it.
What you can do is create another array of a suitable size and make the large array eligible for garbage collection.
Best is to use an ArrayList if you are allowed to do that.
It's not possible at all. Even the collections, when the array's capacity is exceeded, copy the elements into a new, larger array!
No. Once you have created an array, its size is fixed; you can't change it at runtime. For your current scenario you can copy the useful elements to a new array, or just keep the array and not worry about the memory.
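The copy-to-a-smaller-array approach suggested in these answers can be sketched with Arrays.copyOf. This assumes the useful elements sit at the front of the array; ShrinkArrayDemo and shrink are hypothetical names.

```java
import java.util.Arrays;

class ShrinkArrayDemo {
    // Returns a NEW array holding only the first `keep` elements.
    // The original array is not resized; arrays have a fixed length.
    static int[] shrink(int[] source, int keep) {
        return Arrays.copyOf(source, keep);
    }

    public static void main(String[] args) {
        int n = 8;
        int[] arr = new int[n];
        for (int i = 0; i < n; i++) {
            arr[i] = i * 10;
        }
        // Reassigning the variable drops the last reference to the large
        // array, which makes it eligible for garbage collection.
        arr = shrink(arr, n / 4);
        System.out.println(Arrays.toString(arr)); // [0, 10]
    }
}
```

Note that both arrays exist at the same time during the copy, so peak memory briefly rises before the large array can be reclaimed.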
From a recently posted question I came across ArrayList#trimToSize() which reduces the size of the backing array to current size of collection.
Quoting javadoc
Trims the capacity of this ArrayList instance to be the list's current size. An application can use this operation to minimize the storage of an ArrayList instance.
And the Javadoc says that an application can use it to reduce the memory footprint of the backing array. If I am not wrong, this method won't be useful for small sizes, as the cost of a few spare references won't hurt that much.
But the growth algorithm used by ArrayList is int newCapacity = (oldCapacity * 3)/2 + 1; in 1.6 and int newCapacity = oldCapacity + (oldCapacity >> 1); in 1.7. So if oldCapacity is large, adding a new element creates a new backing array using that formula and may allocate much unneeded space when only one element is added after the expansion.
Is my reasoning behind the method correct or there are some other applications to it?
Yes, the backing array grows by ~50% when it's full. For example, the program below adds 1 million entries, calls trimToSize, then adds one entry. The backing array's length is about 1.2m after adding the entries, 1m after trimming and 1.5m after adding one more item.
So unless you know that you won't be adding to the list any longer, calling trimToSize could be counter-productive.
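import java.lang.reflect.Field;
import java.util.ArrayList;

public class TrimToSizeDemo {
    // Reflection into java.util may need --add-opens java.base/java.util=ALL-UNNAMED on recent JDKs.
    public static void main(String[] args) throws Exception {
        ArrayList<Integer> list = new ArrayList<>();
        Field e = list.getClass().getDeclaredField("elementData"); // the backing array
        e.setAccessible(true);
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        System.out.println(((Object[]) e.get(list)).length); //1215487
        list.trimToSize();
        System.out.println(((Object[]) e.get(list)).length); //1000000
        list.add(0);
        System.out.println(((Object[]) e.get(list)).length); //1500000
    }
}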
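The same numbers can be reproduced without reflection by replaying the 1.7 formula quoted in the question. This is only a simulation of the arithmetic, not the library's code; it assumes a starting capacity of 10 and one-at-a-time adds, and GrowthSimulation and its methods are hypothetical names.

```java
class GrowthSimulation {
    // The 1.7 formula quoted in the question, with a floor of +1 so the
    // simulation also terminates for capacities of 0 or 1.
    static int grow(int oldCapacity) {
        return Math.max(oldCapacity + (oldCapacity >> 1), oldCapacity + 1);
    }

    // Capacity reached after adding `elements` items one at a time.
    static int capacityAfter(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        while (capacity < elements) {
            capacity = grow(capacity);
        }
        return capacity;
    }

    // Number of times the backing array would be reallocated on the way.
    static int reallocations(int initialCapacity, int elements) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < elements) {
            capacity = grow(capacity);
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(capacityAfter(10, 1_000_000)); // 1215487
        System.out.println(reallocations(10, 1_000_000)); // 29
        System.out.println(grow(1_000_000));              // 1500000
    }
}
```

Under these assumptions a million adds cost only 29 reallocations, while a single add after trimming to 1,000,000 forces one more reallocation up to 1,500,000.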
Another situation is when we add elements and then remove many of them. When we remove elements, the internal array stays unchanged.
Your formula results in worst-case overhead of just 50% of empty slots. Note that the minimum size of an Object is 24 bytes, compared to just 4 bytes for a compressed OOP in the array. The overhead amounts to just
(0.5*4) / (24+4) == 1/14 == 7%
which can hardly ever be worth considering—and that's the worst it can get. On average it's half the overhead in the array entries, and often the objects are much larger.
So the only time it would make sense to call trimToSize is after a massive removal from a previously huge arraylist. In other words, almost never.
I have an in-memory collection that I want to flush to disk once it has reached either a certain size (count wise) or memory footprint.
How can I determine how much memory a collection is using?
It is going to be some sort of Dictionary/Map.
You can't, easily. For example, consider an ArrayList<String> with a backing array of size 256 and an "in use" size of 200, where each string is 20 characters long, backed by a 30 character backing array.
It sounds like you could easily work out how much memory that's taking - but if every element in the array is actually a reference to the same string, then obviously it takes a lot less memory. That's just for String, which is a class which is relatively straightforward to analyze. For classes with various mixtures of definitely-distinct and possibly-shared references, it becomes even more complicated.
You could serialize it - but that only shows you how much space it takes up when serialized, not in memory.
I suggest you experiment and find some appropriate "average" size, derive a maximum count that makes sense, and just go on that basis.
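One way to act on that suggestion is to flush on entry count alone. The sketch below is an assumption about how such a buffer might look, not an established API: FlushingBuffer, maxEntries and flushAction are hypothetical names, and the flush action stands in for whatever writes to disk.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class FlushingBuffer<K, V> {
    private final Map<K, V> entries = new HashMap<>();
    private final int maxEntries;                  // count derived from experiments
    private final Consumer<Map<K, V>> flushAction; // e.g. write the map to disk

    FlushingBuffer(int maxEntries, Consumer<Map<K, V>> flushAction) {
        this.maxEntries = maxEntries;
        this.flushAction = flushAction;
    }

    // Stores the entry and flushes once the count threshold is reached.
    void put(K key, V value) {
        entries.put(key, value);
        if (entries.size() >= maxEntries) {
            flushAction.accept(new HashMap<>(entries)); // hand over a snapshot
            entries.clear();
        }
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        FlushingBuffer<String, Integer> buffer =
                new FlushingBuffer<>(3, batch -> System.out.println("flushing " + batch.size()));
        for (int i = 0; i < 7; i++) {
            buffer.put("key" + i, i);
        }
        System.out.println("left in memory: " + buffer.size());
    }
}
```

The threshold is a proxy for memory use, so it is only as good as the average entry size measured beforehand.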
Currently, I am scraping out a chunk of data (paragraphs/strings) from a text file and writing it out to a new file. However, I am planning on adding some conditionals later and thus want to be able to take out this chunk of data and only store it in a temporary array, then write out to a file if the conditionals are met. However, I am not sure how to write this out to an array without knowing the size of the array beforehand.
Does anyone have any ideas?
Don't use an array. Use a collection of type String that can grow dynamically such as an ArrayList for example. Here are some quick code samples: Sample 1, Sample 2
Some notes on an ArrayList's memory management from the Java docs:
The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
An application can increase the capacity of an ArrayList instance before adding a large number of elements using the ensureCapacity operation. This may reduce the amount of incremental reallocation.
Notice that even the docs do not specify exactly how things are managed internally.
In Java, an ArrayList (or any other type of Java collection) can take care of all the memory management for you:
ArrayList<String> strings = new ArrayList<String>();
If you want to add a string:
strings.add("New String");
If you want to get a String at a certain index (in this example, index 1):
strings.get(1);
There are a lot more methods in the ArrayList class as well.
You do need a collection that grows dynamically. ArrayList is the first that comes to mind; internally it is very similar to a regular array, so it offers fast random access, if you need it. LinkedList may be better suited if you don't have an estimate about the number of elements that you will eventually need, provided that you will only need sequential access to its elements (random access is available, but it will not be fast).
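Applied to the original question, a chunk of text can be buffered in a list and only converted or written out once the conditions are known. This is a rough sketch under assumed names: ChunkBuffer, collect and the keyword test are placeholders for the real conditionals.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ChunkBuffer {
    // Collects the lines that satisfy a condition. No size is needed up
    // front because the list grows as lines are added.
    static List<String> collect(List<String> lines, String keyword) {
        List<String> chunk = new ArrayList<>();
        for (String line : lines) {
            if (line.contains(keyword)) { // placeholder for the real condition
                chunk.add(line);
            }
        }
        return chunk;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("alpha one", "beta", "alpha two");
        List<String> chunk = collect(input, "alpha");
        // If an array is really required, convert once the size is known.
        String[] asArray = chunk.toArray(new String[0]);
        System.out.println(asArray.length); // 2
    }
}
```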
Suppose there are 10 elements in an ArrayList and I delete 2 elements from the middle, so the ArrayList contains 8 elements. Will the capacity still be 10, or will it be reduced to 8 at that point?
The API states :
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
You can always test this empirically in your debugger. After removing two elements, look at the array that backs the ArrayList and see what its size is. Most likely, it's 10.
The behaviour depends on the implementation of the ArrayList. You can't rely on this kind of implementation detail. The only thing you can count on is that the capacity is always greater than or equal to the number of elements in your list. If you need the capacity to be exactly the number of elements (because you want to reduce memory usage), you will have to ask for it explicitly with the method trimToSize().
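To make the size versus capacity distinction concrete without a debugger, here is a toy array-backed list. It is not the java.util.ArrayList source, only a hypothetical TinyList that assumes the commonly observed behaviour: removal leaves the backing array alone, and only an explicit trimToSize shrinks it.

```java
import java.util.Arrays;

class TinyList {
    private Object[] elements = new Object[10]; // capacity = elements.length
    private int size = 0;                       // number of slots in use

    void add(Object value) {
        if (size == elements.length) {
            // Grow by about 50%, copying the references into a new array.
            int newCapacity = elements.length + (elements.length >> 1) + 1;
            elements = Arrays.copyOf(elements, newCapacity);
        }
        elements[size++] = value;
    }

    // Removes the element at `index`; the backing array is NOT shrunk.
    void remove(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        System.arraycopy(elements, index + 1, elements, index, size - index - 1);
        elements[--size] = null; // clear the stale slot so it can be collected
    }

    // Shrinks the backing array to exactly the number of elements.
    void trimToSize() {
        if (size < elements.length) {
            elements = Arrays.copyOf(elements, size);
        }
    }

    int size() { return size; }

    int capacity() { return elements.length; }

    public static void main(String[] args) {
        TinyList list = new TinyList();
        for (int i = 0; i < 10; i++) {
            list.add(i);
        }
        list.remove(4);
        list.remove(4);
        System.out.println(list.size() + " / " + list.capacity()); // 8 / 10
        list.trimToSize();
        System.out.println(list.size() + " / " + list.capacity()); // 8 / 8
    }
}
```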
I think it may change. Suppose we create a new ArrayList: by default the JVM allocates, say, 10 contiguous locations in memory. Whether you put in 8, 2 or zero elements, it will be the same.
But if you put in 15 elements, it will increase to, say, 20 memory locations. If you then drop below 10 again, it should release that memory and go back to the default size of 10.
This is dynamic allocation. For a 1000-element list, if you remove elements until only 2 are left, it is logical that it should release that memory. It should depend on the number of elements removed.