Initial capacity for CopyOnWriteArrayList


Many List implementations let you specify an initial capacity for the collection. Why is this not possible for CopyOnWriteArrayList?

In a conventional ArrayList the capacity is a hint to reserve more space in the backing array for more elements to be added to the list later on.
In a CopyOnWriteArrayList, every (atomic) write operation creates a new backing array. There is no point in preallocating an array that is bigger than the current list size because that space would never be used.
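To make the argument concrete, here is a minimal sketch of a copy-on-write list. MiniCowList and backingLength() are hypothetical names used only for illustration; this is not the JDK's implementation. The second half of main uses the real java.util.concurrent.CopyOnWriteArrayList to show that an existing iterator keeps working on the array it started with after a write has replaced that array.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical, simplified copy-on-write list (not the JDK source) showing why
// spare capacity would be wasted: every write allocates a brand-new array.
class MiniCowList<E> {
    // volatile so that readers always see the most recently published array
    private volatile Object[] array = new Object[0];

    synchronized void add(E element) {
        Object[] old = array;
        // Exactly one extra slot: any spare room would be discarded anyway,
        // because the next write copies into yet another new array.
        Object[] copy = Arrays.copyOf(old, old.length + 1);
        copy[old.length] = element;
        array = copy; // publish; readers holding the old array are unaffected
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        return (E) array[index];
    }

    int size() {
        return array.length;
    }

    // Length of the backing array; always equal to size() in this design.
    int backingLength() {
        return array.length;
    }

    public static void main(String[] args) {
        MiniCowList<String> mini = new MiniCowList<>();
        mini.add("a");
        mini.add("b");
        mini.add("c");
        System.out.println(mini.size() + " " + mini.backingLength()); // prints 3 3

        // The real CopyOnWriteArrayList: an iterator keeps the array it started with.
        CopyOnWriteArrayList<String> real = new CopyOnWriteArrayList<>(List.of("a", "b"));
        Iterator<String> it = real.iterator();
        real.add("c"); // replaces the backing array; `it` still sees the old one
        int seen = 0;
        while (it.hasNext()) {
            it.next();
            seen++;
        }
        System.out.println(seen); // prints 2
    }
}
```

Because the backing array is always exactly full, the only meaningful sizing input is the initial contents, which is why CopyOnWriteArrayList offers constructors taking a Collection or an array rather than a capacity.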

Related

What is the benefit of setting the capacity of an ArrayList explicitly

In Java, ArrayList has a constructor
ArrayList(int capacity)
and two methods:
void ensureCapacity(int minCapacity)
void trimToSize()
Consider this code sample:
ArrayList<String> arrayList3 = new ArrayList<>(5);
System.out.println(arrayList3.size());
arrayList3.add("Zebra");
arrayList3.add("Giraffe");
arrayList3.add("Bison");
System.out.println(arrayList3);
System.out.println(arrayList3.size());
arrayList3.add("Rhino");
arrayList3.add("Hippo");
arrayList3.add("Elephant");
arrayList3.add("Antelope");
System.out.println(arrayList3);
System.out.println(arrayList3.size());
Output:
0
[Zebra, Giraffe, Bison]
3
[Zebra, Giraffe, Bison, Rhino, Hippo, Elephant, Antelope]
7
Here, I fail to see how setting an initial capacity affects the execution of the program. ArrayList is a flexible list that changes size as per demand. So what is the significance of setting the capacity explicitly?
And if I do want to set the capacity explicitly, is there any method to view the current capacity? int size() clearly does not apply here.

ArrayList is an implementation of the dynamic array data structure.
It resizes when its underlying array is full, i.e. when the next element would need an index beyond the last valid index of that array.
When that happens, add() (or addAll()) internally invokes the method grow(), which creates a new, larger array and copies the existing elements into it. In OpenJDK the new capacity is the old capacity plus half of it (a growth factor of about 1.5, not 2), or larger if addAll() needs more room for the incoming elements.
The growth has a cost of O(n) because all previously added elements need to be copied into the new array.
Reminder: when resizing isn't required, a new element will be added in constant time O(1).
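The cost difference can be sketched with a toy dynamic array that counts how many elements it copies while growing. GrowableIntArray is a hypothetical class, not ArrayList itself; it assumes a growth factor of 1.5 (OpenJDK's ArrayList grows by half its current capacity) and compares a 10-element start with a correctly presized one.

```java
import java.util.Arrays;

// Hypothetical dynamic array of ints; the 1.5x growth mirrors OpenJDK's ArrayList.
class GrowableIntArray {
    private int[] data;
    private int size;
    private long copiedElements; // total elements copied by all resizes so far

    GrowableIntArray(int initialCapacity) {
        data = new int[initialCapacity];
    }

    void add(int value) {
        if (size == data.length) {
            // Grow by 50% (at least by 1) and copy every existing element: O(n).
            int newCapacity = data.length + Math.max(1, data.length >> 1);
            data = Arrays.copyOf(data, newCapacity);
            copiedElements += size;
        }
        data[size++] = value; // O(1) when no resize is needed
    }

    int size() { return size; }

    int capacity() { return data.length; }

    long copiedElements() { return copiedElements; }

    public static void main(String[] args) {
        GrowableIntArray small = new GrowableIntArray(10);
        GrowableIntArray presized = new GrowableIntArray(50_000);
        for (int i = 0; i < 50_000; i++) {
            small.add(i);
            presized.add(i);
        }
        System.out.println("start at 10, elements copied:     " + small.copiedElements());
        System.out.println("start at 50,000, elements copied: " + presized.copiedElements());
    }
}
```

With the 10-element start, the total number of copied elements stays proportional to the number of additions (which is what makes add() amortized O(1)); the presized array never copies at all.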
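According to its Javadoc, the no-argument constructor creates an empty list with an initial capacity of ten. (OpenJDK defers allocating that 10-element array until the first element is added.)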
If you expect that a newly created ArrayList will eventually contain, let's say, 50,000 elements, it makes sense to use the overloaded constructor to set an initial capacity of 50,000, which improves performance by avoiding unnecessary resizing.
For the same purpose you can call ensureCapacity(), which is available only on the ArrayList class (not in the List interface, because the notion of capacity does not apply to implementations such as LinkedList that are not backed by an array).
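Both ways of presizing can be sketched with the real ArrayList API. The class Presizing and its helper methods are hypothetical; note that ensureCapacity() forces the parameter's static type to be ArrayList rather than List.

```java
import java.util.ArrayList;
import java.util.List;

class Presizing {
    // Presize via the constructor when the final size is known up front.
    static List<Integer> squares(int n) {
        List<Integer> result = new ArrayList<>(n); // room for n elements, so no resizing below
        for (int i = 0; i < n; i++) {
            result.add(i * i);
        }
        return result;
    }

    // ensureCapacity() is declared on ArrayList, not on List.
    static void appendAll(ArrayList<String> target, String[] extra) {
        target.ensureCapacity(target.size() + extra.length); // at most one resize for the batch
        for (String s : extra) {
            target.add(s);
        }
    }

    public static void main(String[] args) {
        System.out.println(squares(5)); // prints [0, 1, 4, 9, 16]
        ArrayList<String> names = new ArrayList<>();
        appendAll(names, new String[] {"Zebra", "Giraffe"});
        System.out.println(names); // prints [Zebra, Giraffe]
    }
}
```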
is there any method to view the current capacity
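No, there isn't. That's called encapsulation. ArrayList and HashMap, for example, are backed by a plain array, but they do not let you interact with that array directly or query its length. (StringBuilder is an exception in that it does expose capacity(), although it still hides the array itself.)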
If a list first grows large and then many of its elements are removed, and you want to release the unoccupied heap space, you can use the method trimToSize():
Trims the capacity of this ArrayList instance to be the list's current size. An application can use this operation to minimize the storage of an ArrayList instance.
But it has to be used with caution, because it can lead to cyclic growth and trimming, which will cause performance degradation.
Note that there's no need to worry about the amount of unoccupied space if the list is moderate in size, or if you are not expecting, let's say, 80% of the data to be removed in one go. For example, if a huge list loses 50% of its elements and you apply trimToSize() to it, the very next add() has to allocate a new array and copy every element again; a list that keeps growing and shrinking this way will perform badly.
As a possible option, if most of the data can be removed from a list, instead of using trimToSize() you can filter out the elements that have to be preserved, place them into a new list, and drop the reference to the old one.
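The two options can be sketched side by side. ShrinkExample, removeAndTrim() and keepOnly() are hypothetical names; removeIf() and trimToSize() are the real ArrayList methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

class ShrinkExample {
    // Option 1: remove in place, then release the unused capacity.
    static <T> void removeAndTrim(ArrayList<T> list, Predicate<? super T> toRemove) {
        list.removeIf(toRemove);
        list.trimToSize(); // copies the survivors into an array of exactly size()
    }

    // Option 2: copy only the survivors into a fresh list; the caller drops the old one.
    static <T> List<T> keepOnly(List<T> list, Predicate<? super T> toKeep) {
        List<T> kept = new ArrayList<>();
        for (T element : list) {
            if (toKeep.test(element)) {
                kept.add(element);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        ArrayList<Integer> big = new ArrayList<>();
        for (int i = 0; i < 100_000; i++) {
            big.add(i);
        }
        List<Integer> small = keepOnly(big, n -> n % 10 == 0);
        big = null; // the large backing array is now unreachable and can be collected
        System.out.println(small.size()); // prints 10000
    }
}
```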

What's the difference between Stream.builder() and calling stream() on an ArrayList in Java?

Is there any difference between using Stream.builder() versus creating an ArrayList and then calling stream() on it?

This is an implementation detail, but yes, the builder is better optimized for the use case of being incrementally filled, followed by an operation streaming over the contained elements.
In contrast, an ArrayList has to support arbitrary modification and random access.
So, when repeatedly adding elements to an ArrayList without specifying a correctly predicted initial capacity, it may need to allocate a new, larger array and copy the current array into it whenever the current capacity is exhausted.
In contrast, the builder has special support for the single-element case, which doesn’t need an array at all. If more elements are added, it switches to a spined buffer. This buffer starts with a small array, like ArrayList, but when its capacity is exhausted it begins to use an array of arrays instead of repeatedly copying the elements into a larger flat array.
So this saves the copying costs you’d have when filling an ArrayList. You can save these costs for ArrayList by specifying the right initial capacity, but that only works when an estimate is available. Specifying an initial capacity also removes the optimization for the empty case. So generally, the stream builder can deal with unknown sizes much better.
Another property of this design is that Stream.Builder can deal with more than 2³¹ elements, unlike ArrayList, if you have enough memory.
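The spined-buffer idea can be sketched as a list of chunks where a full chunk is never copied. ChunkedBuffer is a hypothetical, simplified class, not the JDK's internal SpinedBuffer; the first chunk size of 16 and the doubling of later chunks are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical, simplified "spined" buffer: elements live in a list of chunks.
class ChunkedBuffer<E> {
    private static final int FIRST_CHUNK = 16;

    // Only chunk references are stored here; growing this list copies
    // a handful of references, never the elements themselves.
    private final List<Object[]> chunks = new ArrayList<>();
    private Object[] current;   // chunk currently being filled
    private int posInCurrent;   // next free slot in `current`
    private long size;

    void add(E element) {
        if (current == null || posInCurrent == current.length) {
            // Open a new chunk twice as large as the previous one.
            int length = (current == null) ? FIRST_CHUNK : current.length * 2;
            current = new Object[length];
            chunks.add(current);
            posInCurrent = 0;
        }
        current[posInCurrent++] = element;
        size++;
    }

    long size() {
        return size;
    }

    int chunkCount() {
        return chunks.size();
    }

    @SuppressWarnings("unchecked")
    void forEach(Consumer<? super E> action) {
        for (Object[] chunk : chunks) {
            // Every chunk except the last one is full.
            int limit = (chunk == current) ? posInCurrent : chunk.length;
            for (int i = 0; i < limit; i++) {
                action.accept((E) chunk[i]);
            }
        }
    }

    public static void main(String[] args) {
        ChunkedBuffer<Integer> buffer = new ChunkedBuffer<>();
        for (int i = 0; i < 100; i++) {
            buffer.add(i);
        }
        // 16 + 32 + 64 = 112 slots, so 100 elements fit in three chunks.
        System.out.println(buffer.size() + " elements in " + buffer.chunkCount() + " chunks");
        // prints 100 elements in 3 chunks
    }
}
```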

Stream.builder() is not a stream operation at all; the builder collects elements until build() is called, after which you get an ordinary lazy stream. Using an ArrayList as the intermediate buffer can, in theory, use more memory and involve extra copying. From the Stream.Builder Javadoc: "This allows the creation of a Stream by generating elements individually and adding them to the Builder (without the copying overhead that comes from using an ArrayList as a temporary buffer.)"
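For comparison, here are both approaches producing the same result. BuilderVsList and its two methods are hypothetical names; Stream.builder(), Stream.Builder.add() and build() are the real API. The builder cannot accept more elements once build() has been called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class BuilderVsList {
    // Fill a Stream.Builder incrementally, then stream over it once.
    static List<String> viaBuilder(int n) {
        Stream.Builder<String> builder = Stream.builder();
        for (int i = 0; i < n; i++) {
            builder.add("item" + i);
        }
        return builder.build() // the builder is in the built state from here on
                .filter(s -> s.endsWith("0"))
                .collect(Collectors.toList());
    }

    // Same pipeline, with an ArrayList as the temporary buffer.
    static List<String> viaArrayList(int n) {
        List<String> buffer = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buffer.add("item" + i);
        }
        return buffer.stream()
                .filter(s -> s.endsWith("0"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(viaBuilder(25));   // prints [item0, item10, item20]
        System.out.println(viaArrayList(25)); // prints [item0, item10, item20]
    }
}
```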

Does appending/removing entries to a Java list reallocate memory?

This is a low-level memory question about how Java performs .add and .remove on an ArrayList or other types of lists. I would think that Java has to reallocate memory to append or remove items, but it could be doing something I'm not thinking of to avoid this. Does anyone know?

If by "regular list" you mean java.util.List, that is an interface. It does not specify anything about whether or when any memory is allocated in association with adding or removing elements -- those are details of specific implementations.
As for java.util.ArrayList in particular, its docs say:
Each ArrayList instance has a capacity. The capacity is the size of the array used to store the elements in the list. It is always at least as large as the list size. As elements are added to an ArrayList, its capacity grows automatically. The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
In other words, Java does not specify the answer to your question.
If I were to speculate based on the available documentation, I would guess that java.util.ArrayList.remove() never performs any memory allocation or reallocation. It seems to follow from the docs overall that java.util.ArrayList.add() allocates additional space at least sometimes (in the form of a new, longer internal array). In order to achieve constant amortized cost for element additions, however, I don't see how it could reallocate on every element addition. Almost certainly, it reallocates only when its capacity is insufficient, and then it scales the capacity by a constant factor -- e.g. doubles it.
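As a minimal sketch of why removal need not allocate, assuming an array-backed list like ArrayList: the tail is shifted left inside the same array and the freed slot is cleared. OpenJDK's ArrayList does essentially this with System.arraycopy, but the helper removeAt() below is hypothetical.

```java
import java.util.Arrays;

class InPlaceRemove {
    // Removes data[index] from the first `size` slots by shifting the tail left.
    // No new array is allocated; the capacity (data.length) stays the same.
    // Returns the new size.
    static int removeAt(Object[] data, int size, int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(data, index + 1, data, index, numMoved);
        }
        data[size - 1] = null; // clear the stale slot so the object can be garbage-collected
        return size - 1;
    }

    public static void main(String[] args) {
        Object[] data = {"a", "b", "c", "d", null, null}; // capacity 6, size 4
        int size = removeAt(data, 4, 1);
        System.out.println(size + " " + Arrays.toString(data)); // prints 3 [a, c, d, null, null, null]
    }
}
```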

All list implementations require storage of some information about the objects in the list and the order of those objects. Larger lists require more such information because there is some information for each object in the list. Thus adding to a list must, on average, result in allocation of more storage for this information.
Adding an element to a list does not copy the object that was added to the list. Indeed, no Java statements cause an additional copy of an object to be visible to your program (you have to explicitly use a copy constructor or a clone method to do that). This is because Java objects are never accessed directly, but are always accessed through a reference. Adding an object to a collection really means adding a new reference to the object to the collection.
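A short sketch of the "references, not copies" point: mutating an object after adding it to a list is visible through the list, because the list holds the same object. ReferenceDemo and its nested Point class are hypothetical types used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

class ReferenceDemo {
    // Hypothetical mutable type used only for this illustration.
    static final class Point {
        int x;

        Point(int x) {
            this.x = x;
        }
    }

    public static void main(String[] args) {
        Point p = new Point(1);
        List<Point> list = new ArrayList<>();
        list.add(p); // stores a reference to p, not a copy of it

        p.x = 42; // mutate through the original reference
        System.out.println(list.get(0).x);    // prints 42
        System.out.println(list.get(0) == p); // prints true: same object
    }
}
```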

Instantiating an ArrayList of ArrayLists

I want to instantiate an ArrayList of ArrayLists (of a generic type). I have this code:
private ArrayList<ArrayList<GameObject>> toDoFlags;
toDoFlags = new ArrayList<ArrayList<GameObject>>(2);
Am I doing this right? It compiles, but when I look at the ArrayList, it has a size of 0.

You're doing it right. It has a size of zero because you haven't added anything to it yet.
The "2" you pass is the initial capacity of the array that backs the ArrayList. But the size() method of the ArrayList doesn't return the initial capacity of its backing array... it returns the number of actual elements in the list.
Usually you don't need the initialCapacity parameter. It's a performance optimization for large ArrayLists: by allocating a lot of space explicitly, you save the time it would take to reallocate as you add more and more items to the list. But in this case you probably don't have an extremely large list.
Also, instead of using an ArrayList of ArrayLists, you should consider writing a class to store your data.

ArrayLists expand as you add to them. The integer capacity argument just sets the initial size of the backing array. Setting the capacity to two doesn't mean that there are two elements in the ArrayList, but rather that two elements can be added to the ArrayList before it has to allocate a larger internal array.
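A minimal sketch of the fix for the asker's case: the outer list is empty until the inner lists are added explicitly. GameObject here is a placeholder for the asker's own type, and NestedLists and newFlagLists() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class NestedLists {
    // Placeholder for the asker's GameObject type (hypothetical here).
    static final class GameObject { }

    // Builds an outer list holding `count` empty inner lists.
    static List<List<GameObject>> newFlagLists(int count) {
        // The capacity argument only sizes the backing array; size() is still 0 here.
        List<List<GameObject>> outer = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            outer.add(new ArrayList<>()); // each inner list must be created explicitly
        }
        return outer;
    }

    public static void main(String[] args) {
        List<List<GameObject>> empty = new ArrayList<>(2);
        System.out.println(empty.size()); // prints 0

        List<List<GameObject>> toDoFlags = newFlagLists(2);
        toDoFlags.get(0).add(new GameObject());
        System.out.println(toDoFlags.size());        // prints 2
        System.out.println(toDoFlags.get(0).size()); // prints 1
    }
}
```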

Why iteration time over a LinkedHashSet is not dependent on its capacity?

From the Javadoc of the LinkedHashSet (LHS) class:
"Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity."
My question is: why does the iteration time over an LHS have no bearing on the capacity of the set?

Because a LinkedHashSet (which is backed by a LinkedHashMap) maintains, in addition to its hash table, a doubly linked list running through all of its entries. When iterating, you follow that linked list instead of scanning the hash table's buckets.

Create a regular HashSet with an initial capacity of about one million buckets (new HashSet<>(1024 * 1024)), add one element and iterate. Although the HashSet has only one element, the iterator has to go over all 1,048,576 buckets of the underlying hash table. With a LinkedHashSet, the iterator would not scan the hash table (that is used only for lookups such as contains()) but would follow the parallel linked list, which contains only one element.
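The difference can be sketched by counting iteration steps in a toy model. IterationCostModel is a hypothetical, heavily simplified class, not the JDK's HashMap or LinkedHashMap code: it has one slot per bucket, does not resize, and does not model hash collisions (it throws if two keys land in the same bucket).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the two iteration strategies (no collisions, no resizing).
class IterationCostModel {
    static final class Node {
        final Object key;
        Node after; // insertion-order link, as in a linked hash set

        Node(Object key) {
            this.key = key;
        }
    }

    final Node[] table; // one slot per bucket
    Node head, tail;    // linked list threaded through the entries

    IterationCostModel(int capacity) {
        table = new Node[capacity];
    }

    void add(Object key) {
        int bucket = Math.floorMod(key.hashCode(), table.length);
        if (table[bucket] != null) {
            throw new IllegalStateException("collisions are not modelled");
        }
        Node n = new Node(key);
        table[bucket] = n;
        if (tail == null) head = n; else tail.after = n;
        tail = n;
    }

    // HashSet-style: visit every bucket, skipping empty ones -> O(capacity).
    int bucketIterationSteps(List<Object> out) {
        int steps = 0;
        for (Node n : table) {
            steps++;
            if (n != null) out.add(n.key);
        }
        return steps;
    }

    // LinkedHashSet-style: follow the links -> O(size).
    int linkedIterationSteps(List<Object> out) {
        int steps = 0;
        for (Node n = head; n != null; n = n.after) {
            steps++;
            out.add(n.key);
        }
        return steps;
    }

    public static void main(String[] args) {
        IterationCostModel set = new IterationCostModel(1024 * 1024);
        set.add("only element");
        System.out.println(set.bucketIterationSteps(new ArrayList<>())); // prints 1048576
        System.out.println(set.linkedIterationSteps(new ArrayList<>())); // prints 1
    }
}
```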

When iterating over a HashSet, you have to go through all of the buckets and skip the empty ones, which takes additional time. In short, there is some overhead associated with sorting out the empty buckets.
In linked collections every element points to the next one, so you start with the first element and follow the links to pull the next one, and so on; this way you visit exactly the elements that are present.
