Quote: Default initial capacity of ArrayDeque is 16. It will increase at a power of 2 (2^4, 2^5, 2^6 and so on) when size exceeds capacity.
Does this mean it behaves similarly to ArrayList? Each time size exceeds capacity, is there a new array where the older elements are copied to? Can I say that the internal implementation of ArrayDeque and ArrayList is an array (as their names say), and that just the resizing differs?
Yes, ArrayDeque behaves similarly to ArrayList: internally it uses an Object array. If the capacity is not enough, it creates a new, larger array and copies items from the old array to the new one.
The Java API specification does not require any particular resizing behavior. In fact, the current implementation in OpenJDK doubles the size of the array if it's small (less than 64); otherwise it grows by 50%:
// Double capacity if small; else grow by 50%
int jump = (oldCapacity < 64) ? (oldCapacity + 2) : (oldCapacity >> 1);
It seems that the "doubling" behavior is approximate: thanks to the "+2", the capacity after the first resize is 16 + 16 + 2 = 34. After the second resize it's 34 + 34 + 2 = 70. After that, the array grows by 50% on every resize.
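For illustration, here is a minimal sketch that replays this growth rule and prints the resulting capacities (the class and method names are made up for the demo):

public class DequeGrowthDemo {
    // Same rule as the OpenJDK snippet above: double if small, else +50%
    static int grow(int oldCapacity) {
        int jump = (oldCapacity < 64) ? (oldCapacity + 2) : (oldCapacity >> 1);
        return oldCapacity + jump;
    }

    public static void main(String[] args) {
        int capacity = 16; // default initial capacity
        for (int i = 0; i < 5; i++) {
            capacity = grow(capacity);
            System.out.println(capacity); // prints 34, 70, 105, 157, 235
        }
    }
}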
Related
In my program, key-value pairs are frequently added to a Map until 1G of pairs have been added. Map resizing slows down the process. How can I set the minimum Map size to, for example, 1000000007 (which is a prime)?
The constructor of a HashMap takes the initial capacity of the map (and the load factor, if desired).
Map<K,V> map = new HashMap<>(1_000_000_007);
How can I set minimum Map size to, for example 1000000007 (which is a prime)?
Using the HashMap(int) or HashMap(int, float) constructor. The int parameter is the capacity.
HashMap should have a size that is prime to minimize clustering.
Past and current implementations of the HashMap constructor will all choose a capacity that is the smallest power of 2 (up to 2^30) which is greater than or equal to the supplied capacity. So using a prime number has no effect.
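For illustration, here is a sketch in the spirit of Java 8's HashMap.tableSizeFor (the method name here is made up): it rounds a requested capacity up to the next power of two, capped at 2^30.

// Smear the highest set bit of (cap - 1) into all lower positions,
// then add 1 to get the next power of two.
static int nextPowerOfTwo(int cap) {
    int n = cap - 1;
    n |= n >>> 1;
    n |= n >>> 2;
    n |= n >>> 4;
    n |= n >>> 8;
    n |= n >>> 16;
    return (n < 0) ? 1 : (n >= (1 << 30)) ? (1 << 30) : n + 1;
}

nextPowerOfTwo(1_000_000_007) returns 1 << 30 = 1073741824, so the prime is simply rounded up.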
Will the constructor prevent map from resizing down?
HashMaps don't resize down.
(Note that size and capacity are different things. The size() method returns the number of entries currently in the Map. You can't "set" the size.)
A couple of things you should note. The number of buckets in a HashMap is a power of 2 (this might not be the case in future), and the next power of 2 above 1000000007 is 2^30. The load factor determines at what size the Map should grow. Typically this is 0.75.
If you set the capacity to be the expected size, it will (see the sketch after this list):
round up to the next power of 2
might still resize when the capacity * 0.75 is reached.
is limited to 2^30 anyway as it is the largest power of 2 you can have for the size of an array.
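If the goal is to avoid resizing entirely, a common approach (a sketch, assuming the default 0.75 load factor) is to request a capacity of the expected size divided by the load factor, plus one:

int expectedSize = 1_000_000;
// capacity * 0.75 must stay above expectedSize to avoid any resize
int initialCapacity = (int) (expectedSize / 0.75f) + 1;
Map<String, String> map = new HashMap<>(initialCapacity);

The constructor will still round this up to the next power of 2 internally.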
Will the constructor prevent map from resizing down?
The only way to do this is to copy all the elements into a new Map. This is not done automatically.
Why does ensureCapacity() in Java's ArrayList extend the capacity by a constant factor of 1.5, i.e. (oldCapacity * 3)/2 + 1?
Resizing the array is a relatively expensive operation. It wants to try and make sure that if the method gets called with ensureCapacity(11), ensureCapacity(12), ensureCapacity(13), ... it should not have to resize the array every time. So it resizes by a reasonable chunk (increase by 50%) instead of the minimum specified.
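For example, a caller that knows the final size can pre-allocate once (ensureCapacity is public API on ArrayList):

ArrayList<Integer> list = new ArrayList<>();
list.ensureCapacity(1_000_000); // one allocation up front
for (int i = 0; i < 1_000_000; i++) {
    list.add(i); // no intermediate array copies while filling
}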
The main reason lies in the (asymptotic) complexity of adding a sequence of elements to the list.
Note that the add method internally calls ensureCapacity(size+1). When the size of the internal array is increased, all elements have to be copied into the new, larger array.
If the size was only increased by a constant amount (which would be 1 for each call to add), then adding n elements would have a complexity of O(n^2).
Instead, the size is always increased by a constant factor. Then, adding n elements only has a complexity of O(n).
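To make the difference concrete, here is a rough sketch (illustrative, not JDK code) that counts how many element copies each policy performs while appending n elements:

// Count element copies: constant growth is O(n^2) in total,
// growth by a factor of 1.5 is O(n).
static long copiesWhenAppending(int n, boolean geometric) {
    long copies = 0;
    int capacity = 10;
    for (int size = 0; size < n; size++) {
        if (size == capacity) {
            copies += size; // every existing element moves to the new array
            capacity = geometric ? capacity + capacity / 2 : capacity + 1;
        }
    }
    return copies;
}

With n = 1_000_000, the constant-growth variant performs on the order of 5 * 10^11 copies, while the 1.5x variant stays around a few million.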
Does using an ArrayList take less memory compared to Vector? I have read that Vector doubles its internal array size when it runs out of room, whereas ArrayList grows it by only half. Is this a true statement? I need the answer for the case where I do not construct the Vector with values for initialCapacity and capacityIncrement.
Yes you are correct in terms of memory allocation of internal arrays:
Internally, both the ArrayList and Vector hold onto their contents using an array. When an element is inserted into an ArrayList or a Vector, the object will need to expand its internal array if it runs out of room. A Vector defaults to doubling the size of its array, while the ArrayList increases its array size by 50 percent.
Correction
Vector will not always double its capacity. It may instead increase its size by the capacityIncrement given in the constructor:
public Vector(int initialCapacity, int capacityIncrement)
The logic in the grow method is to double the capacity if no increment was given, and otherwise to grow by capacityIncrement. Here is the code of Vector's grow method:
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                     capacityIncrement : oldCapacity);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}
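Since Vector exposes its capacity via the public capacity() method, the difference between the two policies is easy to observe:

Vector<Integer> doubling = new Vector<>(10);    // no capacityIncrement
Vector<Integer> stepped  = new Vector<>(10, 5); // grows by 5 per resize
for (int i = 0; i < 11; i++) {                  // the 11th add triggers a grow
    doubling.add(i);
    stepped.add(i);
}
System.out.println(doubling.capacity()); // 20 = old capacity doubled
System.out.println(stepped.capacity());  // 15 = old capacity + increment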
There is no comparison between Vector and ArrayList as they fit different purposes. Vector was supposed to be a concurrency safe List implementation. However, the design of the class was severely flawed and did not provide concurrency guarantees for the most common use case of iteration.
Vector itself is easily replaced with Collections.synchronizedList(new ArrayList()). The result of course contains the same flaw as Vector. Vector should be considered deprecated.
The use of Vector is now a mark of naivety in understanding Java and concurrent programming. Don't use it.
To answer the original question:
ArrayList by default will grow the capacity by half of the current capacity. However, at any time, the program may call ensureCapacity to set the capacity to an appropriately large value.
Vector by default will grow the capacity by doubling. However, there is a constructor that allows setting the grow amount. Using a small grow value will have a negative impact on performance. Additionally, you could actually get less capacity since each grow requires a duplicate array to exist in memory for a short period of time.
In a comment, the OP has stated:
The application pulls huge data set and we are currently facing out of memory due to maxing out the heap size
First, both Vector and ArrayList will throw an OutOfMemoryError if the program tries to grow the capacity beyond a set limit. You need to be sure that the OOME does not originate from the hugeCapacity method of the Vector class. If this is the case, then perhaps you could try a LinkedList.
Second, what is your current heap size? The default JVM heap size is rather small. The intent is to avoid pauses or choppy behavior from a full GC becoming apparent to the user of an applet. However, the heap size is also often far too small for a reasonably sophisticated application or a fairly dumb service. The -Xmx JVM arg can be used to increase the heap size.
As per the Sun Java implementation, during expansion, ArrayList grows to 3/2 of its current capacity, whereas for HashMap the expansion rate is double. What is the reason behind this?
As per the implementation, for HashMap, the capacity should always be a power of two. That may be a reason for HashMap's behavior. But in that case the question is: why should the capacity of a HashMap always be a power of two?
The expensive part of increasing the capacity of an ArrayList is copying the content of the backing array into a new (larger) one.
For the HashMap, it is creating a new backing array and putting all map entries in the new array. And, the higher the capacity, the lower the risk of collisions. This is more expensive and explains why the expansion factor is higher. The reason for 1.5 vs. 2.0? I consider this a "best practice" or "good tradeoff".
why should the capacity of a HashMap always be a power of two?
I can think of two reasons.
You can quickly determine the bucket a hashcode goes in to. You only need a bitwise AND and no expensive modulo. int bucket = hashcode & (size-1);
Let's say we have a growth factor of 1.7. If we start with a size of 11, the next size would be 18, then 31. No problem, right? But the hashcodes of Strings in Java are calculated with a prime factor of 31. The bucket a string goes into, hashcode % 31, is then determined only by the last character of the String. Bye bye O(1) if you store folders that all end in /. If you use a size of, for example, 3^n, the distribution will not get worse when you increase n. Going from size 3 to 9, every element in bucket 2 will now go to bucket 2, 5 or 8, depending on the higher digit. It's like splitting each bucket into three pieces. So an integer growth factor is preferred. (Of course this all depends on how you calculate hashcodes, but an arbitrary growth factor doesn't feel 'stable'.)
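A quick check of the claim about String hashcodes (using short strings, so the hash does not overflow into a negative value):

for (String s : new String[] {"abc", "xyz", "docs/", "img/"}) {
    System.out.println(s + ": hash % 31 = " + (s.hashCode() % 31)
            + ", last char % 31 = " + (s.charAt(s.length() - 1) % 31));
}
// The two values always match, so every string ending in '/' lands on the
// same residue: a table size that is a multiple of 31 would cluster badly.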
The way HashMap is designed/implemented, its underlying number of buckets must be a power of 2 (even if you give it a different capacity, it rounds up to a power of 2), thus it grows by a factor of two each time. An ArrayList can be any size and can be more conservative in how it grows.
The accepted answer does not actually give an exact response to the question, but a comment from #user837703 on that answer clearly explains why HashMap grows by powers of two.
I found this article, which explains it in detail: http://coding-geek.com/how-does-a-hashmap-work-in-java/
Let me post a fragment of it, which gives a detailed answer to the question:
// the function that returns the index of the bucket from the rehashed hash
static int indexFor(int h, int length) {
    return h & (length - 1);
}
In order to work efficiently, the size of the inner array needs to be a power of 2, let’s see why.
Imagine the array size is 17, the mask value is going to be 16 (size -1). The binary representation of 16 is 0…010000, so for any hash value H the index generated with the bitwise formula “H AND 16” is going to be either 16 or 0. This means that the array of size 17 will only be used for 2 buckets: the one at index 0 and the one at index 16, not very efficient…
But, if you now take a size that is a power of 2 like 16, the bitwise index formula is “H AND 15”. The binary representation of 15 is 0…001111 so the index formula can output values from 0 to 15 and the array of size 16 is fully used. For example:
if H = 952 , its binary representation is 0..01110111000, the associated index is 0…01000 = 8
if H = 1576 its binary representation is 0..011000101000, the associated index is 0…01000 = 8
if H = 12356146, its binary representation is 0..0101111001000101000110010, the associated index is 0…00010 = 2
if H = 59843, its binary representation is 0..01110100111000011, the associated index is 0…00011 = 3
This is why the array size is a power of two. This mechanism is transparent for the developer: if he chooses a HashMap with a size of 37, the Map will automatically choose the next power of 2 after 37 (64) for the size of its inner array.
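The article's example hashes can be checked directly; for a power-of-two length, the bitwise mask gives the same bucket as a modulo would:

int length = 16;
for (int h : new int[] {952, 1576, 12356146, 59843}) {
    System.out.println(h + " -> bucket " + (h & (length - 1))
            + " (same as " + (h % length) + ")");
}
// prints buckets 8, 8, 2, 3, matching the binary examples above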
Hashing takes advantage of distributing data evenly into buckets. The algorithm tries to prevent multiple entries in the buckets ("hash collisions"), as they will decrease performance.
Now when the capacity of a HashMap is reached, the table is extended and the existing data is redistributed into the new buckets. If the size increase were too small, this re-allocation of space and redistribution would happen too often.
A general rule to avoid collisions on Maps is to keep the load factor at a maximum of around 0.75.
To decrease the possibility of collisions and avoid the expensive copying process, HashMap grows at a larger rate.
Also as #Peter says, it must be a power of 2.
I can't give you a reason why this is so (you'd have to ask Sun developers), but to see how this happens take a look at source:
HashMap: Take a look at how HashMap resizes to a new size (source, line 799):
resize(2 * table.length);
ArrayList: source, line 183:
int newCapacity = (oldCapacity * 3)/2 + 1;
Update: I mistakenly linked to sources of Apache Harmony JDK - changed it to Sun's JDK.
Java's ArrayList dynamically expands itself when it needs to. How many elements does it add when the expansion happens?
And does it copy the old array into the new one, or does it somehow link the two together?
Have a look at the source code:
int newCapacity = (oldCapacity * 3)/2 + 1;
The exact factor differs by implementation; the GNU implementation uses a factor of 2. It doesn't matter much, it's just trading memory for speed.
It copies all the elements into a new array.
It creates a new array of some multiple of the size and copies the elements over. (I'm not sure if the actual multiplier is specified by the Java standard.)
Now the natural question is, why? Why not just add, say, five elements every time?
It's to make things faster: You add n elements for free, and on element n + 1, you have to copy over the n previous elements into the array of size 2n. So the cost of copying those n elements gets distributed ("amortized") over themselves (since you previously added them for free), and so on average, the cost of adding each element was n/n, or about 1 operation per element.
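A back-of-the-envelope check of that argument, assuming doubling for simplicity:

int n = 1 << 20; // about a million appends
long totalCopies = 0;
for (int cap = 1; cap < n; cap *= 2) {
    totalCopies += cap; // each resize copies the current contents
}
System.out.println((double) totalCopies / n); // about 1.0 copy per element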
(See this link for some more discussion on this topic.)
Strictly speaking, the exact resizing behavior is not specified in the spec/JavaDoc:
The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
This implies that the internal array can't be resized by adding a constant number, but that some multiplication has to be involved. As maartinus has pointed out the Sun JDK and OpenJDK multiply the size by 1.5 (roughly).