ArrayList VS Vector JVM Memory usage - java

Does using an ArrayList take less memory when compared to Vector? I have read that a Vector doubles its internal array size when it runs out of room, whereas an ArrayList grows it by only half (50%). Is this a true statement? I need the answer for the case where I do not construct the Vector with values for initialCapacity and capacityIncrement.

Yes, you are correct in terms of the memory allocation of the internal arrays:
Internally, both ArrayList and Vector hold their contents in an array. When an element is inserted into an ArrayList or a Vector, the object will need to expand its internal array if it runs out of room. A Vector defaults to doubling the size of its array, while an ArrayList increases its array size by 50 percent.
Correction
It is not always the case that Vector will double its capacity. It may instead increase its size only by the increment given in the constructor:
public Vector(int initialCapacity, int capacityIncrement)
The logic in the grow method is to double the capacity if no increment was specified, and otherwise to grow by capacityIncrement. Here is the code of Vector's grow method:
private void grow(int minCapacity) {
    // overflow-conscious code
    int oldCapacity = elementData.length;
    int newCapacity = oldCapacity + ((capacityIncrement > 0) ?
                                     capacityIncrement : oldCapacity);
    if (newCapacity - minCapacity < 0)
        newCapacity = minCapacity;
    if (newCapacity - MAX_ARRAY_SIZE > 0)
        newCapacity = hugeCapacity(minCapacity);
    elementData = Arrays.copyOf(elementData, newCapacity);
}
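For comparison, here is a minimal sketch of ArrayList's roughly 50% growth rule (illustrative only, not the actual JDK source, which differs between versions):

import java.util.Arrays;

// Illustrative sketch: ArrayList-style growth, new capacity = old + old/2.
class GrowBy50PercentSketch {
    private Object[] elementData = new Object[10]; // ArrayList's default capacity

    void grow(int minCapacity) {
        int oldCapacity = elementData.length;
        int newCapacity = oldCapacity + (oldCapacity >> 1); // grow by ~50%
        if (newCapacity < minCapacity)
            newCapacity = minCapacity; // still honor the requested minimum
        elementData = Arrays.copyOf(elementData, newCapacity);
    }
}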

There is no comparison between Vector and ArrayList as they fit different purposes. Vector was supposed to be a concurrency-safe List implementation. However, the design of the class was severely flawed and did not provide concurrency guarantees for the most common use case, iteration.
Vector itself is easily replaced with Collections.synchronizedList(new ArrayList()). The result, of course, contains the same flaw as Vector: iteration over it still has to be synchronized externally. Vector should be considered deprecated.
The use of Vector is now a mark of naivety in understanding Java and concurrent programming. Don't use it.
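As a minimal sketch of what that flaw means in practice (class name and element values here are just for illustration): individual calls on the wrapper are synchronized for you, but a compound operation such as iteration still needs an explicit lock on the list, just as Vector's callers had to take one.

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SynchronizedListIterationSketch {
    public static void main(String[] args) {
        List<String> list = Collections.synchronizedList(new ArrayList<String>());
        list.add("a");
        list.add("b");
        // add/get/size are synchronized internally, but iteration is a compound
        // operation, so the caller must hold the lock for its whole duration:
        synchronized (list) {
            for (String s : list) {
                System.out.println(s);
            }
        }
    }
}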

To answer the original question:
ArrayList by default will grow the capacity by half of the current capacity. However, at any time, the program may call ensureCapacity to set the capacity to an appropriately large value.
Vector by default will grow the capacity by doubling. However, there is a constructor that allows setting the grow amount (capacityIncrement). Using a small grow value will have a negative impact on performance. Additionally, you could actually end up with less usable capacity, since each grow requires a duplicate array to exist in memory for a short period of time.
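If the final size is known (or can be estimated) up front, both classes let you avoid repeated grows. A small sketch, where the sizes are placeholder values:

import java.util.ArrayList;
import java.util.Vector;

class PreSizingSketch {
    public static void main(String[] args) {
        ArrayList<String> list = new ArrayList<>();
        list.ensureCapacity(1_000_000); // one allocation up front instead of repeated 50% grows

        // Vector alternative: start at 1_000_000 and grow by a fixed 10_000 thereafter
        Vector<String> vector = new Vector<>(1_000_000, 10_000);
    }
}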
In a comment, the OP has stated:
The application pulls huge data set and we are currently facing out of memory due to maxing out the heap size
First, both Vector and ArrayList will throw an OutOfMemoryError if the program tries to grow the capacity beyond a set limit. You need to be sure that the OOME does not originate from the hugeCapacity method of the Vector class. If that is the case, then perhaps you could try a LinkedList, which does not require one large contiguous array.
Second, what is your current heap size? The default JVM heap size is rather small. The intent is to avoid pauses or choppy behavior from a full GC becoming apparent to the user of an applet. However, the default heap size is also often far too small for a reasonably sophisticated application or a fairly dumb service. The -Xmx JVM argument can be used to increase the heap size.
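For example, a launch command along these lines (the jar name and sizes are placeholders; pick values that fit your machine):

java -Xms512m -Xmx4g -jar myapp.jar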

Related

Resizing of ArrayDeque

Quote: Default initial capacity of ArrayDeque is 16. It will increase as a power of 2 (2^4, 2^5, 2^6 and so on) when size exceeds capacity.
Does this mean it behaves similarly to ArrayList? Each time size exceeds capacity, is there a new array that the older elements are copied to? Can I say the internal implementation of ArrayDeque and ArrayList is an array (as their names say)? Just the resizing differs?
Yes ArrayDeque behaves similarly to ArrayList: Internally it uses an Object array. If the capacity is not enough, it creates a new, larger array, and copies items from the old array to the new.
The Java API specification does not require any particular resizing behavior. In fact, the current implementation in OpenJDK doubles the size of the array if it's small (capacity below 64), otherwise it grows by 50%:
// Double capacity if small; else grow by 50%
int jump = (oldCapacity < 64) ? (oldCapacity + 2) : (oldCapacity >> 1);
It seems that the "doubling" behavior is approximate: thanks to the "+ 2", after the first resize the capacity is 16 + 16 + 2 = 34. After the second resize it's 34 + 34 + 2 = 70. After that, the array increases by 50% on every resize.
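A quick sketch that replays that arithmetic from the default capacity of 16 (it only reproduces the grow step quoted above; it does not inspect a real ArrayDeque):

class ArrayDequeGrowthSketch {
    public static void main(String[] args) {
        int capacity = 16; // ArrayDeque's default initial capacity
        for (int resize = 1; resize <= 5; resize++) {
            int jump = (capacity < 64) ? (capacity + 2) : (capacity >> 1);
            capacity += jump;
            System.out.println("after resize " + resize + ": capacity = " + capacity);
        }
        // Prints 34, 70, 105, 157, 235: roughly doubling at first, then ~50% growth.
    }
}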

Why the new capacity of arraylist is (oldCapacity * 3)/2 + 1?

Why does ensureCapacity() in Java's ArrayList extend the capacity by a constant factor of 1.5, i.e. (oldCapacity * 3)/2 + 1?
Resizing the array is a relatively expensive operation, so ArrayList tries to make sure that if the method gets called with ensureCapacity(11), ensureCapacity(12), ensureCapacity(13), and so on, it does not have to resize the array every time. So it resizes by a reasonable chunk (an increase of 50%) instead of the minimum requested.
The main reason lies in the (asymptotic) complexity of adding a sequence of elements to the list.
Note that the add method internally calls ensureCapacity(size+1). When the size of the internal array is increased, all elements have to be copied into the new, larger array.
If the size were only increased by a constant amount (which would be 1 for each call to add), then adding n elements would have a complexity of O(n^2), because every add could force all existing elements to be copied.
Instead, the size is always increased by a constant factor. Then, adding n elements only has a complexity of O(n).
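A small sketch that makes the difference concrete by counting element copies under the two growth policies (the element count and class name are illustrative):

class GrowthCostSketch {
    public static void main(String[] args) {
        int n = 1_000_000;

        // Policy A: grow by a constant amount (1 per add); every add copies all elements.
        long constantGrowthCopies = 0;
        for (int size = 1; size < n; size++) {
            constantGrowthCopies += size; // copy the existing 'size' elements
        }

        // Policy B: grow by a constant factor of ~1.5; copies happen only on resize.
        long factorGrowthCopies = 0;
        int capacity = 10;
        for (int size = 0; size < n; size++) {
            if (size == capacity) {
                factorGrowthCopies += size;            // copy everything once per resize
                capacity = capacity + (capacity >> 1); // grow by ~50%
            }
        }

        System.out.println("constant growth copies: " + constantGrowthCopies); // on the order of n^2
        System.out.println("factor growth copies:   " + factorGrowthCopies);   // on the order of n
    }
}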

What does Java hashtable load factor == 1 mean?

I need a hashtable that doesn't change its size, because at the beginning I know the size should be N, and the table shouldn't change during the program. So should I set the load factor to 1, meaning don't increase the size until the number of entries reaches N + 1, which I know will never occur?
To be more specific, I want this: when it reaches N, it shouldn't grow, but if N + 1 occurs, then increase the size. Is this the right way to set it?
You probably want to use java.util.HashMap instead of Hashtable, unless you need the synchronized access. Either provides a constructor for you to set an initial capacity.
The load factor is the upper threshold multiplier for number of items before the table is rehashed.
The simplest answer is yes. However, to explain a little bit:
The threshold for rehashing is calculated like this:
threshold = (int)(initialCapacity * loadFactor);
And in the put method a rehash is triggered by the following condition:
if (count >= threshold)
This is more or less true for HashMap as well, should you decide to use it.
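Putting the two pieces above together, a minimal sketch for the questioner's case (N here is just an example value; HashMap offers the same constructor if synchronization isn't needed):

import java.util.Hashtable;

class NoRehashSketch {
    public static void main(String[] args) {
        int n = 1000;            // the number of entries known up front
        float loadFactor = 1.0f; // threshold = n * 1.0 = n
        // Up to n entries never trip 'count >= threshold'; the (n + 1)-th insert would rehash.
        Hashtable<String, String> table = new Hashtable<>(n, loadFactor);
    }
}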

When an ArrayList resizes itself, how many elements does it add?

Java's ArrayList dynamically expands itself when it needs to. How many elements does it add when the expansion happens?
And does it copy the old array into the new one, or does it somehow link the two together?
Have a look at the source code:
int newCapacity = (oldCapacity * 3)/2 + 1;
The exact factor differs by implementation; GNU Classpath uses a factor of 2. It doesn't matter much, it's just trading memory for speed.
It copies all the elements into a new array.
It creates a new array of some multiple of the old size and copies the elements over. (I'm not sure whether the actual multiplier is specified by the Java standard.)
Now the natural question is, why? Why not just add, say, five elements every time?
It's to make things faster: you get to add n elements cheaply, and on element n + 1 you have to copy the n previous elements into the larger array. So the cost of copying those n elements gets distributed ("amortized") over the n additions that came before it, and on average the cost of adding each element works out to roughly one operation per element.
(See this link for some more discussion on this topic.)
Strictly speaking, the exact resizing behavior is not specified in the spec/JavaDoc:
The details of the growth policy are not specified beyond the fact that adding an element has constant amortized time cost.
This implies that the internal array can't be resized by adding a constant number, but that some multiplication has to be involved. As maartinus has pointed out, the Sun JDK and OpenJDK multiply the size by 1.5 (roughly).

Choosing the initial capacity of a HashSet with an expected number of unique values and insertions

Ok, here's my situation:
I have an Array of States, which may contain duplicates. To get rid of the duplicates, I can add them all to a Set.
However when I create the Set, it wants the initial capacity and load factor to be defined, but what should they be set to?
From googling, I have come up with:
String[] allStates = getAllStates();
Set<String> uniqueStates = new HashSet<String>(allStates.length, 0.75f);
The problem with this is that allStates can contain anywhere between 1 and 5000 states, so the Set could end up with a capacity of over 5000 while only ever containing at most 50.
So, alternatively, the initial capacity of the Set could be set to the maximum number of states, and the load factor to 1.
I guess my questions really are:
What should you set the initial capacity to be when you don't know how many items are to be in the Set?
Does it really matter what it gets set to when the most it could contain is 50?
Should I even be worrying about it?
Assuming that you know there won't be more than 50 states (do you mean US States?), the
Set<String> uniqueStates = new HashSet<String>(allStates.length, 0.75f);
quoted is definitely wrong. I'd suggest you go for an initial capacity of 50 / 0.75 = 67, or perhaps 68 to be on the safe side.
I also feel the need to point out that you're probably overthinking this intensely. Resizing the set's internal table twice, from 16 up to 64, isn't going to give you a noticeable performance hit unless this is right in the most performance-critical part of the program.
So the best answer is probably to use:
new HashSet<String>();
That way, you won't come back a year later and puzzle over why you chose such strange constructor arguments.
Use the constructor where you don't need to specify these values, then reasonable defaults are chosen.
First, I'm going to say that in your case you're definitely overthinking it. However, there are probably situations where one would want to get it right. So here's what I understand:
1) The number of items you can hold in your HashSet = initial capacity x load factor. So if you want to be able to hold n items, you need to do what Zarkonnen did and divide n by the load factor.
2) Under the covers, the initial capacity is rounded up to a power of two per Oracle tutorial.
3) Load factor should be no more than .80 to prevent excessive collisions, as noted by Tom Hawtin - tackline.
If you just accept the default values (initial capacity = 16, load factor = .75), you'll end up doubling your set in size 3 times. (Initial max size = 12, first increase makes capacity 32 and max size 24 (32 * .75), second increase makes capacity 64 and max size 48 (64 * .75), third increase makes capacity 128 and max size 96 (128 * .75).)
To get your max size closer to 50, but keep the set as small as possible, consider an initial capacity of 64 (a power of two) and a load factor of .79 or more. 64 * .79 = 50.56, so you can get all 50 states in there. Specifying 32 < initial capacity < 64 will result in initial capacity being rounded up to 64, so that's the same as specifying 64 up front. Specifying initial capacity <= 32 will result in a size increase. Using a load factor < .79 will also result in a size increase unless your initial capacity > 64.
So my recommendation is to specify initial capacity = 64 and load factor = .79.
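Putting that recommendation into code (a sketch; getAllStates() is the helper from the question and is assumed to exist):

import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

String[] allStates = getAllStates(); // assumed helper from the question
// 64 slots * 0.79 load factor = 50.56, so all 50 states fit without a resize
Set<String> uniqueStates = new HashSet<>(64, 0.79f);
Collections.addAll(uniqueStates, allStates);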
The safe bet is to go for a size that is too small.
Because resizing is ameliorated by an exponential growth algorithm (see the Stack Overflow podcast from a few weeks back), going small will never cost you that much. If you have lots of sets (lucky you), then it will matter to performance if they are oversized.
Load factor is a tricky one. I suggest leaving it at the default. My understanding: below about 0.70f you are making the array too large and therefore slower; above 0.80f you'll start getting too many key clashes. Presumably probing algorithms will require lower load factors than bucket algorithms.
Also note that "initial capacity" means something slightly different from what most people appear to think. It refers to the number of slots in the internal array, not the number of elements the collection can hold before resizing. To get the capacity needed for a given number of elements, divide that number by the desired load factor (and round up appropriately).
Make a good guess. There is no hard rule. If you know there are likely to be, say, 10-20 states, I'd start off with that number (20).
I second Zarkonnen. Your last question is the most important one. If this happens to occur in a hotspot of your application, it might be worth the effort to look at it and try to optimise; otherwise CPU cycles are cheaper than burning up your own neurons.
If you were to optimize this -- and it may be appropriate to do that -- some of your decision will depend on how many duplicates you expect the array to have.
If there are very many duplicates, you will want a smaller initial capacity. Large, sparse hash tables are bad when iterating.
If there are not expected to be very many duplicates, you will want an initial capacity such that the entire array could fit without resizing.
My guess is that you want the latter, but this is something worth considering if you pursue this.
