Use of ArrayList#trimToSize() method?

Use of ArrayList#trimToSize() method? - java

From a recently posted question I came across ArrayList#trimToSize() which reduces the size of the backing array to current size of collection.
Quoting javadoc
Trims the capacity of this ArrayList instance to be the list's current
size. An application can use this operation to minimize the storage of
an ArrayList instance.
And the Javadoc says that the application can use to reduce the memory footprint of backing array. If I am not wrong this method won't be useful for small sizes as the cost of some references won't hurt that much.
But because of the algorithm used by arraylist int newCapacity = (oldCapacity * 3)/2 + 1; in 1.6 and int newCapacity = oldCapacity + (oldCapacity >> 1); in 1.7, while adding new element if oldcapacity is large then it will create a new backing array with above algorithm and may allocate much unneeded space, if only one element is added after dynamic expansion.
Is my reasoning behind the method correct or there are some other applications to it?

Yes the backing array is increased by ~50% when it's full. For example, the program below adds 1 million entries, calls trimToSize then adds one entry. The backing array's length is 1.2m after adding the entries, 1m after trimming and 1.5m after adding one item.
So unless you know that you won't be adding to the list any longer, calling trimToSize could be counter-productive.
ArrayList<Integer> list = new ArrayList<>();
Field e = list.getClass().getDeclaredField("elementData");
e.setAccessible(true);
for (int i = 0; i < 1_000_000; i++) {
list.add(i);
}
System.out.println(((Object[]) e.get(list)).length); //1215487
list.trimToSize();
System.out.println(((Object[]) e.get(list)).length); //1000000
list.add(0);
System.out.println(((Object[]) e.get(list)).length); //1500000

Another situation is when we add elements and then remove many of them. When we remove elements internal aray stays unchanged.

Your formula results in worst-case overhead of just 50% of empty slots. Note that the minimum size of an Object is 24 bytes, compared to just 4 bytes for a compressed OOP in the array. The overhead amounts to just
(0.5*4) / (24+4) == 1/14 == 7%
which can hardly ever be worth considering—and that's the worst it can get. On average it's half the overhead in the array entries, and often the objects are much larger.
So the only time it would make sense to call trimToSize is after a massive removal from a previously huge arraylist. In other words, almost never.

Related

Question(s) about time complexity of array "resizing" in Java

NOTE: As the title already hints, this question is not about the specific java.util.ArrayList implementation of an array-based list, but rather about the raw arrays themselves and how they might behave in a "pure" (meaning completely unoptimized) array-based list implementation. I chose to mention java.util.ArrayList because it is the most prominent example of an array-based list in Java, although it is technically not "pure", as it utilizes preallocation to reduce the operation time of add(). If you want to know why I am asking this specific question without being interested in the java.util.ArrayList() preallocation optimization, I added a little explanation of my use case below.
It is generally known that you can access elements in array-based lists (like Java's ArrayList<E>) with a time complexity of O(1), while adding elements to that list will take O(n). With linked lists, it is the other way round (for a doubly linked list, you could optimize the access to half the execution time).
The reason why adding elements to an array-based list takes O(n) is that an array cannot simply be resized, but has to be reallocated and re-filled. The easiest way to do this would be:
String arr[] = new String[n];
//...
String newElem = "foo";
String[] newArr = new String[n + 1];
int i = 0;
for (String elem : arr) {
newArr[i] = arr[i++];
}
newArr[i] = newElem;
arr = newArr;
The time complexity O(n) is clearly visible thanks to the for loop. But there are other ways to copy arrays in Java, for example System.arraycopy().
Sticking to the vanilla for loop solution, even shrinking an array will take O(n), because an array has a fixed size and in order to "shrink" it, you'd have to copy all elements to be retained to a new, smaller array.
So, here are my questions concerning such array operations and their time complexity:
While the vanilla for loop will always take O(n), is it possible that System.arraycopy() optimizes the "add" operation if there is enough space in the memory to expand the array in place, meaning that it would leave the original array at its place and just add the new element at the end of it?
As the shrinking operation could always be executed with O(1) in theory, does System.arraycopy() always optimize this operation to O(1)?
If System.arraycopy() is not capable of using those optimizations, is there any other way in Java to actually utilize those optimizations which are possible in theory OR will array "resizing" always take O(n), no matter under which circumstances?
TL;DR is there any situation in which the "resizing" of an array in Java will take less than O(n)?
Additional information:
I am using openJDK11 (newest release), but if the answer turns out to be JVM-dependent, I'd like to know how other JVMs would behave in comparison.
For the curious ones
who want to know what I want to do with this information:
I am working on a new java.util.List implementation, namely a hybrid list that can store data in an array and in a linked buffer. On certain occasions, the buffer will be flushed into the array, which of course requires that the existing array is resized. But apart from this idea, I want to utilize as many other optimizations on the array part as possible. To avoid array resizing in general, I experimented with the idea of letting the array persist in a constant size, but managing the "valid" range of it with some other fields. Meaning that if you were to pop the last element of the array, it would not shrink the array but rather the range of valid elements. Then, when inserting new elements in the array part, the former invalid section can be used to shift values into, basically reusing the space that was formerly used by a now deleted element. If the inserting operations exceed the actual array size, elements can still be transferred to the linked buffer to avoid resizing. To further optimize this, I chose to use the middle of the array as a pivot when deleting certain elements. Now the valid range might not start at the beginning of the array anymore. Basically this means if you delete an element to the left of the pivot, all elements between the start of the valid range and the deleted element get shifted towards the pivot, to the right. Removing element to the right of the pivot works accordingly. So, after some removals, the array could look like this:
[null null|elem0 elem1 elem2||elem3 elem4 elem5|null null null]
(Where the | at the beginning and at the end mark the valid range and the || marks the pivot)
So, how is this all related to my question?
All of those optimizations build up upon the claim that array resizing is expensive in time, namely O(n). Therefore array resizing is avoided whenever possible. Those optimizations might sound neat, but the code implementing them can get quite messy, especially when implementing the batch operations (addAll(), removeAll(), retainAll()...). So, if it turns out that the array resizing operation itself can be less expensive in some cases (especially shrinking), I would cut out a lot of those optimizations which are then rendered useless, making the code a lot easier in the process.
So, before sticking to my optimization ideas and experiments, I'd like to know whether they are even needed.

Why is my algorithm O(1) additional space complexity?

I solved this problem from codefights:
Note: Write a solution with O(n) time complexity and O(1) additional space complexity, since this is what you would be asked to do during a real interview.
Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
int firstDuplicate(int[] a) {
HashSet z = new HashSet();
for (int i: a) {
if (z.contains(i)){
return i;
}
z.add(i);
}
return -1;
}
My solution passed all of the tests. However I don't understand how my solution met the O(1) additional space complexity requirement. The size of the hashtable is directly proportional to the input so I would think it is O(n) space complexity. Did codefights incorrectly test my algorithm or am I misunderstanding something?

Your code doesn’t have O(1) auxiliary space complexity, since that hash set can grow up to size n if given an array of all different elements.
My guess is that the online testing infrastructure didn’t check memory usage or otherwise checked memory usage incorrectly. If you want to meet the space constraints, you’ll need to go back and try solving the problem a different way.
As a hint, think about reordering the array elements.

In case you are able to modify incomming array, you could fix your problem with O(n) time complexity, and do not use external memory.
public static int getFirstDuplicate(int... arr) {
for (int i = 0; i < arr.length; i++) {
int val = Math.abs(arr[i]);
if (arr[val - 1] < 0)
return val;
arr[val - 1] = -arr[val - 1];
}
return -1;
}

This is technically incorrect, for two reasons.
Firstly, depending on the values in the array, there may be overhead when the ints become Integers and added to the HashSet.
Secondly, while the additional memory is largely the overhead associated with a HashSet, that overhead is linearly proportional to the size of the set. (Note that I am not counting the elements in this, as they are already present in the array.)
Usually, these memory constraints are tested by setting a limit to the amount of memory it can use. A solution like this I would expect to fall below the said threshold.

What is initial size of ArrayList in java [duplicate]

As I recall, before Java 8, the default capacity of ArrayList was 10.
Surprisingly, the comment on the default (void) constructor still says: Constructs an empty list with an initial capacity of ten.
From ArrayList.java:
/**
* Shared empty array instance used for default sized empty instances. We
* distinguish this from EMPTY_ELEMENTDATA to know how much to inflate when
* first element is added.
*/
private static final Object[] DEFAULTCAPACITY_EMPTY_ELEMENTDATA = {};
...
/**
* Constructs an empty list with an initial capacity of ten.
*/
public ArrayList() {
this.elementData = DEFAULTCAPACITY_EMPTY_ELEMENTDATA;
}

Technically, it's 10, not zero, if you admit for a lazy initialisation of the backing array. See:
public boolean add(E e) {
ensureCapacityInternal(size + 1);
elementData[size++] = e;
return true;
}
private void ensureCapacityInternal(int minCapacity) {
if (elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA) {
minCapacity = Math.max(DEFAULT_CAPACITY, minCapacity);
}
ensureExplicitCapacity(minCapacity);
}
where
/**
* Default initial capacity.
*/
private static final int DEFAULT_CAPACITY = 10;
What you're referring to is just the zero-sized initial array object that is shared among all initially empty ArrayList objects. I.e. the capacity of 10 is guaranteed lazily, an optimisation that is present also in Java 7.
Admittedly, the constructor contract is not entirely accurate. Perhaps this is the source of confusion here.
Background
Here's an E-Mail by Mike Duigou
I have posted an updated version of the empty ArrayList and HashMap patch.
http://cr.openjdk.java.net/~mduigou/JDK-7143928/1/webrev/
This revised implementation introduces no new fields to either class. For ArrayList the lazy allocation of the backing array occurs only if the list is created at default size. According to our performance analysis team, approximately 85% of ArrayList instances are created at default size so this optimization will be valid for an overwhelming majority of cases.
For HashMap, creative use is made of the threshold field to track the requested initial size until the bucket array is needed. On the read side the empty map case is tested with isEmpty(). On the write size a comparison of (table == EMPTY_TABLE) is used to detect the need to inflate the bucket array. In readObject there's a little more work to try to choose an efficient initial capacity.
From: http://mail.openjdk.java.net/pipermail/core-libs-dev/2013-April/015585.html

In java 8 default capacity of ArrayList is 0 until we add at least one object into the ArrayList object (You can call it lazy initialization).
Now question is why this change has been done in JAVA 8?
Answer is to save memory consumption. Millions of array list objects are created in real time java applications. Default size of 10 objects means that we allocate 10 pointers (40 or 80 bytes) for underlying array at creation and fill them in with nulls.
An empty array (filled with nulls) occupy lot of memory .
Lazy initialization postpones this memory consumption till moment you will actually use the array list.
Please see below code for help.
ArrayList al = new ArrayList(); //Size: 0, Capacity: 0
ArrayList al = new ArrayList(5); //Size: 0, Capacity: 5
ArrayList al = new ArrayList(new ArrayList(5)); //Size: 0, Capacity: 0
al.add( "shailesh" ); //Size: 1, Capacity: 10
public static void main( String[] args )
throws Exception
{
ArrayList al = new ArrayList();
getCapacity( al );
al.add( "shailesh" );
getCapacity( al );
}
static void getCapacity( ArrayList<?> l )
throws Exception
{
Field dataField = ArrayList.class.getDeclaredField( "elementData" );
dataField.setAccessible( true );
System.out.format( "Size: %2d, Capacity: %2d%n", l.size(), ( (Object[]) dataField.get( l ) ).length );
}
Response: -
Size: 0, Capacity: 0
Size: 1, Capacity: 10
Article Default capacity of ArrayList in Java 8 explains it in details.

If the very first operation that is done with an ArrayList is to pass addAll a collection which has more than ten elements, then any effort put into creating an initial ten-element array to hold the ArrayList's contents would be thrown out the window. Whenever something is added to an ArrayList it's necessary to test whether the size of the resulting list will exceed the size of the backing store; allowing the initial backing store to have size zero rather than ten will cause this test to fail one extra time in the lifetime of a list whose first operation is an "add" which would require creating the initial ten-item array, but that cost is less than the cost of creating a ten-item array that never ends up getting used.
That having been said, it might have been possible to improve performance further in some contexts if there were a overload of "addAll" which specified how many items (if any) would likely be added to the list after the present one, and which could use that to influence its allocation behavior. In some cases code which adds the last few items to a list will have a pretty good idea that the list is never going to need any space beyond that. There are many situations where a list will get populated once and never modified after that. If at the point code knows that the ultimate size of a list will be 170 elements, it has 150 elements and a backing store of size 160, growing the backing store to size 320 will be unhelpful and leaving it at size 320 or trimming it to 170 will be less efficient than simply having the next allocation grow it to 170.

The question is 'why?'.
Memory profiling inspections (for example (https://www.yourkit.com/docs/java/help/inspections_mem.jsp#sparse_arrays) shows that empty (filled with nulls) arrays occupy tons of memory .
Default size of 10 objects means that we allocate 10 pointers (40 or 80 bytes) for underlying array at creation and fill them in with nulls. Real java applications create millions of array lists.
The introduced modification removes^W postpone this memory consumption till moment you will actually use the array list.

After above question I gone through ArrayList Document of Java 8. I found the default size is still 10 only.

ArrayList default size in JAVA 8 is stil 10. The only change made in JAVA 8 is that if a coder adds elements less than 10 then the remaining arraylist blank places are not specified to null. Saying so because I have myself gone through this situation and eclipse made me look into this change of JAVA 8.
You can justify this change by looking at below screenshot. In it you can see that ArrayList size is specified as 10 in Object[10] but the number of elements displayed are only 7. Rest null value elements are not displayed here. In JAVA 7 below screenshot is same with just a single change which is that the null value elements are also displayed for which the coder needs to write code for handling null values if he is iterating complete array list while in JAVA 8 this burden is removed from the head of coder/developer.
Screen shot link.

How to define the concept of capacity in ArrayLists?

I understand that capacity is the number of elements or available spaces in an ArrayList that may or may not hold a value referencing an object. I am trying to understand more about the concept of capacity.
So I have three questions:
1) What are some good ways to define what capacity represents from a memory standpoint?
...the (contiguous?) memory allocated to the ArrayList?
...the ArrayLists’s memory footprint on the (heap?)?
2) Then if the above is true, changing capacity requires some manner of memory management overhead?
3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?

The class is called ArrayList because it's based on an array. The capacity is the size of the array, which requires a block of contiguous heap memory. However, note that the array itself contains only references to the elements, which are separate objects on the heap.
Increasing the capacity requires allocating a new, larger array and copying all the references from the old array to the new one, after which the old one becomes eligible for garbage collection.
You've cited the main case where performance could be a concern. In practice, I've never seen it actually become a problem, since the element objects usually take up much more memory (and possibly CPU time) than the list.

ArrayList is implemented like this:
class ArrayList {
private Object[] elements;
}
the capacity is the size of that array.
Now, if your capacity is 10, and you're adding 11-th element, ArrayList will do this:
Object[] newElements = new Object[capacity * 1.5];
System.arraycopy(this.elements, newElements);
this.elements = newElements;
So if you start off with a small capacity, ArrayList will end up creating a bunch of arrays and copying stuff around for you as you keep adding elements, which isn't good.
On the other hand, if you specify a capacity of 1,000,000 and add only 3 elements to ArrayList, that also is kinda bad.
Rule of thumb: if you know the capacity, specify it. If you aren't sure but know the upper bound, specify that. If you just aren't sure, use the defaults.

Capacity is as you described it -- the contiguous memory allocated to an ArrayList for storage of values. ArrayList stores all values in an array, and automatically resizes the array for you. This incurs memory management overhead when resizing.
If I remember correctly, Java increases the size of an ArrayList's backing array from size N to size 2N + 2 when you try to add one more element than the capacity can take. I do not know what size it increases to when you use the insert method (or similar) to insert at a specific position beyond the end of the capacity, or even whether it allows this.
Here is an example to help you think about how it works. Picture each space between the |s as a cell in the backing array:
| | |
size = 0 (contains no elements), capacity = 2 (can contain 2 elements).
|1| |
size = 1 (contains 1 element), capacity = 2 (can contain 2 elements).
|1|2|
size = 2, capacity = 2. Adding another element:
|1|2|3| | | |
size increased by 1, capacity increased to 6 (2 * 2 + 2). This can be expensive with large arrays, as allocating a large contiguous memory region can require a bit of work (as opposed to a LinkedList, which allocates many small pieces of memory) because the JVM needs to search for an appropriate location, and may need to ask the OS for more memory. It is also expensive to copy a large number of values from one place to another, which would be done once such a region was found.
My rule of thumb is this: If you know the capacity you will require, use an ArrayList because there will only be one allocation and access is very fast. If you do not know your required capacity, use a LinkedList because adding a new value always takes the same amount of work, and there is no copying involved.

1) What are some good ways to define what capacity represents from a memory standpoint?
...the (contiguous?) memory allocated to the ArrayList?
Yes, an ArrayList is backed up by an array, to that represents the internal array size.
...the ArrayLists’s memory footprint on the (heap?)?
Yes, the larget the array capacity, the more footprint used by the arraylist.
2) Then if the above is true, changing capacity requires some manner of memory management overhead?
It is. When the list grows large enough, a larger array is allocated and the contents copied. The previous array maybe discarded and marked for garbage collection.
3) Anyone have an example where #2 was or could be a performance concern? Aside from maybe a large number of large ArrayLists having their capacities continually adjusted?
Yes, if you create the ArrayList with initial capacity of 1 ( for instance ) and your list grows way beyond that. If you know upfront the number of elements to store, you better request an initial capacity of that size.
However I think this should be low in your list of priorities, while array copy may happen very often, it is optimized since the early stages of Java, and should not be a concern. Better would be to choose a right algorithm, I think. Remember: Premature optimization is the root of all evil
See also: When to use LinkedList over ArrayList

Memory overhead of Java HashMap compared to ArrayList

I am wondering what is the memory overhead of java HashMap compared to ArrayList?
Update:
I would like to improve the speed for searching for specific values of a big pack (6 Millions+) of identical objects.
Thus, I am thinking about using one or several HashMap instead of using ArrayList. But I am wondering what is the overhead of HashMap.
As far as i understand, the key is not stored, only the hash of the key, so it should be something like size of the hash of the object + one pointer.
But what hash function is used? Is it the one offered by Object or another one?

If you're comparing HashMap with ArrayList, I presume you're doing some sort of searching/indexing of the ArrayList, such as binary search or custom hash table...? Because a .get(key) thru 6 million entries would be infeasible using a linear search.
Using that assumption, I've done some empirical tests and come up with the conclusion that "You can store 2.5 times as many small objects in the same amount of RAM if you use ArrayList with binary search or custom hash map implementation, versus HashMap". My test was based on small objects containing only 3 fields, of which one is the key, and the key is an integer. I used a 32bit jdk 1.6. See below for caveats on this figure of "2.5".
The key things to note are:
(a) it's not the space required for references or "load factor" that kills you, but rather the overhead required for object creation. If the key is a primitive type, or a combination of 2 or more primitive or reference values, then each key will require its own object, which carries an overhead of 8 bytes.
(b) In my experience you usually need the key as part of the value, (e.g. to store customer records, indexed by customer id, you still want the customer id as part of the Customer object). This means it is IMO somewhat wasteful that a HashMap separately stores references to keys and values.
Caveats:
The most common type used for HashMap keys is String. The object creation overhead doesn't apply here so the difference would be less.
I got a figure of 2.8, being 8880502 entries inserted into the ArrayList compared with 3148004 into the HashMap on -Xmx256M JVM, but my ArrayList load factor was 80% and my objects were quite small - 12 bytes plus 8 byte object overhead.
My figure, and my implementation, requires that the key is contained within the value, otherwise I'd have the same problem with object creation overhead and it would be just another implementation of HashMap.
My code:
public class Payload {
int key,b,c;
Payload(int _key) { key = _key; }
}
import org.junit.Test;
import java.util.HashMap;
import java.util.Map;
public class Overhead {
#Test
public void useHashMap()
{
int i=0;
try {
Map<Integer, Payload> map = new HashMap<Integer, Payload>();
for (i=0; i < 4000000; i++) {
int key = (int)(Math.random() * Integer.MAX_VALUE);
map.put(key, new Payload(key));
}
}
catch (OutOfMemoryError e) {
System.out.println("Got up to: " + i);
}
}
#Test
public void useArrayList()
{
int i=0;
try {
ArrayListMap map = new ArrayListMap();
for (i=0; i < 9000000; i++) {
int key = (int)(Math.random() * Integer.MAX_VALUE);
map.put(key, new Payload(key));
}
}
catch (OutOfMemoryError e) {
System.out.println("Got up to: " + i);
}
}
}
import java.util.ArrayList;
public class ArrayListMap {
private ArrayList<Payload> map = new ArrayList<Payload>();
private int[] primes = new int[128];
static boolean isPrime(int n)
{
for (int i=(int)Math.sqrt(n); i >= 2; i--) {
if (n % i == 0)
return false;
}
return true;
}
ArrayListMap()
{
for (int i=0; i < 11000000; i++) // this is clumsy, I admit
map.add(null);
int n=31;
for (int i=0; i < 128; i++) {
while (! isPrime(n))
n+=2;
primes[i] = n;
n += 2;
}
System.out.println("Capacity = " + map.size());
}
public void put(int key, Payload value)
{
int hash = key % map.size();
int hash2 = primes[key % primes.length];
if (hash < 0)
hash += map.size();
do {
if (map.get(hash) == null) {
map.set(hash, value);
return;
}
hash += hash2;
if (hash >= map.size())
hash -= map.size();
} while (true);
}
public Payload get(int key)
{
int hash = key % map.size();
int hash2 = primes[key % primes.length];
if (hash < 0)
hash += map.size();
do {
Payload payload = map.get(hash);
if (payload == null)
return null;
if (payload.key == key)
return payload;
hash += hash2;
if (hash >= map.size())
hash -= map.size();
} while (true);
}
}

The simplest thing would be to look at the source and work it out that way. However, you're really comparing apples and oranges - lists and maps are conceptually quite distinct. It's rare that you would choose between them on the basis of memory usage.
What's the background behind this question?

All that is stored in either is pointers. Depending on your architecture a pointer should be 32 or 64 bits (or more or less)
An array list of 10 tends to allocate 10 "Pointers" at a minimum (and also some one-time overhead stuff).
A map has to allocate twice that (20 pointers) because it stores two values at a time. Then on top of that, it has to store the "Hash". which should be bigger than the map, at a loading of 75% it SHOULD be around 13 32-bit values (hashes).
so if you want an offhand answer, the ratio should be about 1:3.25 or so, but you are only talking pointer storage--very small unless you are storing a massive number of objects--and if so, the utility of being able to reference instantly (HashMap) vs iterate (array) should be MUCH more significant than the memory size.
Oh, also:
Arrays can be fit to the exact size of your collection. HashMaps can as well if you specify the size, but if it "Grows" beyond that size, it will re-allocate a larger array and not use some of it, so there can be a little waste there as well.

I don't have an answer for you either, but a quick google search turned up a function in Java that might help.
Runtime.getRuntime().freeMemory();
So I propose that you populate a HashMap and an ArrayList with the same data. Record the free memory, delete the first object, record memory, delete the second object, record the memory, compute the differences,..., profit!!!
You should probably do this with magnitudes of data. ie Start with 1000, then 10000, 100000, 1000000.
EDIT: Corrected, thanks to amischiefr.
EDIT:
Sorry for editing your post, but this is pretty important if you are going to use this (and It's a little much for a comment)
.
freeMemory does not work like you think it would. First, it's value is changed by garbage collection. Secondly, it's value is changed when java allocates more memory. Just using the freeMemory call alone doesn't provide useful data.
Try this:
public static void displayMemory() {
Runtime r=Runtime.getRuntime();
r.gc();
r.gc(); // YES, you NEED 2!
System.out.println("Memory Used="+(r.totalMemory()-r.freeMemory()));
}
Or you can return the memory used and store it, then compare it to a later value. Either way, remember the 2 gcs and subtracting from totalMemory().
Again, sorry to edit your post!

Hashmaps try to maintain a load factor (usually 75% full), you can think of a hashmap as a sparsely filled array list. The problem in a straight up comparison in size is this load factor of the map grows to meet the size of the data. ArrayList on the other hand grows to meet it's need by doubling it's internal array size. For relatively small sizes they are comparable, however as you pack more and more data into the map it requires a lot of empty references in order to maintain the hash performance.
In either case I recommend priming the expected size of the data before you start adding. This will give the implementations a better initial setting and will likely consume less over all in both cases.
Update:
based on your updated problem check out Glazed lists. This is a neat little tool written by some of the Google people for doing operations similar to the one you describe. It's also very quick. Allows clustering, filtering, searching, etc.

HashMap hold a reference to the value and a reference to the key.
ArrayList just hold a reference to the value.
So, assuming that the key uses the same memory of the value, HashMap uses 50% more memory ( although strictly speaking , is not the HashMap who uses that memory because it just keep a reference to it )
In the other hand HashMap provides constant-time performance for the basic operations (get and put) So, although it may use more memory, getting an element may be much faster using a HashMap than a ArrayList.
So, the next thing you should do is not to care about who uses more memory but what are they good for.
Using the correct data structure for your program saves more CPU/memory than how the library is implemented underneath.
EDIT
After Grant Welch answer I decided to measure for 2,000,000 integers.
Here's the source code
This is the output
$
$javac MemoryUsage.java
Note: MemoryUsage.java uses unchecked or unsafe operations.
Note: Recompile with -Xlint:unchecked for details.
$java -Xms128m -Xmx128m MemoryUsage
Using ArrayListMemoryUsage#8558d2 size: 0
Total memory: 133.234.688
Initial free: 132.718.608
Final free: 77.965.488
Used: 54.753.120
Memory Used 41.364.824
ArrayListMemoryUsage#8558d2 size: 2000000
$
$java -Xms128m -Xmx128m MemoryUsage H
Using HashMapMemoryUsage#8558d2 size: 0
Total memory: 133.234.688
Initial free: 124.329.984
Final free: 4.109.600
Used: 120.220.384
Memory Used 129.108.608
HashMapMemoryUsage#8558d2 size: 2000000

Basically, you should be using the "right tool for the job". Since there are different instances where you'll need a key/value pair (where you may use a HashMap) and different instances where you'll just need a list of values (where you may use a ArrayList) then the question of "which one uses more memory", in my opinion, is moot, since it is not a consideration of choosing one over the other.
But to answer the question, since HashMap stores key/value pairs while ArrayList stores just values, I would assume that the addition of keys alone to the HashMap would mean that it takes up more memory, assuming, of course, we are comparing them by the same value type (e.g. where the values in both are Strings).

I think the wrong question is being asked here.
If you would like to improve the speed at which you can search for an object in a List containing six million entries, then you should look into how fast these datatype's retrieval operations perform.
As usual, the Javadocs for these classes state pretty plainly what type of performance they offer:
HashMap:
This implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets.
This means that HashMap.get(key) is O(1).
ArrayList:
The size, isEmpty, get, set, iterator, and listIterator operations run in constant time. The add operation runs in amortized constant time, that is, adding n elements requires O(n) time. All of the other operations run in linear time (roughly speaking).
This means that most of ArrayList's operations are O(1), but likely not the ones that you would be using to find objects that match a certain value.
If you are iterating over every element in the ArrayList and testing for equality, or using contains(), then this means that your operation is running at O(n) time (or worse).
If you are unfamiliar with O(1) or O(n) notation, this is referring to how long an operation will take. In this case, if you can get constant-time performance, you want to take it. If HashMap.get() is O(1) this means that retrieval operations take roughly the same amount of time regardless of how many entries are in the Map.
The fact that something like ArrayList.contains() is O(n) means that the amount of time it takes grows as the size of the list grows; so iterating thru an ArrayList with six million entries will not be very effective at all.

I don't know the exact number, but HashMaps are much heavier. Comparing the two, ArrayList's internal representation is self evident, but HashMaps retain Entry objects (Entry) which can balloon your memory consumption.
It's not that much larger, but it's larger. A great way to visualize this would be with a dynamic profiler such as YourKit which allows you to see all heap allocations. It's pretty nice.

This post is giving a lot of information about objects sizes in Java.

If you're considering two ArrayLists vs one Hashmap, it's indeterminate; both are partially-full data structures. If you were comparing Vector vs Hashtable, Vector is probably more memory efficient, because it only allocates the space it uses, whereas Hashtables allocate more space.
If you need a key-value pair and aren't doing incredibly memory-hungry work, just use the Hashmap.

As Jon Skeet noted, these are completely different structures. A map (such as HashMap) is a mapping from one value to another - i.e. you have a key that maps to a value, in a Key->Value kind of relationship. The key is hashed, and is placed in an array for quick lookup.
A List, on the other hand, is a collection of elements with order - ArrayList happens to use an array as the back end storage mechanism, but that is irrelevant. Each indexed element is a single element in the list.
edit: based on your comment, I have added the following information:
The key is stored in a hashmap. This is because a hash is not guaranteed to be unique for any two different elements. Thus, the key has to be stored in the case of hashing collisions. If you simply want to see if an element exists in a set of elements, use a Set (the standard implementation of this being HashSet). If the order matters, but you need a quick lookup, use a LinkedHashSet, as it keeps the order the elements were inserted. The lookup time is O(1) on both, but the insertion time is slightly longer on a LinkedHashSet. Use a Map only if you are actually mapping from one value to another - if you simply have a set of unique objects, use a Set, if you have ordered objects, use a List.

This site lists the memory consumption for several commonly (and not so commonly) used data structures. From there one can see that the HashMap takes roughly 5 times the space of an ArrayList. The map will also allocate one additional object per entry.
If you need a predictable iteration order and use a LinkedHashMap, the memory consumption will be even higher.
You can do your own memory measurements with Memory Measurer.
There are two important facts to note however:
A lot of data structures (including ArrayList and HashMap) do allocate space more space than they need currently, because otherwise they would have to frequently execute a costly resize operation. Thus the memory consumption per element depends on how many elements are in the collection. For example, an ArrayList with the default settings uses the same memory for 0 to 10 elements.
As others have said, the keys of the map are stored, too. So if they are not in memory anyway, you will have to add this memory cost, too. An additional object will usually take 8 bytes of overhead alone, plus the memory for its fields, and possibly some padding. So this will also be a lot of memory.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.

Use of ArrayList#trimToSize() method? - java

Another situation is when we add elements and then remove many of them. When we remove elements internal aray stays unchanged.

Related

Question(s) about time complexity of array "resizing" in Java

Why is my algorithm O(1) additional space complexity?

What is initial size of ArrayList in java [duplicate]

How to define the concept of capacity in ArrayLists?

Memory overhead of Java HashMap compared to ArrayList

Categories

Resources