How does ArrayList provide random access behaviour? - java

ArrayList is simply implemented as an Object[]. I know it implements the RandomAccess interface, but it is only a marker interface...
So, my question is: Why/how does ArrayList provide the random access feature?
EDIT 1: perhaps I should make this clearer...what I want to understand is why it is constant time to access the element while it is an Object[]?

Comparing a LinkedList, an ArrayList and an array visually should make things clear:
Linked list:
+----+ +----+ +----+ +----+
|Head| ---> | e1 | ---> | e2 | ---> | e3 | ---> null
+----+ +----+ +----+ +----+
Now, let's say I want to get element e2. The linked list itself only holds a reference to the head node, so to get to e2 I have to traverse from the head node all the way to e2. Clearly, this is not a constant-time operation, because you can't access any element directly without traversing the list.
Array:
+----++----++----++----+
| e1 || e2 || e3 || e4 | (value)
+----++----++----++----+
| 01 || 02 || 03 || 04 | (address)
+----++----++----++----+
Imagine that a variable holding an array holds only the address of the first element (e1). The remaining elements are stored in the memory blocks that follow, so the array elements sit next to each other in memory. This makes accessing a specific element a constant-time operation. For example, suppose you want to access e3 and each memory block is 4 bytes: starting from the array reference, move 2 blocks (8 bytes) forward. The key to constant-time access is that no traversal is needed. The machine only has to calculate how many bytes to move from the start, based on the size of each block and the number of blocks to skip (given by the array index). In Java, when you try to index beyond the bounds of the array, you get an ArrayIndexOutOfBoundsException.
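The offset arithmetic described above can be sketched as follows. This is only an illustration: Java does not expose raw addresses, and the class name ElementAddress, the base address 1000 and the 4-byte element size are assumptions made up for the example.

```java
// Illustrative only: the JVM hides real addresses, so these numbers are invented.
final class ElementAddress {
    // Address of element `index` = base + index * elementSize: one multiply and one add,
    // regardless of how many elements the array holds.
    static long addressOf(long baseAddress, int index, int elementSize) {
        return baseAddress + (long) index * elementSize;
    }

    public static void main(String[] args) {
        long base = 1000;    // hypothetical address of e1
        int elementSize = 4; // 4-byte blocks, as in the example above
        // e3 has index 2, so it sits 2 blocks (8 bytes) past the base.
        System.out.println(addressOf(base, 2, elementSize));
    }
}
```

The cost of addressOf does not depend on the array length, which is the whole point of constant-time access.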
ArrayList:
ArrayList uses the same idea as an array. By default it starts with a capacity of 10. When it needs to grow (for instance, when more elements are added than fit), it creates a new, longer array and copies the elements over. Since the data is stored in an array, access time is the same as for an array (i.e. constant time).
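A minimal sketch of that idea, not the real java.util.ArrayList source: the class name SimpleArrayList is hypothetical, and the growth factor of 1.5 is an assumption of this sketch.

```java
import java.util.Arrays;

// Simplified growable array; it only shows the "array under the covers" idea.
final class SimpleArrayList<E> {
    private Object[] elementData = new Object[10]; // initial capacity of 10, as described above
    private int size;

    void add(E e) {
        if (size == elementData.length) {
            // Full: allocate a longer array and copy the old elements across.
            // Growing by half the current length is an assumption of this sketch.
            int newCapacity = elementData.length + (elementData.length >> 1);
            elementData = Arrays.copyOf(elementData, newCapacity);
        }
        elementData[size++] = e;
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size)
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        return (E) elementData[index]; // plain array indexing, no traversal
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        SimpleArrayList<String> list = new SimpleArrayList<>();
        for (int i = 0; i < 25; i++) list.add("e" + i); // makes the backing array grow
        System.out.println(list.get(24));
    }
}
```

Growing costs a full copy, but get never touches anything except one array slot.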

Elements of an ArrayList can be accessed randomly, i.e. you are free to choose an index at any time to get an element that is in the list:
myList.get(3);
myList.get(1);
myList.get(5);
The ArrayList's method to get() an element is implemented as:
public E get(int index) {
rangeCheck(index);
return elementData(index);
}
@SuppressWarnings("unchecked")
E elementData(int index) {
return (E) elementData[index];
}
To access a single element in an array by index, you need constant time. No matter how many elements are in the (array-based) list, you can always get your entry in the same time. It does not matter whether the element is at the beginning, in the middle or at the end of your list.
The opposite is sequential access, e.g. used in a LinkedList, where each element contains a reference to the next item in list. You cannot access elements randomly here, but you have to iterate through all prior items to reach your target elements:
public E get(int index) {
checkElementIndex(index);
return node(index).item;
}
Node<E> node(int index) {
// assert isElementIndex(index);
if (index < (size >> 1)) {
Node<E> x = first;
for (int i = 0; i < index; i++)
x = x.next;
return x;
} else {
Node<E> x = last;
for (int i = size - 1; i > index; i--)
x = x.prev;
return x;
}
}
Here your access cost depends on the length of the list and on the position of the item. Elements near the beginning or the end of the list can be accessed faster, since the path to reach them is shorter. Accessing an element in the middle is the most costly, since you have to traverse all the elements between it and the nearer end sequentially.

Why
Because that's one of the reasons you use an ArrayList, because you want constant-time access to the elements. So ArrayList has the marker RandomAccess to tell you that that's what it provides. If you didn't need that, you might use a LinkedList instead, which doesn't provide constant-time access but doesn't have to do the occasional big reallocations ArrayList has to do.
How
By using an array under the covers. Arrays provide constant-time access, so...

As you said, RandomAccess is purely a marker interface. By adding it to a collection class, one indicates (basically) that get(int) is implemented in constant time.
ArrayList does that because it literally takes one memory access to get a value from a specific position.

An array is by definition a random-access data structure. Each element of an array sits at a fixed offset. If I want to access the n-th element of an array, I can simply access the memory address <base offset> + n * <element size>. No iteration is required. An ArrayList is a List implementation that is internally backed by an array, so it inherits this property. Whenever the backing array of an ArrayList cannot fit its elements any more, the ArrayList copies all elements into a new, larger array. This is why a LinkedList can be more efficient when collecting an unknown number of elements. The price of that advantage is that one can no longer reach an element by computing a fixed offset.
The RandomAccess marker interface signals random accessibility to users of a list, so that they can choose an efficient algorithm when random access is required. A LinkedList, for example, exposes the same List interface as an ArrayList. A sorting algorithm that requires random access but never changes the size of the list would rather copy the elements of a non-random-access list into a more efficient data structure while sorting. It can then iterate over the non-random-access list in a single pass to fill it with the sorted elements.
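One way an algorithm might use the marker is sketched below. The class RandomAccessDemo and its method indexable are hypothetical names for this example; only java.util.RandomAccess and the collection classes are real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.RandomAccess;

final class RandomAccessDemo {
    // Returns a list that is cheap to index repeatedly: the argument itself if it
    // advertises RandomAccess, otherwise an array-backed copy made in one pass.
    static <T> List<T> indexable(List<T> list) {
        return (list instanceof RandomAccess) ? list : new ArrayList<>(list);
    }

    public static void main(String[] args) {
        List<Integer> arrayBacked = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> linked = new LinkedList<>(arrayBacked);
        System.out.println(indexable(arrayBacked) == arrayBacked); // same instance, no copy
        System.out.println(indexable(linked) instanceof RandomAccess); // copied into an ArrayList
    }
}
```

The copy costs O(n) once, which is usually cheaper than paying O(n) for every get(int) on a linked list.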

In computer science, random access (more precisely and more generally called direct access) is the ability to access an item of data at any given coordinates in a population of addressable elements. (Wikipedia)
ArrayList provides an API to access any element through the method get(int).
The int is the index of the item in the Object[].
This method lets you access items randomly (at will). The opposite is sequential access (LinkedList), where you must move through the items of the structure.

Related

Take a specific Line from an Arraylist and work with it [duplicate]

I have an arrayList with 30 elements. I'd like to create many sublists of 15 elements from this list. What's the efficient way of doing so?
Right now I clone the ArrayList and use remove(random) to do it, but I am sure this is too clumsy. What should I do instead?
Does Java have a "sample" function like in R?
Clarification: by sampling with no replacement I mean take at random 15 unique elements from the 30 available in the original list. Moreover I want to be able to do this repeatedly.
Use the Collections#shuffle method to shuffle your original list, and return a list with the first 15 elements.
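A minimal sketch of the shuffle approach. The class name ShuffleSample is hypothetical, and shuffling a copy instead of the original list is an assumption of this sketch, made so that the method can be called repeatedly without disturbing the caller's list.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ShuffleSample {
    // Returns n distinct elements (by position) chosen at random from original.
    static <T> List<T> sample(List<T> original, int n) {
        if (n < 0 || n > original.size())
            throw new IllegalArgumentException("n must be between 0 and " + original.size());
        List<T> copy = new ArrayList<>(original); // leave the caller's list untouched
        Collections.shuffle(copy);                // uniform random permutation
        return new ArrayList<>(copy.subList(0, n)); // detach the result from the shuffled copy
    }

    public static void main(String[] args) {
        List<Integer> population = new ArrayList<>();
        for (int i = 0; i < 30; i++) population.add(i);
        System.out.println(sample(population, 15)); // a different 15-element sample on each run
    }
}
```

Each call costs O(size of the original list), which is negligible for 30 elements.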
Consider creating new list and adding random elements from current list instead of copying all elements and removing them.
Another way to do this is to create some kind of View on top of the current list.
Implement an Iterator interface that randomly generates index of element during next operation and retrieves element by index from current list.
No, Java does not have a sample function like in R. However, it is possible to write such a function:
// Samples n elements from original, and returns that list
public static <T> List<T> sample(List<T> original, int n) {
List<T> result = new ArrayList<T>(n);
for (int i = 0; i < original.size(); i++) {
if (result.size() == n)
return result;
if ((n - result.size()) >= (original.size() - i)) {
result.add(original.get(i));
} else if (Math.random() < ((double) (n - result.size()) / (original.size() - i))) {
result.add(original.get(i));
}
}
return result;
}
This function iterates through original, and copies the current element to result based on a random number, unless we are near enough to the end of original to require copying all the remaining elements (the second if statement in the loop).
This is a basic combinatorics problem. You have 30 elements in your list, and you want to choose 15. If the order matters, you want a permutation, if it doesn't matter, you want a combination.
There are various Java combinatorics samples on the web, and they typically use combinadics. I don't know of any ready made Java libraries, but Apache Math Commons has binomial coefficient support to help you implement combinadics if you go that route. Once you have a sequence of 15 indices from 0 to 29, I'd suggest creating a read-only iterator that you can read the elements from. That way you won't have to create any new lists or copy any references.

What would be the practical usage of vectors? [duplicate]

I've always been one to simply use:
List<String> names = new ArrayList<>();
I use the interface as the type name for portability, so that when I ask questions such as this, I can rework my code.
When should LinkedList be used over ArrayList and vice-versa?
Summary ArrayList with ArrayDeque are preferable in many more use-cases than LinkedList. If you're not sure — just start with ArrayList.
TLDR, in ArrayList accessing an element takes constant time [O(1)] and adding an element takes O(n) time [worst case]. In LinkedList inserting an element takes O(n) time and accessing also takes O(n) time but LinkedList uses more memory than ArrayList.
LinkedList and ArrayList are two different implementations of the List interface. LinkedList implements it with a doubly-linked list. ArrayList implements it with a dynamically re-sizing array.
As with standard linked list and array operations, the various methods will have different algorithmic runtimes.
For LinkedList<E>
get(int index) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use getFirst() and getLast()). One of the main benefits of LinkedList<E>
add(int index, E element) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use addFirst() and addLast()/add()). One of the main benefits of LinkedList<E>
remove(int index) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use removeFirst() and removeLast()). One of the main benefits of LinkedList<E>
Iterator.remove() is O(1). One of the main benefits of LinkedList<E>
ListIterator.add(E element) is O(1). One of the main benefits of LinkedList<E>
Note: Many of the operations need n/4 steps on average, constant number of steps in the best case (e.g. index = 0), and n/2 steps in worst case (middle of list)
For ArrayList<E>
get(int index) is O(1). Main benefit of ArrayList<E>
add(E element) is O(1) amortized, but O(n) worst-case since the array must be resized and copied
add(int index, E element) is O(n) (with n/2 steps on average)
remove(int index) is O(n) (with n/2 steps on average)
Iterator.remove() is O(n) (with n/2 steps on average)
ListIterator.add(E element) is O(n) (with n/2 steps on average)
Note: Many of the operations need n/2 steps on average, constant number of steps in the best case (end of list), n steps in the worst case (start of list)
LinkedList<E> allows for constant-time insertions or removals using iterators, but only sequential access of elements. In other words, you can walk the list forwards or backwards, but finding a position in the list takes time proportional to the size of the list. Javadoc says "operations that index into the list will traverse the list from the beginning or the end, whichever is closer", so those methods are O(n) (n/4 steps) on average, though O(1) for index = 0.
ArrayList<E>, on the other hand, allows fast random read access, so you can grab any element in constant time. But adding or removing from anywhere but the end requires shifting all the subsequent elements over, either to make an opening or to fill the gap. Also, if you add more elements than the capacity of the underlying array, a new array (1.5 times the size) is allocated, and the old array is copied to the new one, so adding to an ArrayList is O(n) in the worst case but constant on average.
So depending on the operations you intend to do, you should choose the implementations accordingly. Iterating over either kind of List is practically equally cheap. (Iterating over an ArrayList is technically faster, but unless you're doing something really performance-sensitive, you shouldn't worry about this -- they're both constants.)
The main benefits of using a LinkedList arise when you re-use existing iterators to insert and remove elements. These operations can then be done in O(1) by changing the list locally only. In an array list, the remainder of the array needs to be moved (i.e. copied). On the other side, seeking in a LinkedList means following the links in O(n) (n/2 steps) for worst case, whereas in an ArrayList the desired position can be computed mathematically and accessed in O(1).
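The iterator-based editing described above might look like the following sketch. The class name IteratorEdit and the particular editing rule (drop negatives, insert a 0 after each positive number) are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

final class IteratorEdit {
    // Removes every negative number and inserts a 0 after every positive number in one pass.
    // On a LinkedList each remove()/add() through the iterator only relinks neighbouring nodes.
    static void edit(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value < 0) {
                it.remove(); // unlinks the node just returned by next()
            } else if (value > 0) {
                it.add(0);   // inserted right after the element just returned
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, -2, 3));
        edit(list);
        System.out.println(list); // [1, 0, 3, 0]
    }
}
```

The same code works on an ArrayList, but there each remove() or add() through the iterator shifts the rest of the array.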
Another benefit of using a LinkedList arises when you add or remove from the head of the list, since those operations are O(1), while they are O(n) for ArrayList. Note that ArrayDeque may be a good alternative to LinkedList for adding and removing from the head, but it is not a List.
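As a small illustration of the ArrayDeque alternative mentioned above (the class name HeadInsertDemo is hypothetical):

```java
import java.util.ArrayDeque;
import java.util.Deque;

final class HeadInsertDemo {
    // Builds a deque by adding at both ends; ArrayDeque does this without shifting elements.
    static Deque<String> build() {
        Deque<String> deque = new ArrayDeque<>();
        deque.addFirst("b");
        deque.addFirst("a"); // now at the head, in front of "b"
        deque.addLast("c");
        return deque;
    }

    public static void main(String[] args) {
        Deque<String> deque = build();
        System.out.println(deque);               // [a, b, c]
        System.out.println(deque.removeFirst()); // a
    }
}
```

Because ArrayDeque is not a List, there is no get(int); it fits when only the two ends are touched.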
Also, if you have large lists, keep in mind that memory usage is also different. Each element of a LinkedList has more overhead since pointers to the next and previous elements are also stored. ArrayLists don't have this overhead. However, ArrayLists take up as much memory as is allocated for the capacity, regardless of whether elements have actually been added.
The default initial capacity of an ArrayList is pretty small (10 from Java 1.4 - 1.8). But since the underlying implementation is an array, the array must be resized if you add a lot of elements. To avoid the high cost of resizing when you know you're going to add a lot of elements, construct the ArrayList with a higher initial capacity.
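For example, a sketch of pre-sizing when the element count is known up front (the class name PresizedList and the count of 1,000,000 are arbitrary choices for this example):

```java
import java.util.ArrayList;
import java.util.List;

final class PresizedList {
    // Fills a list with 0..expected-1, reserving the backing array once.
    static List<Integer> fill(int expected) {
        List<Integer> list = new ArrayList<>(expected); // capacity set up front, no regrowth below
        for (int i = 0; i < expected; i++) {
            list.add(i);
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = fill(1_000_000);
        System.out.println(list.size());
    }
}
```

For a list that already exists, ArrayList also offers ensureCapacity(int) to reserve room before a bulk of adds.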
From a data structures perspective, a LinkedList is basically a sequential data structure which contains a head Node. The Node is a wrapper for two components: a value of type T [accepted through generics] and a reference to the Node linked to it. So, we can say it is a recursive data structure (a Node refers to another Node, which refers to another Node, and so on...). Adding an element at an arbitrary index takes linear time in a LinkedList, as stated above, because the position has to be found first.
An ArrayList is a growable array. It is just like a regular array. Under the hood, when an element is added, and the ArrayList is already full to capacity, it creates another array with a size which is greater than previous size. The elements are then copied from previous array to new one and the elements that are to be added are also placed at the specified indices.
Thus far, nobody seems to have addressed the memory footprint of each of these lists besides the general consensus that a LinkedList is "lots more" than an ArrayList so I did some number crunching to demonstrate exactly how much both lists take up for N null references.
Since references are either 32 or 64 bits (even when null) on their relative systems, I have included 4 sets of data for 32 and 64 bit LinkedLists and ArrayLists.
Note: The sizes shown for the ArrayList lines are for trimmed lists - In practice, the capacity of the backing array in an ArrayList is generally larger than its current element count.
Note 2: (thanks BeeOnRope) As CompressedOops is default now from mid JDK6 and up, the values below for 64-bit machines will basically match their 32-bit counterparts, unless of course you specifically turn it off.
The result clearly shows that LinkedList is a whole lot more than ArrayList, especially with a very high element count. If memory is a factor, steer clear of LinkedLists.
The formulas I used follow, let me know if I have done anything wrong and I will fix it up. 'b' is either 4 or 8 for 32 or 64 bit systems, and 'n' is the number of elements. Note the reason for the mods is because all objects in java will take up a multiple of 8 bytes space regardless of whether it is all used or not.
ArrayList:
ArrayList object header + size integer + modCount integer + array reference + (array object header + b * n) + MOD(array object, 8) + MOD(ArrayList object, 8) == 8 + 4 + 4 + b + (12 + b * n) + MOD(12 + b * n, 8) + MOD(8 + 4 + 4 + b + (12 + b * n) + MOD(12 + b * n, 8), 8)
LinkedList:
LinkedList object header + size integer + modCount integer + reference to header + reference to footer + (node object overhead + reference to previous element + reference to next element + reference to element) * n) + MOD(node object, 8) * n + MOD(LinkedList object, 8) == 8 + 4 + 4 + 2 * b + (8 + 3 * b) * n + MOD(8 + 3 * b, 8) * n + MOD(8 + 4 + 4 + 2 * b + (8 + 3 * b) * n + MOD(8 + 3 * b, 8) * n, 8)
ArrayList is what you want. LinkedList is almost always a (performance) bug.
Why LinkedList sucks:
It uses lots of small memory objects, and therefore impacts performance across the process.
Lots of small objects are bad for cache-locality.
Any indexed operation requires a traversal, i.e. has O(n) performance. This is not obvious in the source code, leading to algorithms O(n) slower than if ArrayList was used.
Getting good performance is tricky.
Even when big-O performance is the same as ArrayList, it is probably going to be significantly slower anyway.
It's jarring to see LinkedList in source because it is probably the wrong choice.
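The hidden traversal cost of indexed operations can be seen in a sketch like this one. The class name HiddenQuadratic is hypothetical; both methods compute the same sum, and only the access pattern differs.

```java
import java.util.LinkedList;
import java.util.List;

final class HiddenQuadratic {
    // Looks innocent, but on a LinkedList every get(i) walks from one end of the list,
    // so the whole loop is O(n^2). On an ArrayList it is O(n).
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return sum;
    }

    // O(n) on any List: the iterator advances one element per step.
    static long sumByIterator(List<Integer> list) {
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>();
        for (int i = 1; i <= 100; i++) list.add(i);
        System.out.println(sumByIndex(list));    // 5050
        System.out.println(sumByIterator(list)); // 5050
    }
}
```

Nothing in the source of sumByIndex hints at the extra cost; it depends entirely on which List implementation is passed in.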
Algorithm              ArrayList   LinkedList
seek front             O(1)        O(1)
seek back              O(1)        O(1)
seek to index          O(1)        O(N)
insert at front        O(N)        O(1)
insert at back         O(1)        O(1)
insert after an item   O(N)        O(1)
Algorithms: Big-Oh Notation (archived)
ArrayLists are good for write-once-read-many or appenders, but bad at add/remove from the front or middle.
See 2021 update from author below the original answer.
Original answer (2011)
As someone who has been doing operational performance engineering on very large scale SOA web services for about a decade, I would prefer the behavior of LinkedList over ArrayList. While the steady-state throughput of LinkedList is worse and therefore might lead to buying more hardware -- the behavior of ArrayList under pressure could lead to apps in a cluster expanding their arrays in near synchronicity and for large array sizes could lead to lack of responsiveness in the app and an outage, while under pressure, which is catastrophic behavior.
Similarly, you can get better throughput in an app from the default throughput tenured garbage collector, but once you get java apps with 10GB heaps you can wind up locking up the app for 25 seconds during a Full GCs which causes timeouts and failures in SOA apps and blows your SLAs if it occurs too often. Even though the CMS collector takes more resources and does not achieve the same raw throughput, it is a much better choice because it has more predictable and smaller latency.
ArrayList is only a better choice for performance if all you mean by performance is throughput and you can ignore latency. In my experience at my job I cannot ignore worst-case latency.
Update (Aug 27, 2021 -- 10 years later)
This answer (my most historically upvoted answer on SO as well) is very likely wrong (for reasons outlined in the comments below). I'd like to add that ArrayList will optimize for sequential reading of memory and minimize cache-line and TLB misses, etc. The copying overhead when the array grows past the bounds is likely inconsequential by comparison (and can be done by efficient CPU operations). This answer is also probably getting worse over time given hardware trends. The only situations where a LinkedList might make sense would be something highly contrived where you had thousands of Lists any one of which might grow to be GB-sized, but where no good guess could be made at allocation-time of the List and setting them all to GB-sized would blow up the heap. And if you found some problem like that, then it really does call for reengineering whatever your solution is (and I don't like to lightly suggest reengineering old code because I myself maintain piles and piles of old code, but that'd be a very good case of where the original design has simply run out of runway and does need to be chucked). I'll still leave my decades-old poor opinion up there for you to read though. Simple, logical and pretty wrong.
Joshua Bloch, the author of LinkedList:
Does anyone actually use LinkedList? I wrote it, and I never use it.
Link: https://twitter.com/joshbloch/status/583813919019573248
I'm sorry for the answer not being as informative as the other answers, but I thought it would be the most self-explanatory if not revealing.
Yeah, I know, this is an ancient question, but I'll throw in my two cents:
LinkedList is almost always the wrong choice, performance-wise. There are some very specific algorithms where a LinkedList is called for, but those are very, very rare and the algorithm will usually specifically depend on LinkedList's ability to insert and delete elements in the middle of the list relatively quickly, once you've navigated there with a ListIterator.
There is one common use case in which LinkedList outperforms ArrayList: that of a queue. However, if your goal is performance, instead of LinkedList you should also consider using an ArrayBlockingQueue (if you can determine an upper bound on your queue size ahead of time, and can afford to allocate all the memory up front), or this CircularArrayList implementation. (Yes, it's from 2001, so you'll need to generify it, but I got comparable performance ratios to what's quoted in the article just now in a recent JVM)
It's an efficiency question. LinkedList is fast for adding and deleting elements, but slow to access a specific element. ArrayList is fast for accessing a specific element but can be slow to add to either end, and especially slow to delete in the middle.
Array vs ArrayList vs LinkedList vs Vector goes more in depth, as does Linked List.
Correct or Incorrect: Please execute test locally and decide for yourself!
Edit/Remove is faster in LinkedList than ArrayList.
ArrayList, backed by Array, which needs to be double the size, is worse in large volume application.
Below is the unit test result for each operation. Timing is given in nanoseconds.
Operation                    ArrayList     LinkedList
AddAll (Insert)              101,16719     2623,29291
Add (Insert-Sequentially)    152,46840     966,62216
Add (insert-randomly)        36527         29193
remove (Delete)              20,56,9095    20,45,4904
contains (Search)            186,15,704    189,64,981
Here's the code:
import org.junit.Assert;
import org.junit.Test;
import java.util.*;
public class ArrayListVsLinkedList {
private static final int MAX = 500000;
String[] strings = maxArray();
////////////// ADD ALL ////////////////////////////////////////
@Test
public void arrayListAddAll() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
watch.start();
arrayList.addAll(stringList);
watch.totalTime("Array List addAll() = ");//101,16719 Nanoseconds
}
@Test
public void linkedListAddAll() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
watch.start();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(stringList);
watch.totalTime("Linked List addAll() = "); //2623,29291 Nanoseconds
}
//Note: ArrayList is 26 times faster here than LinkedList for addAll()
///////////////// INSERT /////////////////////////////////////////////
@Test
public void arrayListAdd() {
Watch watch = new Watch();
List<String> arrayList = new ArrayList<String>(MAX);
watch.start();
for (String string : strings)
arrayList.add(string);
watch.totalTime("Array List add() = ");//152,46840 Nanoseconds
}
@Test
public void linkedListAdd() {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
watch.start();
for (String string : strings)
linkedList.add(string);
watch.totalTime("Linked List add() = "); //966,62216 Nanoseconds
}
//Note: ArrayList is about 6 times faster than LinkedList for sequential add (152,46840 vs 966,62216 nanoseconds)
/////////////////// INSERT IN BETWEEN ///////////////////////////////////////
@Test
public void arrayListInsertOne() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX + MAX / 10);
arrayList.addAll(stringList);
String insertString0 = getString(true, MAX / 2 + 10);
String insertString1 = getString(true, MAX / 2 + 20);
String insertString2 = getString(true, MAX / 2 + 30);
String insertString3 = getString(true, MAX / 2 + 40);
watch.start();
arrayList.add(insertString0);
arrayList.add(insertString1);
arrayList.add(insertString2);
arrayList.add(insertString3);
watch.totalTime("Array List add() = ");//36527
}
@Test
public void linkedListInsertOne() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(stringList);
String insertString0 = getString(true, MAX / 2 + 10);
String insertString1 = getString(true, MAX / 2 + 20);
String insertString2 = getString(true, MAX / 2 + 30);
String insertString3 = getString(true, MAX / 2 + 40);
watch.start();
linkedList.add(insertString0);
linkedList.add(insertString1);
linkedList.add(insertString2);
linkedList.add(insertString3);
watch.totalTime("Linked List add = ");//29193
}
//Note: LinkedList is about 7000 nanoseconds faster than ArrayList here (36527 vs 29193). These calls use add(E), which appends at the end rather than inserting at a random index.
////////////////// DELETE //////////////////////////////////////////////////////
@Test
public void arrayListRemove() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
arrayList.addAll(stringList);
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
arrayList.remove(searchString0);
arrayList.remove(searchString1);
watch.totalTime("Array List remove() = ");//20,56,9095 Nanoseconds
}
@Test
public void linkedListRemove() throws Exception {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(Arrays.asList(strings));
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
linkedList.remove(searchString0);
linkedList.remove(searchString1);
watch.totalTime("Linked List remove = ");//20,45,4904 Nanoseconds
}
//Note: LinkedList is about 0.1 milliseconds faster than ArrayList while removing items (20,56,9095 vs 20,45,4904 nanoseconds).
///////////////////// SEARCH ///////////////////////////////////////////
@Test
public void arrayListSearch() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
arrayList.addAll(stringList);
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
arrayList.contains(searchString0);
arrayList.contains(searchString1);
watch.totalTime("Array List contains() time = ");//186,15,704
}
@Test
public void linkedListSearch() throws Exception {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(Arrays.asList(strings));
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
linkedList.contains(searchString0);
linkedList.contains(searchString1);
watch.totalTime("Linked List contains() time = ");//189,64,981
}
//Note: ArrayList is about 0.35 milliseconds faster than LinkedList for contains() (186,15,704 vs 189,64,981 nanoseconds).
class Watch {
private long startTime;
private long endTime;
public void start() {
startTime = System.nanoTime();
}
private void stop() {
endTime = System.nanoTime();
}
public void totalTime(String s) {
stop();
System.out.println(s + (endTime - startTime));
}
}
private String[] maxArray() {
String[] strings = new String[MAX];
Boolean result = Boolean.TRUE;
for (int i = 0; i < MAX; i++) {
strings[i] = getString(result, i);
result = !result;
}
return strings;
}
private String getString(Boolean result, int i) {
return String.valueOf(result) + i + String.valueOf(!result);
}
}
ArrayList is essentially an array. LinkedList is implemented as a double linked list.
get is pretty clear: O(1) for ArrayList, because ArrayList allows random access by index, and O(n) for LinkedList, because it needs to walk to the index first. Note: there are different versions of add and remove.
LinkedList is faster for add and remove, but slower for get. In brief, LinkedList should be preferred if:
there is not a large number of random accesses of elements
there is a large number of add/remove operations
=== ArrayList ===
add(E e)
add at the end of ArrayList
require memory resizing cost.
O(n) worst, O(1) amortized
add(int index, E element)
add to a specific index position
require shifting & possible memory resizing cost
O(n)
remove(int index)
remove the element at the specified index
requires shifting the following elements (the backing array is not resized on removal)
O(n)
remove(Object o)
remove the first occurrence of the specified element from this list
needs to search for the element first, and then shift the following elements (the backing array is not resized on removal)
O(n)
=== LinkedList ===
add(E e)
add to the end of the list
O(1)
add(int index, E element)
insert at specified position
need to find the position first
O(n)
remove()
remove first element of the list
O(1)
remove(int index)
remove element with specified index
need to find the element first
O(n)
remove(Object o)
remove the first occurrence of the specified element
need to find the element first
O(n)
Here is a figure from programcreek.com (add and remove are the first type, i.e., add an element at the end of the list and remove the element at the specified position in the list.):
TL;DR due to modern computer architecture, ArrayList will be significantly more efficient for nearly any possible use-case - and therefore LinkedList should be avoided except some very unique and extreme cases.
In theory, LinkedList has an O(1) for the add(E element)
Also adding an element in the mid of a list should be very efficient.
Practice is very different, as LinkedList is a cache-hostile data structure. From a performance point of view, there are very few cases where LinkedList could perform better than the cache-friendly ArrayList.
Here are the results of a benchmark that inserts elements at random locations. As you can see, the array list is much more efficient, although in theory each insert in the middle of the list requires moving the n later elements of the array (lower values are better):
Working on a later generation hardware (bigger, more efficient caches) - the results are even more conclusive:
LinkedList takes much more time to accomplish the same job. (Source, Source Code)
There are two main reasons for this:
Mainly, the nodes of a LinkedList are scattered across memory. RAM ("Random Access Memory") isn't really random, and blocks of memory need to be fetched into the cache. This operation takes time, and when such fetches happen frequently, the memory blocks in the cache need to be replaced all the time -> cache misses -> the cache is not efficient.
ArrayList elements are stored in contiguous memory, which is exactly what modern CPU architecture is optimized for.
Secondly, a LinkedList has to hold back/forward pointers, which means 3 times the memory consumption per value stored compared to ArrayList.
DynamicIntArray, btw, is a custom ArrayList implementation holding Int (primitive type) and not Objects - hence all data is really stored adjacently - hence even more efficient.
A key point to remember is that the cost of fetching a memory block is more significant than the cost of accessing a single memory cell. That's why reading 1 MB of sequential memory is up to 400 times faster than reading the same amount of data from different blocks of memory:
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
Read 1 MB sequentially from memory     250,000   ns      250 us
Round trip within same datacenter      500,000   ns      500 us
Read 1 MB sequentially from SSD*     1,000,000   ns    1,000 us    1 ms  ~1GB/sec SSD, 4X memory
Disk seek                           10,000,000   ns   10,000 us   10 ms  20x datacenter roundtrip
Read 1 MB sequentially from disk    20,000,000   ns   20,000 us   20 ms  80x memory, 20X SSD
Send packet CA->Netherlands->CA    150,000,000   ns  150,000 us  150 ms
Source: Latency Numbers Every Programmer Should Know
Just to make the point even clearer, please check the benchmark of adding elements to the beginning of the list. This is a use case where, in theory, the LinkedList should really shine, and ArrayList should show poor or even worst-case results:
Note: this is a benchmark of the C++ standard library, but my previous experience has shown that the C++ and Java results are very similar. Source Code
Copying a sequential bulk of memory is an operation optimized by modern CPUs, which changes the theory and, again, actually makes ArrayList/Vector much more efficient.
Credits: All benchmarks posted here were created by Kjell Hedström. Even more data can be found on his blog.
ArrayList is randomly accessible, while LinkedList is really cheap to expand and remove elements from. For most cases, ArrayList is fine.
Unless you've created large lists and measured a bottleneck, you'll probably never need to worry about the difference.
You can use one over the other based on the time complexities of the operations that you'd perform on that particular List.
|---------------------|---------------------|--------------------|------------|
| Operation | ArrayList | LinkedList | Winner |
|---------------------|---------------------|--------------------|------------|
| get(index) | O(1) | O(n) | ArrayList |
| | | n/4 steps in avg | |
|---------------------|---------------------|--------------------|------------|
| add(E) | O(1) | O(1) | LinkedList |
| |---------------------|--------------------| |
| | O(n) in worst case | | |
|---------------------|---------------------|--------------------|------------|
| add(index, E) | O(n) | O(n) | LinkedList |
| | n/2 steps | n/4 steps | |
| |---------------------|--------------------| |
| | | O(1) if index = 0 | |
|---------------------|---------------------|--------------------|------------|
| remove(index) | O(n) | O(n) | LinkedList |
| |---------------------|--------------------| |
| | n/2 steps | n/4 steps | |
|---------------------|---------------------|--------------------|------------|
| Iterator.remove() | O(n) | O(1) | LinkedList |
| ListIterator.add() | | | |
|---------------------|---------------------|--------------------|------------|
|--------------------------------------|-----------------------------------|
| ArrayList | LinkedList |
|--------------------------------------|-----------------------------------|
| Allows fast read access | Retrieving element takes O(n) |
|--------------------------------------|-----------------------------------|
| Adding an element requires shifting | O(1) [but traversing takes time] |
| all the later elements | |
|--------------------------------------|-----------------------------------|
| To add more elements than capacity, |
| a new array needs to be allocated |
|--------------------------------------|
If your code has add(0, e) and remove(0), use a LinkedList and its prettier addFirst() and removeFirst() methods. Otherwise, use ArrayList.
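To make the add-at-front / remove-from-front case concrete, here is a minimal sketch. The class and method names (FrontOfListDemo, pushThenDrain) are made up for illustration; only addFirst() and removeFirst() come from LinkedList itself.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class FrontOfListDemo {
    // Pushes 1..n onto the front of a LinkedList, then drains it from the front.
    static List<Integer> pushThenDrain(int n) {
        LinkedList<Integer> list = new LinkedList<>();
        for (int i = 1; i <= n; i++) {
            list.addFirst(i);                 // same effect as list.add(0, i)
        }
        List<Integer> drained = new ArrayList<>();
        while (!list.isEmpty()) {
            drained.add(list.removeFirst());  // same effect as list.remove(0)
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println(pushThenDrain(3)); // prints [3, 2, 1]
    }
}
```

Both calls only touch the head node, so no elements are shifted the way they would be with add(0, e) on an ArrayList.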
And of course, Guava's ImmutableList is your best friend.
Let's compare LinkedList and ArrayList w.r.t. below parameters:
1. Implementation
ArrayList is the resizable-array implementation of the List interface, while
LinkedList is the doubly-linked list implementation of the List interface.
2. Performance
get(int index) or search operation
The ArrayList get(int index) operation runs in constant time, i.e. O(1), while
the LinkedList get(int index) operation runs in O(n).
ArrayList is faster here because it uses an index-based system for its elements, as it is backed internally by an array. On the other hand,
LinkedList does not provide direct index-based access to its elements: it iterates from either the beginning or the end (whichever is closer) to retrieve the node at the specified index.
insert() or add(Object) operation
Insertions in LinkedList are generally fast compared to ArrayList. In LinkedList, adding or inserting is an O(1) operation.
In ArrayList, if the array is full (the worst case), there is the extra cost of resizing the array and copying the elements to the new array, which makes that add operation O(n); otherwise it is O(1).
remove(int) operation
The remove operation in LinkedList is generally the same as in ArrayList, i.e. O(n).
In LinkedList, there are overloaded remove methods. One is remove() without any parameter, which removes the head of the list and runs in constant time, O(1). The others are remove(int) and remove(Object), which remove the element at the given index or the first occurrence of the given object. These methods traverse the LinkedList until they find the element and then unlink it from the list, hence their runtime is O(n).
In ArrayList, the remove(int) method involves shifting the later elements down by one position in the backing array, hence its runtime is O(n).
3. Reverse Iterator
LinkedList can be iterated in reverse direction using descendingIterator() while
there is no descendingIterator() in ArrayList, so we need to write our own code (for example with a ListIterator) to iterate over the ArrayList in reverse direction.
4. Initial Capacity
With the no-argument constructor, ArrayList creates an empty list with an initial capacity of 10 (newer JDKs allocate that array lazily, when the first element is added), while
LinkedList only constructs the empty list without any initial capacity.
5. Memory Overhead
Memory overhead in LinkedList is higher compared to ArrayList, as a node in LinkedList needs to maintain the addresses of the next and previous nodes, while
in ArrayList each index only holds the actual object (data).
Source
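For point 3 above (reverse iteration), one way to walk an ArrayList backwards without descendingIterator() is a ListIterator positioned at the end. This is just a sketch; ReverseIterationDemo and reversedCopy are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.ListIterator;

class ReverseIterationDemo {
    // Returns a new list holding the elements of 'source' in reverse order.
    static <T> List<T> reversedCopy(List<T> source) {
        List<T> out = new ArrayList<>(source.size());
        // listIterator(size()) starts the cursor after the last element
        ListIterator<T> it = source.listIterator(source.size());
        while (it.hasPrevious()) {
            out.add(it.previous());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> letters = new ArrayList<>(Arrays.asList("a", "b", "c"));
        System.out.println(reversedCopy(letters)); // prints [c, b, a]
    }
}
```

The same method works for a LinkedList too, since listIterator(int) is part of the List interface.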
I know this is an old post, but I honestly can't believe nobody mentioned that LinkedList implements Deque. Just look at the methods in Deque (and Queue); if you want a fair comparison, try running LinkedList against ArrayDeque and do a feature-for-feature comparison.
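As a starting point for such a comparison, both classes can be driven through the Deque interface with identical code. A small sketch (DequeComparison and sumFifo are names I made up; this checks behaviour only, it is not a benchmark):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

class DequeComparison {
    // Runs the same queue-style workload against any Deque implementation:
    // enqueue 1..n at the tail, then dequeue everything from the head.
    static int sumFifo(Deque<Integer> deque, int n) {
        for (int i = 1; i <= n; i++) {
            deque.addLast(i);
        }
        int sum = 0;
        while (!deque.isEmpty()) {
            sum += deque.pollFirst();
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumFifo(new LinkedList<Integer>(), 4)); // prints 10
        System.out.println(sumFifo(new ArrayDeque<Integer>(), 4)); // prints 10
    }
}
```

To actually compare speed you would wrap sumFifo in a proper benchmark harness with warm-up; the point here is only that the two are interchangeable behind Deque.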
Here is the Big-O notation for ArrayList, LinkedList, and CopyOnWriteArrayList:
ArrayList
get O(1)
add O(1)
contains O(n)
next O(1)
remove O(n)
iterator.remove O(n)
LinkedList
get O(n)
add O(1)
contains O(n)
next O(1)
remove O(1)
iterator.remove O(1)
CopyOnWriteArrayList
get O(1)
add O(n)
contains O(n)
next O(1)
remove O(n)
iterator.remove O(n)
Based on these you have to decide what to choose. :)
In addition to the other good arguments above, you should notice ArrayList implements RandomAccess interface, while LinkedList implements Queue.
So, somehow they address slightly different problems, with difference of efficiency and behavior (see their list of methods).
It depends upon what operations you will be doing more on the List.
ArrayList is faster to access an indexed value. It is much worse when inserting or deleting objects.
To find out more, read any article that talks about the difference between arrays and linked lists.
See the Java Tutorials - List Implementations.
An array list is essentially an array with methods to add items etc. (and you should use a generic list instead). It is a collection of items which can be accessed through an indexer (for example [0]). It implies a progression from one item to the next.
A linked list specifies a progression from one item to the next (Item a -> item b). You can get the same effect with an array list, but a linked list absolutely says what item is supposed to follow the previous one.
An important feature of a linked list (which I didn't read in another answer) is the concatenation of two lists. With an array this is O(n) (+ overhead of some reallocations) with a linked list this is only O(1) or O(2) ;-)
Important: For Java's LinkedList this is not true! See Is there a fast concat method for linked list in Java?
ArrayList and LinkedList have their own pros and cons.
ArrayList uses contiguous memory addresses, whereas LinkedList uses pointers to the next node. So looking up an element in an ArrayList is faster than doing n iterations with a LinkedList.
On the other hand, insertion and deletion in a LinkedList are much easier because you just have to change the pointers, whereas an ArrayList has to shift elements for any insertion or deletion.
If you have frequent retrieval operations in your app, use an ArrayList. If you have frequent insertions and deletions, use a LinkedList.
1) Underlying Data Structure
The first difference between ArrayList and LinkedList is that ArrayList is backed by an array while LinkedList is backed by a doubly linked list of nodes. This leads to further differences in performance.
2) LinkedList implements Deque
Another difference between ArrayList and LinkedList is that, apart from the List interface, LinkedList also implements the Deque interface, which provides first-in-first-out operations such as add() and poll(), along with several other Deque functions.
3) Adding elements in ArrayList
Adding an element to an ArrayList is an O(1) operation if it doesn't trigger a resize of the array. When it does, that single add becomes O(n) because the elements are copied to a new array, although appends stay O(1) amortized. On the other hand, appending an element to a LinkedList is an O(1) operation, as it doesn't require any navigation.
4) Removing an element from a position
In order to remove an element from a particular index e.g. by calling remove(index), ArrayList performs a copy operation which makes it close to O(n) while LinkedList needs to traverse to that point which also makes it O(n/2), as it can traverse from either direction based upon proximity.
5) Iterating over ArrayList or LinkedList
Iteration is the O(n) operation for both LinkedList and ArrayList where n is a number of an element.
6) Retrieving element from a position
The get(index) operation is O(1) in ArrayList while it's O(n/2) in LinkedList, as it needs to traverse to that entry. In Big O notation, though, O(n/2) is just O(n) because we ignore constants there.
7) Memory
LinkedList uses a wrapper object, Entry (named Node in newer JDK versions), which is a static nested class storing the data and two references, next and previous, while ArrayList just stores data in an array.
So memory requirement seems less in the case of ArrayList than LinkedList except for the case where Array performs the re-size operation when it copies content from one Array to another.
If Array is large enough it may take a lot of memory at that point and trigger Garbage collection, which can slow response time.
From all the above differences between ArrayList and LinkedList, it looks like ArrayList is the better choice in almost all cases, except when you call add() much more frequently than remove() or get().
It's easier to modify a linked list than an ArrayList, especially if you are adding or removing elements at the start or end, because the linked list internally keeps references to those positions and they are accessible in O(1) time.
In other words, you don't need to traverse the linked list to reach those positions. When you do have to traverse to the position where you want to add elements, addition becomes an O(n) operation, for example when inserting or deleting an element in the middle of a linked list.
In my opinion, use ArrayList over LinkedList for most practical purposes in Java.
I have read the responses, but there is one scenario where I always use a LinkedList over an ArrayList, and I want to share it to hear opinions:
Every time I have a method that returns a list of data obtained from a DB, I use a LinkedList.
My rationale is that, because it is impossible to know exactly how many results I am getting, no memory is wasted (as it is in ArrayList through the difference between the capacity and the actual number of elements), and no time is wasted growing the capacity.
As far as ArrayList goes, I agree that you should at least always use the constructor with the initial capacity, to minimize the copying of the arrays as much as possible.
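For what it's worth, the ArrayList route for the "unknown number of DB rows" case can look like the sketch below. The Iterator stands in for a database cursor, and ResultCollector, collect and expectedRows are hypothetical names; only the ArrayList(int) constructor and trimToSize() are real API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;

class ResultCollector {
    // 'rows' stands in for a database cursor of unknown length (hypothetical).
    // 'expectedRows' is only a guess; the list still grows if the guess is too low.
    static ArrayList<String> collect(Iterator<String> rows, int expectedRows) {
        ArrayList<String> results = new ArrayList<>(Math.max(expectedRows, 0));
        while (rows.hasNext()) {
            results.add(rows.next());
        }
        results.trimToSize(); // give back capacity that was never used
        return results;
    }

    public static void main(String[] args) {
        Iterator<String> cursor = Arrays.asList("row1", "row2", "row3").iterator();
        System.out.println(collect(cursor, 2)); // prints [row1, row2, row3]
    }
}
```

Whether trimToSize() is worth the extra copy depends on how long the list lives; for short-lived results it is probably not.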
ArrayList and LinkedList both implement the List interface, and their methods and results are almost identical. However, there are a few differences between them which make one better than the other depending on the requirement.
ArrayList Vs LinkedList
1) Search: The ArrayList search operation is pretty fast compared to the LinkedList search operation. get(int index) in ArrayList gives O(1) performance while LinkedList performance is O(n).
Reason: ArrayList maintains an index-based system for its elements, as it uses an array data structure internally, which makes it faster at fetching an element by position. LinkedList, on the other hand, implements a doubly linked list, which requires traversing the elements to reach the one being looked up.
2) Deletion: The LinkedList remove operation gives O(1) performance once the node has been reached (for example through an iterator, or at either end), while ArrayList gives variable performance: O(n) in the worst case (when removing the first element) and O(1) in the best case (when removing the last element).
Conclusion: LinkedList element deletion is faster compared to ArrayList.
Reason: Each LinkedList element maintains two pointers (addresses) which point to both neighbor elements in the list. Hence removal only requires changing the pointers in the two neighbor nodes of the node being removed, while in ArrayList all the later elements need to be shifted to fill the space left by the removed element.
3) Insert performance: The LinkedList add method gives O(1) performance while ArrayList gives O(n) in the worst case. The reason is the same as explained for remove.
4) Memory overhead: ArrayList maintains indexes and element data while LinkedList maintains element data and two pointers for neighbor nodes, hence the memory consumption is comparatively high in LinkedList.
There are a few similarities between these classes, which are as follows:
Both ArrayList and LinkedList are implementations of the List interface.
They both maintain the insertion order of the elements, which means that when displaying ArrayList and LinkedList elements, the result has the same order in which the elements were inserted into the List.
Both classes are non-synchronized and can be made synchronized explicitly by using the Collections.synchronizedList method.
The iterator and listIterator returned by these classes are fail-fast (if list is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove or add methods, the iterator will throw a ConcurrentModificationException).
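The fail-fast behaviour in the last point can be seen with a few lines. This is only a sketch with made-up names (FailFastDemo, failsFast, removeOddsSafely); note that the JDK documents fail-fast detection as best-effort, so code should not rely on the exception for correctness.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class FailFastDemo {
    // Returns true if changing the list behind the iterator's back is detected.
    static boolean failsFast(List<Integer> list) {
        list.add(1);
        list.add(2);
        list.add(3);
        try {
            for (Integer value : list) {
                if (value == 1) {
                    list.remove(value); // remove(Object): structural change outside the iterator
                }
            }
        } catch (ConcurrentModificationException e) {
            return true;
        }
        return false;
    }

    // The safe way: remove through the iterator itself.
    static List<Integer> removeOddsSafely(List<Integer> list) {
        Iterator<Integer> it = list.iterator();
        while (it.hasNext()) {
            if (it.next() % 2 != 0) {
                it.remove();
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(failsFast(new ArrayList<>()));  // prints true
        System.out.println(failsFast(new LinkedList<>())); // prints true
    }
}
```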
When to use LinkedList and when to use ArrayList?
As explained above, the insert and remove operations give good performance (O(1)) in LinkedList compared to ArrayList (O(n)).
Hence, if the application requires frequent addition and deletion, LinkedList is the best choice.
Search (get method) operations are fast in ArrayList (O(1)) but not in LinkedList (O(n)),
so if there are fewer add and remove operations and more search operations, ArrayList would be your best bet.
Operation get(i) in ArrayList is faster than LinkedList, because:
ArrayList: Resizable-array implementation of the List interface
LinkedList: Doubly-linked list implementation of the List and Deque interfaces
Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.
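To illustrate the "whichever is closer" traversal, here is a stripped-down doubly linked list. It is my own sketch, not the JDK source; TinyLinkedList and its lastSteps counter are hypothetical and exist only to make the number of hops visible.

```java
class TinyLinkedList {
    private static final class Node {
        final int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;
    private int size;
    int lastSteps; // links followed by the most recent get(), for illustration only

    void add(int value) {
        Node node = new Node(value);
        if (tail == null) {
            head = tail = node;
        } else {
            tail.next = node;
            node.prev = tail;
            tail = node;
        }
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        lastSteps = 0;
        Node current;
        if (index < size / 2) {           // closer to the head: walk forwards
            current = head;
            for (int i = 0; i < index; i++) {
                current = current.next;
                lastSteps++;
            }
        } else {                          // closer to the tail: walk backwards
            current = tail;
            for (int i = size - 1; i > index; i--) {
                current = current.prev;
                lastSteps++;
            }
        }
        return current.value;
    }
}
```

With 10 elements, get(1) and get(8) each follow one link, while get(5) follows four. The cost grows with the distance from the nearer end, which is why the operation is still O(n).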
Both remove() and insert() have a runtime efficiency of O(n) for both ArrayLists and LinkedLists. However, the linear processing time comes from two very different causes:
In an ArrayList, you get to the element in O(1), but actually removing or inserting something makes it O(n) because all the following elements need to be moved.
In a LinkedList, it takes O(n) to actually get to the desired element, because we have to walk from the very beginning until we reach the desired index. Actually removing or inserting is constant, because we only have to change 1 reference for remove() and 2 references for insert().
Which of the two is faster for inserting and removing depends on where it happens. If we are closer to the beginning, the LinkedList will be faster, because we have to go through relatively few elements. If we are closer to the end, an ArrayList will be faster, because we get there in constant time and only have to move the few remaining elements that follow it. When done precisely in the middle, the LinkedList will be faster because going through n elements is quicker than moving n values.
Bonus: While there is no way of making these two methods O(1) for an ArrayList, there actually is a way to do this in LinkedLists. Let's say we want to go through the entire List, removing and inserting elements on our way. Usually, you would start from the very beginning for each element when using the LinkedList. Instead, we could "save" the current element we're working on with an Iterator. With the help of the Iterator, we get O(1) efficiency for remove() and insert() when working in a LinkedList, making it the only performance benefit I'm aware of where a LinkedList is always better than an ArrayList.
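The iterator trick from the bonus paragraph can be sketched with a ListIterator, which supports both remove() and add() at the current position. IteratorEditDemo and edit are hypothetical names, and the rule applied (drop negatives, insert a 0 after positives) is arbitrary.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorEditDemo {
    // Single pass over the list: drops negative numbers and inserts a 0
    // right after every positive number, without restarting from the head.
    static List<Integer> edit(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value < 0) {
                it.remove();   // unlinks the node just returned by next()
            } else if (value > 0) {
                it.add(0);     // inserted before the next element; not revisited
            }
        }
        return list;
    }

    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(Arrays.asList(1, -2, 3));
        System.out.println(edit(list)); // prints [1, 0, 3, 0]
    }
}
```

The same code runs on an ArrayList, but there each remove() or add() through the iterator still shifts the elements behind it.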
One of the tests I saw on here only runs the test once. But what I have noticed is that you need to run these tests many times and eventually their times will converge. Basically the JVM needs to warm up. For my particular use case I needed to add/remove items to a list that grows to about 500 items. In my tests LinkedList came out faster, with LinkedList coming in at around 50,000 ns and ArrayList coming in at around 90,000 ns... give or take. See the code below.
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

public class ListTiming {
    public static void main(String[] args) {
        List<Long> times = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            times.add(doIt());
        }
        // average() returns an OptionalDouble; unwrap it to print a plain number
        System.out.println("avg = " + times.stream().mapToLong(x -> x).average().orElse(0));
    }

    static long doIt() {
        long start = System.nanoTime();
        List<Object> list = new LinkedList<>();
        // uncomment line below to test with ArrayList
        // list = new ArrayList<>();
        for (int i = 0; i < 500; i++) {
            list.add(i);
        }
        Iterator<Object> it = list.iterator();
        while (it.hasNext()) {
            it.next();
            it.remove();
        }
        long end = System.nanoTime();
        long diff = end - start;
        // uncomment to see the JVM warm up and get faster over the first few iterations
        // System.out.println(diff);
        return diff;
    }
}
ArrayList extends AbstractList and implements the List interface. ArrayList is a dynamic array. It can be said that it was basically created to overcome the drawbacks of arrays.
The LinkedList class extends AbstractSequentialList and implements the List, Deque, and Queue interfaces.
Performance
arraylist.get() is O(1) whereas linkedlist.get() is O(n)
arraylist.add() is O(1) and linkedlist.add() is O(1)
arraylist.contains() is O(n) and linkedlist.contains() is O(n)
arraylist.next() is O(1) and linkedlist.next() is O(1)
arraylist.remove() is O(n) whereas linkedlist.remove() is O(1)
In arraylist, iterator.remove() is O(n), whereas in linkedlist, iterator.remove() is O(1)

How does an ArrayList retrieve data in constant time? [duplicate]

This is an interview question which I couldn't answer and for which I couldn't find any relevant answers online.
Suppose an ArrayList holds 10000 items and I want the number currently at index 5000. How does the ArrayList know the indexes and return the result in constant time?
If we were traversing the ArrayList to find the data, it would take linear time, not constant time.
Thanks in advance.
The storage backing an ArrayList is an array. Whether primitive values or object references are stored, all elements of the array sit consecutively in memory.
For array access, all the compiler has to do is have instructions that calculate the correct memory address based on the initial address and which index is desired, which is O(1). Then it can go directly to that calculated address. There is no traversing, so it is not O(n).
ArrayLists can be thought of as an array of Objects (Which happens to be exactly how they are implemented). You can index into it as any other array at O(1). The advantage over a true "Array" is that it tracks a "Length" independent of the array's length and automatically extends the array when it "overflows"--plus a few extra operations.
LinkedLists (probably the structure you are thinking of) require you to walk from one item to the next, so the implementation is O(n) to find an item at an index.
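A toy version of that idea, assuming a starting capacity of 4 and doubling on overflow (both are my choices for the sketch, not what java.util.ArrayList does), could look like this. TinyArrayList is a hypothetical name.

```java
import java.util.Arrays;

class TinyArrayList<E> {
    private Object[] elementData = new Object[4]; // backing array; its length is the capacity
    private int size;                             // how many slots are actually in use

    void add(E element) {
        if (size == elementData.length) {
            // "overflow": allocate a bigger array and copy the old contents over
            elementData = Arrays.copyOf(elementData, elementData.length * 2);
        }
        elementData[size++] = element;
    }

    @SuppressWarnings("unchecked")
    E get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return (E) elementData[index]; // plain array indexing, no traversal
    }

    int size() { return size; }

    int capacity() { return elementData.length; }
}
```

get() does one bounds check and one array access regardless of how many elements are stored, which is where the O(1) comes from.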
ArrayList
ArrayList uses an array under the hood, thus the name. Arrays are data-structures with a direct, fast, index-based access.
So if you ask for the element at index 5 000, it just asks its internal array:
// More or less
return array[5000];
Here's the full method from OpenJDK 8:
/**
 * Returns the element at the specified position in this list.
 *
 * @param index index of the element to return
 * @return the element at the specified position in this list
 * @throws IndexOutOfBoundsException {@inheritDoc}
 */
public E get(int index) {
    rangeCheck(index);
    return elementData(index);
}
In particular, it does not traverse all elements up to that point. That is what other data structures without index-based access, such as LinkedList, need to do. Note that there is a marker interface called RandomAccess (documentation). Classes implementing that interface provide direct index-based access. Current implementations are:
ArrayList, AttributeList, CopyOnWriteArrayList,
RoleList, RoleUnresolvedList, Stack, Vector
How arrays work
So, how does an array have direct access to that element? Well, arrays are of fixed size. When you create it, you need to tell it the size. For example 10 000:
Foo[] array = new Foo[10000];
Your computer will then allocate contiguous memory for 10 000 objects of Foo. The key is that the memory area is contiguous, not scattered around. So the third element comes directly after the second and directly before the fourth element in your memory.
When you now want to retrieve the element at position 5 000, your computer retrieves the Foo object at the following memory position:
startAddressOfArray + 5000 * sizeOfFoo
Everything is known since declaration of the array and the computation is fast, obviously in constant time O(1). Thus, arrays have direct index-based access to their elements. Because the stuff is stored together, contiguously in memory.
You may read more about arrays on Wikipedia.
Here is an image from techcrashcourse.com showing an array with the addresses of each element:
The array is of size 7 and stores integers that use 2 bytes (16 bits) each. In Java that type is called short, so this would be a new short[7] array. You can see that each element is offset by 2 bytes (the size of a short) from its previous element, which makes it possible to access an element at a given position directly with a simple computation, as shown.
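The computation itself is just one multiplication and one addition. Java does not expose raw addresses, so the sketch below only does the arithmetic; the start address 1000 is a made-up number, and ElementAddress and addressOf are hypothetical names.

```java
class ElementAddress {
    // startAddress + index * elementSize, exactly the formula described above.
    static long addressOf(long startAddress, int index, int elementSize) {
        return startAddress + (long) index * elementSize;
    }

    public static void main(String[] args) {
        // A short[7]-like layout: 2 bytes per element, hypothetical start address 1000.
        for (int i = 0; i < 7; i++) {
            System.out.println("element " + i + " -> address " + addressOf(1000, i, 2));
        }
    }
}
```

The cost of addressOf does not depend on the index, so finding element 5000 takes the same work as finding element 0.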
As its name suggests, ArrayList stores elements in an array. Here is the relevant piece of code in the Oracle JDK:
/**
* The array buffer into which the elements of the ArrayList are stored.
* The capacity of the ArrayList is the length of this array buffer. Any
* empty ArrayList with elementData == DEFAULTCAPACITY_EMPTY_ELEMENTDATA
* will be expanded to DEFAULT_CAPACITY when the first element is added.
*/
transient Object[] elementData; // non-private to simplify nested class access
Thus, unsurprisingly, list.get(index) simply returns the element at that index in the internal array:
public E get(int index) {
Objects.checkIndex(index, size);
return elementData(index);
}
E elementData(int index) {
return (E) elementData[index];
}

Distinction between the capacity of an array list and the size of an array

I read the below snippet in Core Java I book.
Allocating an array list as
new ArrayList<Employee>(100) // capacity is 100
is not the same as allocating a new array as
new Employee[100] // size is 100
There is an important distinction between the capacity of an array list and the size of an
array. If you allocate an array with 100 entries, then the array has 100 slots, ready for use.
An array list with a capacity of 100 elements has the potential of holding 100 elements (and,
in fact, more than 100, at the cost of additional reallocations); but at the beginning, even
after its initial construction, an array list holds no elements at all.
When I looked at the ArrayList source code, the constructor creates an Object array of the given capacity, which is ready to hold that many elements (the code snippet is below).
public ArrayList(int initialCapacity) {
super();
if (initialCapacity < 0)
throw new IllegalArgumentException("Illegal Capacity: "+
initialCapacity);
this.elementData = new Object[initialCapacity];
}
I am not able to figure out the actual difference the author is describing in the text above.
If you allocate a new array with arr = new Employee[100], the size of that array (arr.length) is going to be 100. It has 100 elements. All the elements are initially null (as this is an array of object references), but still, there are 100 elements.
If you do something like list = new ArrayList<Employee>(100) and then check list.size(), you'll get 0. There are no elements in the list.
Internally, it's true that the ArrayList allocates enough space to hold 100 items before it needs to extend its capacity, but that's an internal implementation detail, and the list presents its content to you as "no items stored". Only if you actually do list.add(something) will you have items in the list.
So although the list allocates storage in advance, the API with which it communicates with the program tells you there are no items in it. The null items in its internal array are not available to you - you cannot retrieve them or change them.
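A quick way to see the difference for yourself (SizeVersusCapacity and getOnEmptyListThrows are names I made up for the sketch):

```java
import java.util.ArrayList;
import java.util.List;

class SizeVersusCapacity {
    // True if reading index 0 of a freshly created list with capacity 100 is rejected.
    static boolean getOnEmptyListThrows() {
        List<String> list = new ArrayList<>(100);
        try {
            list.get(0);
            return false;
        } catch (IndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String[] array = new String[100];
        List<String> list = new ArrayList<>(100);

        System.out.println(array.length);           // prints 100
        System.out.println(array[0]);               // prints null
        System.out.println(list.size());            // prints 0
        System.out.println(getOnEmptyListThrows()); // prints true
    }
}
```

The array hands you 100 usable slots immediately; the list has reserved room for 100 but exposes none of it until you call add().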
An ArrayList is just one way to represent an abstract list, and the capacity of an ArrayList is an implementation detail of how the system implements the logical list.
An ArrayList stores the elements of a list by using an actual array "under the covers." The actual realization of the array in computer memory has a certain size when it is allocated; this size is the ArrayList's capacity. The ArrayList emulates a variable-sized list by storing the logical length of the list in addition to the fixed-length array. Thus if you have an ArrayList with a capacity 10 which contains 4 logical elements, the ArrayList can be represented as a length and an array
(4)
| e1 | e2 | e3 | e4 | __ | __ | __| __ | __ | __ |
where the (4) is the logical length of the list and '__' represent data that is ignored because it is not part of the logical list. If you attempt to access the 5th element of this ArrayList, it will throw an exception because it knows that the fifth element has not been initialized. If we then append an extra element e5 to the list, the ArrayList becomes
(5)
| e1 | e2 | e3 | e4 | e5 | __ | __ | __ | __ | __ |
Note that the capacity has not changed, while the logical length has, because the underlying array is still able to handle all the data in the logical list.
If you manage to add more than ten elements to this list, the ArrayList will not break. The ArrayList is an abstraction meant to be compatible with all array operations. Rather, the ArrayList changes its capacity when its logical length exceeds its original capacity. If we were to add the elements (a1, a2, ..., a7) to the above list, the resulting ArrayList might look like
(12)
| e1 | e2 | e3 | e4 | e5 | a1 | a2 | a3 | a4 | a5 | a6 | a7 | __ | __ | __ | __ | __ | __ | __ | __ |
with a capacity of 20.
Once you have created an ArrayList, you can ignore the capacity in all programming that follows; the logic is unaffected. However, the performance of the system under certain kinds of operations can be affected. Increasing the capacity, for instance, might well involve allocating a larger array, copying the first array into the second, and then performing the operation. This can be quite slow compared to, e.g., the same operation on a linked list. Thus it is sensible to choose the capacity of an ArrayList to be bigger than, or at least comparable to, the actual number of elements expected in the real runtime environment.
Arrays have a length, which is specified at creation and cannot be altered.
If you create a new array myArray = new Object[100] then you can read and write from myArray[0] to myArray[99] (and you'll find it full of null).
Lists, on the other hand, have a size() that starts at zero and grows when you add items. The size() of a list tracks how many things you have actually put in it, rather than how much space it has.
If you create a list using myList = new ArrayList(100) and then you try to get or set any elements, you will get an IndexOutOfBoundsException, because the list is empty until you add something to it.
In summary, the array of size 100 will initially hold 100 nulls, but the list will be empty.
This just seems poorly worded and potentially incorrect if I'm not understanding it correctly.
I believe what it is trying to say is that there is a difference between the initial capacity of the ArrayList and the initial size of the ArrayList.
List<Employee> employees = new ArrayList<>(100);
int size = employees.size();
size will be 0 while the initial capacity is 100.
You are correct with how you are reading the source code.
The difference is between a fixed size container (data structure) and a variable size container.
An array is a fixed size container, the number of elements it holds is established when the array is created and never changes. (When the array is created all of those elements will have some default value, e.g., null for reference types or 0 for ints, but they'll all be there in the array: you can index each and every one.)
A list is a variable size container, the number of elements in it can change, ranging from 0 to as many as you want (subject to implementation limits). After creation the number of elements can either grow or shrink. At all times you can retrieve any element by its index.
But the Java concept List is actually an interface and it can be implemented in many different ways. Thus, ArrayList, LinkedList, etc. There is a data structure "behind" the list to actually hold the elements. And that data structure itself might be fixed size or variable size, and at any given time might have the exact size of the number of elements in the list, or it might have some extra "buffer" space.
The LinkedList, for example, always has in its underlying data structure exactly the same number of "places for elements" as are in the list it is representing. But the ArrayList uses a fixed length array as its backing store.
For the ArrayList, at any given time the number of elements in the list might be different than the number of elements the array behind it can hold. Those "extra" places for elements just contain nulls or 0s or whatever, but the ArrayList never gives you access to those places. As you add elements to the ArrayList they take up more places in the underlying array, until finally the underlying array is full. The next element you add to the ArrayList causes an entirely new fixed size array - somewhat bigger than the "current" array - to be allocated, and all the list elements copied to it (the original array is discarded). To prevent this expensive operation (allocation and copy) from happening too often the new array is larger than the current array (by some factor) and thus has elements which will not at that time hold elements of the list - they're empty (null or 0).
So, because there is (potentially) a difference between the number of elements in the list being represented, and the number of elements the implementing data structure can hold there are two concepts in force.
The size of the list is the number of elements in it. The capacity of the list is the number of elements the backing data structure can hold at this time. The size will change as elements are added to or removed from the list. The capacity will change when the implementation of the list you're using needs it to. (The size, of course, will never be bigger than the capacity.)
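To make the size/capacity distinction concrete, here is a minimal sketch of a growable array. It is not the actual source of java.util.ArrayList: the class name GrowableArray, the capacity() accessor and the growth factor of 2 are assumptions made purely for illustration.

```java
import java.util.Arrays;

// Illustrative only: a tiny growable list, not java.util.ArrayList's real implementation.
class GrowableArray {
    private Object[] elements; // backing store; its length is the capacity
    private int size;          // number of elements actually in the list

    GrowableArray(int initialCapacity) {
        elements = new Object[initialCapacity];
    }

    int size() { return size; }

    int capacity() { return elements.length; }

    void add(Object e) {
        if (size == elements.length) {
            // Full: allocate a larger array and copy the existing elements over.
            // Doubling is an assumed growth factor for this sketch.
            int newCapacity = Math.max(1, elements.length * 2);
            elements = Arrays.copyOf(elements, newCapacity);
        }
        elements[size++] = e;
    }

    Object get(int index) {
        // Places between size and capacity exist in the array but are never exposed.
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return elements[index]; // plain array indexing, hence constant time
    }
}
```

With an initial capacity of 2, adding a third element grows the capacity to 4 while the size becomes 3; get(3) still throws, because that place is spare capacity, not part of the list.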
(BTW, for fixed-size containers the size is frequently called length; thus arrays have a field length and strings have a method length(). Different languages - sometimes even the same language - use "size" and "length" inconsistently, but both refer to the number of elements, while "capacity" is the usual term for the size/length of the underlying data structure.)
Let's use a real life example.
Consider an eighteen-seater bus: its capacity is eighteen passengers. The number of passengers at any given time can be less than eighteen, but not more. When the number of passengers is eighteen, another passenger can't be accommodated.
In an ArrayList, the capacity has something in common with that of our bus in that it defines the number of elements that can fit in. Unlike our bus, however, the capacity expands to accommodate more elements, up to a limit close to Integer.MAX_VALUE.
The same goes for the size: just like on our bus, the number of elements in the list cannot exceed the capacity. Just imagine 50 passengers riding an eighteen-seater bus! You sure don't want to be on that bus.
ArrayList is a dynamic array implementation of the List interface. We do not have to worry about the size of the ArrayList when we add elements to it. It grows automatically as we add elements to it and resizes the underlying array accordingly. The size of this internal array is the capacity of the ArrayList. When the internal array is full and we try to add an element to the ArrayList, a new array is created with more capacity and all existing array items are copied to it.
Example:
ArrayList<Integer> aListNumbers = new ArrayList<>(20);
This creates an ArrayList object with an initial capacity of 20 (its size is still 0).
That means the ArrayList will be able to hold 20 elements before it needs to resize the internal array.
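A short sketch of what that constructor argument does and does not do. ArrayList does not expose its capacity, so the example only observes size(); the class and method names (CapacityDemo, sizeOfNewList, sizeAfterAdding) are made up for this illustration.

```java
import java.util.ArrayList;

class CapacityDemo {
    // Size of a freshly created list with the given initial capacity.
    static int sizeOfNewList(int initialCapacity) {
        ArrayList<Integer> numbers = new ArrayList<>(initialCapacity);
        return numbers.size(); // capacity reserves room but adds no elements
    }

    // Adds count elements to a list created with the given initial capacity.
    static int sizeAfterAdding(int initialCapacity, int count) {
        ArrayList<Integer> numbers = new ArrayList<>(initialCapacity);
        for (int i = 0; i < count; i++) {
            numbers.add(i); // the internal array is resized transparently when it is full
        }
        return numbers.size();
    }

    public static void main(String[] args) {
        System.out.println(sizeOfNewList(20));       // 0
        System.out.println(sizeAfterAdding(20, 25)); // 25
    }
}
```

Adding 25 elements to a list created with capacity 20 simply works: the resize happens behind the scenes and only the size is visible to the caller.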

picking without replacement in java

I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1), after this operation the picked element is removed from the structure. (without replacement)
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in O(n).
The complexity (or even possibility) of other actions (eg insert) does not matter. Besides the asymptotic complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I currently implemented a structure having all above properties, except the picking of the element takes O(d) with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can figure out structures allowing picking in O(1), but these have higher complexities on at least one of the other operations.
BTW:
Note that O(1) above implies that the complexity is independent of the number of earlier picked elements and independent of the total number of elements.
*In Monte Carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has expected O(1) complexity for both insertion and removal.
You specify a lot of operations, but all of them are nothing more than insertion, removal and traversal:
can be initialized with an array of n objects in O(n).
n insertions at O(1) each. HashMap is fine.
one can obtain a random element in O(1), after this operation the picked element is removed from the structure. (without replacement)
This is the only operation that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
Each undo is an insertion, O(1), so p of them take O(p).
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
You can traverse a HashMap in O(n).
EDIT:
Example of picking a random element in O(n):
HashMap map = ...; // your populated map
int randomIndex = new Random().nextInt(map.size()); // random int from 0 to map.size() - 1
Collection collection = map.values();
Object[] values = collection.toArray(); // copying all values is the O(n) step
Object picked = values[randomIndex];
OK, same answer as 0verbose, with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs <object, index of the object in the array>. For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap map with the following values:
Map<String, Integer> map = new HashMap<>();
for (int i = 0; i < array.size(); i++)
{
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in the map and get its array index. Let's call this index i (i = map.get(object)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array). Reduce the size of the array by 1 (since there is one less object now) and remove the deleted object from the map - O(1)
Update the index of the object that is now in position i of the array in the map (map.put(array[i], i)) - O(1)
I apologize for the mix of Java and C++ notation; hope this helps.
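The steps above can be sketched in plain Java roughly as follows. This is one possible reading of the answer, not a reference implementation: the class name IndexedBag and its method names are hypothetical, and the sketch assumes the objects are distinct and have consistent equals()/hashCode().

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical helper: a list of distinct items plus a map from each item to its list index.
class IndexedBag<T> {
    private final List<T> items = new ArrayList<>();
    private final Map<T, Integer> indexOf = new HashMap<>();
    private final Random random = new Random();

    IndexedBag(List<T> initial) { // n insertions, expected O(n) overall
        for (T item : initial) {
            indexOf.put(item, items.size());
            items.add(item);
        }
    }

    int size() { return items.size(); }

    boolean contains(T item) { return indexOf.containsKey(item); }

    // Removes a specific item in expected O(1) by moving the last element into its place.
    boolean remove(T item) {
        Integer i = indexOf.remove(item);
        if (i == null) {
            return false; // not present
        }
        int last = items.size() - 1;
        T moved = items.remove(last); // removing the last element needs no shifting
        if (i != last) {
            items.set(i, moved);      // fill the hole with the former last element
            indexOf.put(moved, i);    // and record its new index
        }
        return true;
    }

    // Picks a uniformly random item and removes it (picking without replacement).
    T pickRandom() {
        if (items.isEmpty()) {
            throw new IllegalStateException("nothing left to pick");
        }
        T picked = items.get(random.nextInt(items.size()));
        remove(picked);
        return picked;
    }
}
```

Because order does not matter, the swap with the last element avoids the O(n) shifting that ArrayList.remove(int) would otherwise do for an index in the middle.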
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
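A rough sketch of the shuffle idea. Note one deliberate deviation: instead of replacing the list with a subList(), it removes the last element with remove(size - 1), which is also O(1) on an ArrayList. The names ShufflePicker, shuffledCopy and pick are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class ShufflePicker {
    // Shuffles a copy of the input once: O(n).
    static <T> List<T> shuffledCopy(List<T> source) {
        List<T> copy = new ArrayList<>(source);
        Collections.shuffle(copy);
        return copy;
    }

    // Picks without replacement by taking the last element: O(1), no shifting needed.
    static <T> T pick(List<T> shuffled) {
        return shuffled.remove(shuffled.size() - 1);
    }

    public static void main(String[] args) {
        List<String> pool = shuffledCopy(Arrays.asList("a", "b", "c", "d"));
        String first = pick(pool);
        String second = pick(pool);
        System.out.println(first + ", " + second + " picked; remaining: " + pool);
        pool.add(second); // undo the picks by appending the elements again
        pool.add(first);
        System.out.println(pool.toArray().length); // 4
    }
}
```

One caveat with this sketch: elements restored by an undo sit at the end of the list, so the next pick would return them again unless the list is reshuffled first.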
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around online you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
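The picked/unpicked partition from this answer could look roughly like the sketch below. It covers only initialization, picking, undoing and toArray; removal of a specific object by id (the HashMap part) is left out. The class name BoundaryPicker and all method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: items[0 .. boundary) are unpicked, items[boundary .. length) are picked.
class BoundaryPicker<T> {
    private final T[] items;
    private int boundary;
    private final Random random = new Random();

    BoundaryPicker(T[] source) {
        items = Arrays.copyOf(source, source.length); // O(n) copy, the caller's array is untouched
        boundary = items.length;
    }

    // O(1): swap a random unpicked item with the last unpicked item, then move the boundary down.
    T pick() {
        if (boundary == 0) {
            throw new IllegalStateException("nothing left to pick");
        }
        int i = random.nextInt(boundary);
        T picked = items[i];
        items[i] = items[boundary - 1];
        items[boundary - 1] = picked;
        boundary--;
        return picked;
    }

    // Undoes the p most recent picks; they sit directly above the boundary.
    // This is O(1) here, which is within the O(p) that was asked for.
    void undo(int p) {
        if (p < 0 || p > items.length - boundary) {
            throw new IllegalArgumentException("cannot undo " + p + " picks");
        }
        boundary += p;
    }

    int remaining() { return boundary; }

    // O(n): a copy of the items that are currently unpicked.
    T[] toArray() {
        return Arrays.copyOf(items, boundary);
    }
}
```

Nothing is ever shifted or reallocated during pick() or undo(); only two array slots and one integer change per pick.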
