What would be the practical usage of vectors? [duplicate] - java

I've always been one to simply use:
List<String> names = new ArrayList<>();
I use the interface as the type name for portability, so that when I ask questions such as this, I can rework my code.
When should LinkedList be used over ArrayList and vice-versa?

Summary: ArrayList and ArrayDeque are preferable in many more use-cases than LinkedList. If you're not sure, just start with ArrayList.
TLDR, in ArrayList accessing an element takes constant time [O(1)] and adding an element takes O(n) time [worst case]. In LinkedList inserting an element takes O(n) time and accessing also takes O(n) time but LinkedList uses more memory than ArrayList.
LinkedList and ArrayList are two different implementations of the List interface. LinkedList implements it with a doubly-linked list. ArrayList implements it with a dynamically re-sizing array.
As with standard linked list and array operations, the various methods will have different algorithmic runtimes.
For LinkedList<E>
get(int index) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use getFirst() and getLast()). One of the main benefits of LinkedList<E>
add(int index, E element) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use addFirst() and addLast()/add()). One of the main benefits of LinkedList<E>
remove(int index) is O(n) (with n/4 steps on average), but O(1) when index = 0 or index = list.size() - 1 (in this case, you can also use removeFirst() and removeLast()). One of the main benefits of LinkedList<E>
Iterator.remove() is O(1). One of the main benefits of LinkedList<E>
ListIterator.add(E element) is O(1). One of the main benefits of LinkedList<E>
Note: Many of the operations need n/4 steps on average, constant number of steps in the best case (e.g. index = 0), and n/2 steps in worst case (middle of list)
For ArrayList<E>
get(int index) is O(1). Main benefit of ArrayList<E>
add(E element) is O(1) amortized, but O(n) worst-case since the array must be resized and copied
add(int index, E element) is O(n) (with n/2 steps on average)
remove(int index) is O(n) (with n/2 steps on average)
Iterator.remove() is O(n) (with n/2 steps on average)
ListIterator.add(E element) is O(n) (with n/2 steps on average)
Note: Many of the operations need n/2 steps on average, constant number of steps in the best case (end of list), n steps in the worst case (start of list)
LinkedList<E> allows for constant-time insertions or removals using iterators, but only sequential access of elements. In other words, you can walk the list forwards or backwards, but finding a position in the list takes time proportional to the size of the list. Javadoc says "operations that index into the list will traverse the list from the beginning or the end, whichever is closer", so those methods are O(n) (n/4 steps) on average, though O(1) for index = 0.
ArrayList<E>, on the other hand, allows fast random read access, so you can grab any element in constant time. But adding or removing from anywhere but the end requires shifting all the later elements over, either to make an opening or fill the gap. Also, if you add more elements than the capacity of the underlying array, a new array (1.5 times the size) is allocated, and the old array is copied to the new one, so adding to an ArrayList is O(n) in the worst case but constant on average.
So depending on the operations you intend to do, you should choose the implementations accordingly. Iterating over either kind of List is practically equally cheap. (Iterating over an ArrayList is technically faster, but unless you're doing something really performance-sensitive, you shouldn't worry about this -- they're both constants.)
The main benefits of using a LinkedList arise when you re-use existing iterators to insert and remove elements. These operations can then be done in O(1) by changing the list locally only. In an array list, the remainder of the array needs to be moved (i.e. copied). On the other side, seeking in a LinkedList means following the links in O(n) (n/2 steps) for worst case, whereas in an ArrayList the desired position can be computed mathematically and accessed in O(1).
Another benefit of using a LinkedList arises when you add or remove from the head of the list, since those operations are O(1), while they are O(n) for ArrayList. Note that ArrayDeque may be a good alternative to LinkedList for adding and removing from the head, but it is not a List.
Also, if you have large lists, keep in mind that memory usage is also different. Each element of a LinkedList has more overhead since pointers to the next and previous elements are also stored. ArrayLists don't have this overhead. However, ArrayLists take up as much memory as is allocated for the capacity, regardless of whether elements have actually been added.
The default initial capacity of an ArrayList is pretty small (10 from Java 1.4 - 1.8). But since the underlying implementation is an array, the array must be resized if you add a lot of elements. To avoid the high cost of resizing when you know you're going to add a lot of elements, construct the ArrayList with a higher initial capacity.
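As a small illustration of that last point, here is a minimal sketch of pre-sizing; the class name and the element count are just assumed values for the example:
import java.util.ArrayList;
import java.util.List;

public class PreSizedList {
    public static void main(String[] args) {
        int expected = 1_000_000;                        // assumed: roughly known in advance
        List<Integer> ids = new ArrayList<>(expected);   // one allocation up front, no repeated grow-and-copy
        for (int i = 0; i < expected; i++) {
            ids.add(i);
        }
        System.out.println(ids.size());
    }
}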
If the data structures perspective is used to understand the two structures, a LinkedList is basically a sequential data structure which contains a head Node. The Node is a wrapper for two components: a value of type T [accepted through generics] and another reference to the Node linked to it. So, we can assert it is a recursive data structure (a Node contains another Node which has another Node, and so on...). Addition of elements takes linear time in LinkedList, as stated above.
An ArrayList is a growable array. It is just like a regular array. Under the hood, when an element is added and the ArrayList is already full to capacity, it creates another array with a size greater than the previous size. The elements are then copied from the previous array to the new one, and the elements that are to be added are placed at the specified indices.
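To make those two descriptions concrete, here is a toy sketch of both structures. It is simplified and mine, not the actual JDK classes: java.util.LinkedList is doubly linked and java.util.ArrayList grows by roughly 1.5x.
import java.util.Arrays;

// A node wrapping a value and a reference to the next node (the recursive structure described above).
class Node<T> {
    T value;
    Node<T> next;
}

// A growable array: copy into a bigger array whenever capacity runs out.
class GrowableArray<T> {
    private Object[] data = new Object[10];
    private int size;

    void add(T element) {
        if (size == data.length) {
            // full: allocate a larger array and copy the existing contents over
            data = Arrays.copyOf(data, data.length + (data.length >> 1));
        }
        data[size++] = element;
    }

    @SuppressWarnings("unchecked")
    T get(int index) {
        return (T) data[index];
    }
}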

Thus far, nobody seems to have addressed the memory footprint of each of these lists besides the general consensus that a LinkedList is "lots more" than an ArrayList so I did some number crunching to demonstrate exactly how much both lists take up for N null references.
Since references are either 32 or 64 bits (even when null) on their respective systems, I have included 4 sets of data, for 32 and 64 bit LinkedLists and ArrayLists.
Note: The sizes shown for the ArrayList lines are for trimmed lists - In practice, the capacity of the backing array in an ArrayList is generally larger than its current element count.
Note 2: (thanks BeeOnRope) As CompressedOops is default now from mid JDK6 and up, the values below for 64-bit machines will basically match their 32-bit counterparts, unless of course you specifically turn it off.
The result clearly shows that LinkedList uses a whole lot more memory than ArrayList, especially with a very high element count. If memory is a factor, steer clear of LinkedLists.
The formulas I used follow; let me know if I have done anything wrong and I will fix it up. 'b' is either 4 or 8 for 32- or 64-bit systems, and 'n' is the number of elements. Note: the reason for the mods is that all objects in Java take up a multiple of 8 bytes of space, regardless of whether it is all used or not.
ArrayList:
ArrayList object header + size integer + modCount integer + array reference + (array object header + b * n) + MOD(array object, 8) + MOD(ArrayList object, 8) == 8 + 4 + 4 + b + (12 + b * n) + MOD(12 + b * n, 8) + MOD(8 + 4 + 4 + b + (12 + b * n) + MOD(12 + b * n, 8), 8)
LinkedList:
LinkedList object header + size integer + modCount integer + reference to header + reference to footer + (node object overhead + reference to previous element + reference to next element + reference to element) * n) + MOD(node object, 8) * n + MOD(LinkedList object, 8) == 8 + 4 + 4 + 2 * b + (8 + 3 * b) * n + MOD(8 + 3 * b, 8) * n + MOD(8 + 4 + 4 + 2 * b + (8 + 3 * b) * n + MOD(8 + 3 * b, 8) * n, 8)
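If you want to sanity-check those formulas yourself, here is a direct Java transcription. The class and helper names are mine, and it reads the author's MOD(x, 8) as the padding needed to round x up to the next multiple of 8:
public class ListFootprint {
    // padding needed to round x up to the next multiple of 8 (the "MOD" terms above)
    static long pad(long x) { return (8 - x % 8) % 8; }

    // b = reference size in bytes (4 or 8), n = number of (null) elements
    static long arrayListBytes(long b, long n) {
        long body = 8 + 4 + 4 + b + (12 + b * n) + pad(12 + b * n);
        return body + pad(body);
    }

    static long linkedListBytes(long b, long n) {
        long body = 8 + 4 + 4 + 2 * b + (8 + 3 * b) * n + pad(8 + 3 * b) * n;
        return body + pad(body);
    }

    public static void main(String[] args) {
        System.out.println(arrayListBytes(4, 1_000_000));   // 32-bit references
        System.out.println(linkedListBytes(4, 1_000_000));
    }
}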

ArrayList is what you want. LinkedList is almost always a (performance) bug.
Why LinkedList sucks:
It uses lots of small memory objects, and therefore impacts performance across the process.
Lots of small objects are bad for cache-locality.
Any indexed operation requires a traversal, i.e. has O(n) performance. This is not obvious in the source code, leading to algorithms O(n) slower than if ArrayList was used.
Getting good performance is tricky.
Even when big-O performance is the same as ArrayList, it is probably going to be significantly slower anyway.
It's jarring to see LinkedList in source because it is probably the wrong choice.

| Algorithm            | ArrayList | LinkedList |
|----------------------|-----------|------------|
| seek front           | O(1)      | O(1)       |
| seek back            | O(1)      | O(1)       |
| seek to index        | O(1)      | O(N)       |
| insert at front      | O(N)      | O(1)       |
| insert at back       | O(1)      | O(1)       |
| insert after an item | O(N)      | O(1)       |
Algorithms: Big-Oh Notation (archived)
ArrayLists are good for write-once-read-many or appenders, but bad at add/remove from the front or middle.

See 2021 update from author below the original answer.
Original answer (2011)
As someone who has been doing operational performance engineering on very large scale SOA web services for about a decade, I would prefer the behavior of LinkedList over ArrayList. While the steady-state throughput of LinkedList is worse and might therefore lead to buying more hardware, the behavior of ArrayList under pressure could lead to apps in a cluster expanding their arrays in near synchronicity, and for large array sizes that could lead to a lack of responsiveness in the app and an outage, which is catastrophic behavior.
Similarly, you can get better throughput in an app from the default throughput tenured garbage collector, but once you get Java apps with 10GB heaps you can wind up locking up the app for 25 seconds during a full GC, which causes timeouts and failures in SOA apps and blows your SLAs if it occurs too often. Even though the CMS collector takes more resources and does not achieve the same raw throughput, it is a much better choice because it has more predictable and smaller latency.
ArrayList is only a better choice for performance if all you mean by performance is throughput and you can ignore latency. In my experience at my job I cannot ignore worst-case latency.
Update (Aug 27, 2021 -- 10 years later)
This answer (my most historically upvoted answer on SO as well) is very likely wrong (for reasons outlined in the comments below). I'd like to add that ArrayList will optimize for sequential reading of memory and minimize cache-line and TLB misses, etc. The copying overhead when the array grows past the bounds is likely inconsequential by comparison (and can be done by efficient CPU operations). This answer is also probably getting worse over time given hardware trends. The only situations where a LinkedList might make sense would be something highly contrived where you had thousands of Lists any one of which might grow to be GB-sized, but where no good guess could be made at allocation-time of the List and setting them all to GB-sized would blow up the heap. And if you found some problem like that, then it really does call for reengineering whatever your solution is (and I don't like to lightly suggest reengineering old code because I myself maintain piles and piles of old code, but that'd be a very good case of where the original design has simply run out of runway and does need to be chucked). I'll still leave my decades-old poor opinion up there for you to read though. Simple, logical and pretty wrong.

Joshua Bloch, the author of LinkedList:
Does anyone actually use LinkedList? I wrote it, and I never use it.
Link: https://twitter.com/joshbloch/status/583813919019573248
I'm sorry for the answer not being as informative as the other answers, but I thought it would be the most self-explanatory if not revealing.

Yeah, I know, this is an ancient question, but I'll throw in my two cents:
LinkedList is almost always the wrong choice, performance-wise. There are some very specific algorithms where a LinkedList is called for, but those are very, very rare and the algorithm will usually specifically depend on LinkedList's ability to insert and delete elements in the middle of the list relatively quickly, once you've navigated there with a ListIterator.
There is one common use case in which LinkedList outperforms ArrayList: that of a queue. However, if your goal is performance, instead of LinkedList you should also consider using an ArrayBlockingQueue (if you can determine an upper bound on your queue size ahead of time, and can afford to allocate all the memory up front), or this CircularArrayList implementation. (Yes, it's from 2001, so you'll need to generify it, but I got comparable performance ratios to what's quoted in the article just now in a recent JVM)
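For reference, here is a minimal sketch of the two array-backed queue alternatives mentioned above (the class name and sizes are just illustrative):
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class QueueAlternatives {
    public static void main(String[] args) throws InterruptedException {
        // ArrayDeque: unbounded, array-backed, amortized O(1) at both ends, not thread-safe.
        Queue<String> deque = new ArrayDeque<>();
        deque.offer("first");
        deque.offer("second");
        System.out.println(deque.poll()); // first

        // ArrayBlockingQueue: bounded, allocates its backing array up front, thread-safe.
        BlockingQueue<String> bounded = new ArrayBlockingQueue<>(1024);
        bounded.put("job-1");
        System.out.println(bounded.take()); // job-1
    }
}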

It's an efficiency question. LinkedList is fast for adding and deleting elements, but slow to access a specific element. ArrayList is fast for accessing a specific element, but can be slow to add to the front or middle, and especially slow to delete in the middle.
Array vs ArrayList vs LinkedList vs Vector goes more in depth, as does
Linked List.

Correct or Incorrect: Please execute test locally and decide for yourself!
Edit/Remove is faster in LinkedList than ArrayList.
ArrayList, backed by an array that must be grown and copied when it fills up, is worse in large-volume applications.
Below is the unit test result for each operation. Timings are given in nanoseconds.
| Operation                 | ArrayList  | LinkedList |
|---------------------------|------------|------------|
| AddAll (Insert)           | 101,16719  | 2623,29291 |
| Add (Insert-Sequentially) | 152,46840  | 966,62216  |
| Add (insert-randomly)     | 36527      | 29193      |
| remove (Delete)           | 20,56,9095 | 20,45,4904 |
| contains (Search)         | 186,15,704 | 189,64,981 |
Here's the code:
import org.junit.Assert;
import org.junit.Test;
import java.util.*;
public class ArrayListVsLinkedList {
private static final int MAX = 500000;
String[] strings = maxArray();
////////////// ADD ALL ////////////////////////////////////////
@Test
public void arrayListAddAll() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
watch.start();
arrayList.addAll(stringList);
watch.totalTime("Array List addAll() = ");//101,16719 Nanoseconds
}
@Test
public void linkedListAddAll() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
watch.start();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(stringList);
watch.totalTime("Linked List addAll() = "); //2623,29291 Nanoseconds
}
//Note: ArrayList is 26 times faster here than LinkedList for addAll()
///////////////// INSERT /////////////////////////////////////////////
@Test
public void arrayListAdd() {
Watch watch = new Watch();
List<String> arrayList = new ArrayList<String>(MAX);
watch.start();
for (String string : strings)
arrayList.add(string);
watch.totalTime("Array List add() = ");//152,46840 Nanoseconds
}
@Test
public void linkedListAdd() {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
watch.start();
for (String string : strings)
linkedList.add(string);
watch.totalTime("Linked List add() = "); //966,62216 Nanoseconds
}
//Note: ArrayList is 9 times faster than LinkedList for add sequentially
/////////////////// INSERT IN BETWEEN ///////////////////////////////////////
@Test
public void arrayListInsertOne() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX + MAX / 10);
arrayList.addAll(stringList);
String insertString0 = getString(true, MAX / 2 + 10);
String insertString1 = getString(true, MAX / 2 + 20);
String insertString2 = getString(true, MAX / 2 + 30);
String insertString3 = getString(true, MAX / 2 + 40);
watch.start();
arrayList.add(insertString0);
arrayList.add(insertString1);
arrayList.add(insertString2);
arrayList.add(insertString3);
watch.totalTime("Array List add() = ");//36527
}
@Test
public void linkedListInsertOne() {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(stringList);
String insertString0 = getString(true, MAX / 2 + 10);
String insertString1 = getString(true, MAX / 2 + 20);
String insertString2 = getString(true, MAX / 2 + 30);
String insertString3 = getString(true, MAX / 2 + 40);
watch.start();
linkedList.add(insertString0);
linkedList.add(insertString1);
linkedList.add(insertString2);
linkedList.add(insertString3);
watch.totalTime("Linked List add = ");//29193
}
//Note: LinkedList is 3000 nanoseconds faster than ArrayList for random insert.
////////////////// DELETE //////////////////////////////////////////////////////
@Test
public void arrayListRemove() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
arrayList.addAll(stringList);
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
arrayList.remove(searchString0);
arrayList.remove(searchString1);
watch.totalTime("Array List remove() = ");//20,56,9095 Nanoseconds
}
@Test
public void linkedListRemove() throws Exception {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(Arrays.asList(strings));
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
linkedList.remove(searchString0);
linkedList.remove(searchString1);
watch.totalTime("Linked List remove = ");//20,45,4904 Nanoseconds
}
//Note: LinkedList is 10 milliseconds faster than ArrayList while removing an item.
///////////////////// SEARCH ///////////////////////////////////////////
@Test
public void arrayListSearch() throws Exception {
Watch watch = new Watch();
List<String> stringList = Arrays.asList(strings);
List<String> arrayList = new ArrayList<String>(MAX);
arrayList.addAll(stringList);
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
arrayList.contains(searchString0);
arrayList.contains(searchString1);
watch.totalTime("Array List addAll() time = ");//186,15,704
}
@Test
public void linkedListSearch() throws Exception {
Watch watch = new Watch();
List<String> linkedList = new LinkedList<String>();
linkedList.addAll(Arrays.asList(strings));
String searchString0 = getString(true, MAX / 2 + 10);
String searchString1 = getString(true, MAX / 2 + 20);
watch.start();
linkedList.contains(searchString0);
linkedList.contains(searchString1);
watch.totalTime("Linked List addAll() time = ");//189,64,981
}
//Note: Linked List is 500 Milliseconds faster than ArrayList
class Watch {
private long startTime;
private long endTime;
public void start() {
startTime = System.nanoTime();
}
private void stop() {
endTime = System.nanoTime();
}
public void totalTime(String s) {
stop();
System.out.println(s + (endTime - startTime));
}
}
private String[] maxArray() {
String[] strings = new String[MAX];
Boolean result = Boolean.TRUE;
for (int i = 0; i < MAX; i++) {
strings[i] = getString(result, i);
result = !result;
}
return strings;
}
private String getString(Boolean result, int i) {
return String.valueOf(result) + i + String.valueOf(!result);
}
}

ArrayList is essentially an array. LinkedList is implemented as a double linked list.
The get is pretty clear. O(1) for ArrayList, because ArrayList allow random access by using index. O(n) for LinkedList, because it needs to find the index first. Note: there are different versions of add and remove.
LinkedList is faster in add and remove, but slower in get. In brief, LinkedList should be preferred if:
there is not a large number of random accesses of elements
there are a large number of add/remove operations
=== ArrayList ===
add(E e)
add at the end of ArrayList
require memory resizing cost.
O(n) worst, O(1) amortized
add(int index, E element)
add to a specific index position
require shifting & possible memory resizing cost
O(n)
remove(int index)
remove a specified element
require shifting & possible memory resizing cost
O(n)
remove(Object o)
remove the first occurrence of the specified element from this list
need to search the element first, and then shifting & possible memory resizing cost
O(n)
=== LinkedList ===
add(E e)
add to the end of the list
O(1)
add(int index, E element)
insert at specified position
need to find the position first
O(n)
remove()
remove first element of the list
O(1)
remove(int index)
remove element with specified index
need to find the element first
O(n)
remove(Object o)
remove the first occurrence of the specified element
need to find the element first
O(n)
A figure from programcreek.com (not reproduced here) compares the two; in it, add and remove are the first type, i.e., adding an element at the end of the list and removing the element at a specified position in the list.

TL;DR due to modern computer architecture, ArrayList will be significantly more efficient for nearly any possible use-case - and therefore LinkedList should be avoided except some very unique and extreme cases.
In theory, LinkedList has an O(1) for the add(E element)
Also adding an element in the mid of a list should be very efficient.
Practice is very different, as LinkedList is a cache-hostile data structure. From a performance POV there are very few cases where LinkedList could perform better than the cache-friendly ArrayList.
Here are results of a benchmark testing inserting elements in random locations. As you can see, the array list is much more efficient, although in theory each insert in the middle of the list will require "moving" the n later elements of the array (lower values are better):
Working on later-generation hardware (bigger, more efficient caches), the results are even more conclusive:
LinkedList takes much more time to accomplish the same job. (Source Code)
There are two main reasons for this:
Mainly - that the nodes of the LinkedList are scattered randomly across the memory. RAM ("Random Access Memory") isn't really random and blocks of memory need to be fetched to cache. This operation takes time, and when such fetches happen frequently - the memory pages in the cache need to be replaced all the time -> Cache misses -> Cache is not efficient.
ArrayList elements are stored on continuous memory - which is exactly what the modern CPU architecture is optimizing for.
Secondly, LinkedList nodes are required to hold back/forward pointers, which means roughly 3 times the memory consumption per value stored compared to ArrayList.
DynamicIntArray, btw, is a custom ArrayList implementation holding Int (primitive type) and not Objects - hence all data is really stored adjacently - hence even more efficient.
A key element to remember is that the cost of fetching a memory block is more significant than the cost of accessing a single memory cell. That's why reading 1MB of sequential memory is up to ~400 times faster than reading the same amount of data from different blocks of memory:
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns 14x L1 cache
Mutex lock/unlock 25 ns
Main memory reference 100 ns 20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy 3,000 ns 3 us
Send 1K bytes over 1 Gbps network 10,000 ns 10 us
Read 4K randomly from SSD* 150,000 ns 150 us ~1GB/sec SSD
Read 1 MB sequentially from memory 250,000 ns 250 us
Round trip within same datacenter 500,000 ns 500 us
Read 1 MB sequentially from SSD* 1,000,000 ns 1,000 us 1 ms ~1GB/sec SSD, 4X memory
Disk seek 10,000,000 ns 10,000 us 10 ms 20x datacenter roundtrip
Read 1 MB sequentially from disk 20,000,000 ns 20,000 us 20 ms 80x memory, 20X SSD
Send packet CA->Netherlands->CA 150,000,000 ns 150,000 us 150 ms
Source: Latency Numbers Every Programmer Should Know
Just to make the point even clearer, please check the benchmark of adding elements to the beginning of the list. This is a use-case where, in theory, the LinkedList should really shine and ArrayList should present poor or even worst-case results:
Note: this is a benchmark of the C++ std lib, but my previous experience has shown that the C++ and Java results are very similar. Source Code
Copying a sequential bulk of memory is an operation optimized by modern CPUs, which upends the theory and, again, makes ArrayList/Vector much more efficient in practice.
Credits: All benchmarks posted here are created by Kjell Hedström. Even more data can be found on his blog

ArrayList is randomly accessible, while LinkedList is really cheap to expand and remove elements from. For most cases, ArrayList is fine.
Unless you've created large lists and measured a bottleneck, you'll probably never need to worry about the difference.

You can use one over the other based on the time complexities of the operations that you'd perform on that particular List.
|---------------------|---------------------|--------------------|------------|
| Operation | ArrayList | LinkedList | Winner |
|---------------------|---------------------|--------------------|------------|
| get(index) | O(1) | O(n) | ArrayList |
| | | n/4 steps in avg | |
|---------------------|---------------------|--------------------|------------|
| add(E) | O(1) | O(1) | LinkedList |
| |---------------------|--------------------| |
| | O(n) in worst case | | |
|---------------------|---------------------|--------------------|------------|
| add(index, E) | O(n) | O(n) | LinkedList |
| | n/2 steps | n/4 steps | |
| |---------------------|--------------------| |
| | | O(1) if index = 0 | |
|---------------------|---------------------|--------------------|------------|
| remove(index, E) | O(n) | O(n) | LinkedList |
| |---------------------|--------------------| |
| | n/2 steps | n/4 steps | |
|---------------------|---------------------|--------------------|------------|
| Iterator.remove() | O(n) | O(1) | LinkedList |
| ListIterator.add() | | | |
|---------------------|---------------------|--------------------|------------|
|--------------------------------------|-----------------------------------|
| ArrayList | LinkedList |
|--------------------------------------|-----------------------------------|
| Allows fast read access | Retrieving element takes O(n) |
|--------------------------------------|-----------------------------------|
| Adding an element requires shifting | O(1) [but traversing takes time]  |
| all the later elements | |
|--------------------------------------|-----------------------------------|
| To add more elements than capacity |
| a new array needs to be allocated   |
|--------------------------------------|

If your code has add(0) and remove(0), use a LinkedList and its prettier addFirst() and removeFirst() methods. Otherwise, use ArrayList.
And of course, Guava's ImmutableList is your best friend.

Let's compare LinkedList and ArrayList w.r.t. below parameters:
1. Implementation
ArrayList is the resizable-array implementation of the List interface, while
LinkedList is the doubly-linked list implementation of the List interface.
2. Performance
get(int index) or search operation
ArrayList's get(int index) operation runs in constant time, i.e. O(1), while
LinkedList's get(int index) operation runs in O(n) time.
The reason behind ArrayList being faster than LinkedList is that ArrayList uses an index based system for its elements as it internally uses an array data structure, on the other hand,
LinkedList does not provide index-based access for its elements as it iterates either from the beginning or end (whichever is closer) to retrieve the node at the specified element index.
insert() or add(Object) operation
Insertions in LinkedList are generally fast compared to ArrayList. In LinkedList, adding or insertion is an O(1) operation.
While in ArrayList, if the array is full (the worst case), there is an extra cost of resizing the array and copying elements to the new array, which makes the runtime of the add operation in ArrayList O(n); otherwise it is O(1).
remove(int) operation
Remove operation in LinkedList is generally the same as ArrayList i.e. O(n).
In LinkedList, there are two overloaded remove methods. One is remove() without any parameter, which removes the head of the list and runs in constant time O(1). The other overloaded remove method in LinkedList is remove(int) or remove(Object), which removes the element at the index, or the Object, passed as a parameter. This method traverses the LinkedList until it finds the element and unlinks it from the original list. Hence this method's runtime is O(n).
While in ArrayList, the remove(int) method involves copying the elements after the removed index down by one position, hence its runtime is O(n).
3. Reverse Iterator
LinkedList can be iterated in reverse direction using descendingIterator() while
there is no descendingIterator() in ArrayList , so we need to write our own code to iterate over the ArrayList in reverse direction.
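A small sketch of both reverse-iteration styles (it needs Java 9+ for List.of; everything else is standard API):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class ReverseIteration {
    public static void main(String[] args) {
        LinkedList<String> linked = new LinkedList<>(List.of("a", "b", "c"));
        Iterator<String> desc = linked.descendingIterator();   // LinkedList only
        while (desc.hasNext()) {
            System.out.print(desc.next() + " ");                // c b a
        }
        System.out.println();

        List<String> array = new ArrayList<>(List.of("a", "b", "c"));
        ListIterator<String> it = array.listIterator(array.size()); // start past the last element
        while (it.hasPrevious()) {
            System.out.print(it.previous() + " ");               // c b a
        }
        System.out.println();
    }
}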
4. Initial Capacity
If the no-argument constructor is used, ArrayList creates an empty list with an initial capacity of 10, while
LinkedList only constructs the empty list without any initial capacity.
5. Memory Overhead
Memory overhead in LinkedList is more as compared to ArrayList as a node in LinkedList needs to maintain the addresses of the next and previous node. While
In ArrayList each index only holds the actual object(data).
Source

I know this is an old post, but I honestly can't believe nobody mentioned that LinkedList implements Deque. Just look at the methods in Deque (and Queue); if you want a fair comparison, try running LinkedList against ArrayDeque and do a feature-for-feature comparison.
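As a sketch of that point, both classes can be driven through the same Deque interface, which makes a feature-for-feature comparison straightforward (the methods are standard API; the little harness is mine):
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedList;

public class DequeComparison {
    static void exercise(Deque<Integer> deque) {
        deque.offerFirst(1);
        deque.offerLast(2);
        System.out.println(deque.peekFirst() + " " + deque.peekLast()); // 1 2
        deque.pollFirst();
        deque.pollLast();
    }

    public static void main(String[] args) {
        exercise(new LinkedList<>());  // List + Deque
        exercise(new ArrayDeque<>());  // Deque only, usually faster and more compact
    }
}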

Here is the Big-O notation for ArrayList and LinkedList, and also CopyOnWriteArrayList:
ArrayList
get O(1)
add O(1)
contains O(n)
next O(1)
remove O(n)
iterator.remove O(n)
LinkedList
get O(n)
add O(1)
contains O(n)
next O(1)
remove O(1)
iterator.remove O(1)
CopyOnWriteArrayList
get O(1)
add O(n)
contains O(n)
next O(1)
remove O(n)
iterator.remove O(n)
Based on these you have to decide what to choose. :)

In addition to the other good arguments above, you should note that ArrayList implements the RandomAccess interface, while LinkedList implements Queue.
So, somehow they address slightly different problems, with differences in efficiency and behavior (see their lists of methods).
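A minimal sketch of how that marker interface can be used, similar to what the JDK's own Collections helpers do internally (the class and helper names are mine):
import java.util.List;
import java.util.RandomAccess;
import java.util.function.Consumer;

public class ListTraversal {
    static <T> void visitAll(List<T> list, Consumer<T> action) {
        if (list instanceof RandomAccess) {
            for (int i = 0; i < list.size(); i++) {   // O(1) get() per step, e.g. ArrayList
                action.accept(list.get(i));
            }
        } else {
            for (T t : list) {                         // iterator avoids O(n) get(), e.g. LinkedList
                action.accept(t);
            }
        }
    }
}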

It depends upon what operations you will be doing more on the List.
ArrayList is faster to access an indexed value. It is much worse when inserting or deleting objects.
To find out more, read any article that talks about the difference between arrays and linked lists.

See the Java Tutorials - List Implementations.

An array list is essentially an array with methods to add items etc. (and you should use a generic list instead). It is a collection of items which can be accessed through an indexer (for example [0]). It implies a progression from one item to the next.
A linked list specifies a progression from one item to the next (Item a -> item b). You can get the same effect with an array list, but a linked list absolutely says what item is supposed to follow the previous one.

An important feature of a linked list (which I didn't read in another answer) is the concatenation of two lists. With an array this is O(n) (plus the overhead of some reallocations); with a linked list it is only O(1) or O(2) ;-)
Important: for Java's LinkedList this is not true! See Is there a fast concat method for linked list in Java?

ArrayList and LinkedList have their own pros and cons.
ArrayList uses contiguous memory addresses, whereas LinkedList uses pointers to the next node. So looking up an element in an ArrayList is faster than doing n iterations over a LinkedList.
On the other hand, insertion and deletion in a LinkedList are much easier because you just have to change the pointers whereas an ArrayList implies the use of shift operation for any insertion or deletion.
If you have frequent retrieval operations in your app use an ArrayList. If you have frequent insertion and deletion use a LinkedList.

1) Underlying Data Structure
The first difference between ArrayList and LinkedList comes from the fact that ArrayList is backed by an array, while LinkedList is backed by a doubly linked list. This leads to further differences in performance.
2) LinkedList implements Deque
Another difference between ArrayList and LinkedList is that apart from the List interface, LinkedList also implements the Deque interface, which provides first-in-first-out operations like add() and poll(), and several other Deque functions.
3) Adding elements in ArrayList
Adding an element to an ArrayList is an O(1) operation if it doesn't trigger a re-size of the array; when a re-size is triggered it becomes O(n), since the backing array has to be copied. On the other hand, appending an element to a LinkedList is an O(1) operation, as it doesn't require any navigation.
4) Removing an element from a position
In order to remove an element from a particular index e.g. by calling remove(index), ArrayList performs a copy operation which makes it close to O(n) while LinkedList needs to traverse to that point which also makes it O(n/2), as it can traverse from either direction based upon proximity.
5) Iterating over ArrayList or LinkedList
Iteration is an O(n) operation for both LinkedList and ArrayList, where n is the number of elements.
6) Retrieving element from a position
The get(index) operation is O(1) in ArrayList while it is O(n/2) in LinkedList, as it needs to traverse up to that entry. Though, in Big O notation, O(n/2) is just O(n) because we ignore constants there.
7) Memory
LinkedList uses a wrapper object, Entry, which is a static nested class for storing data and two nodes next and previous while ArrayList just stores data in Array.
So memory requirement seems less in the case of ArrayList than LinkedList except for the case where Array performs the re-size operation when it copies content from one Array to another.
If Array is large enough it may take a lot of memory at that point and trigger Garbage collection, which can slow response time.
From all the above differences between ArrayList and LinkedList, it looks like ArrayList is the better choice in almost all cases, except when you perform add() and remove() much more frequently than get().
It's easier to modify a linked list than ArrayList, especially if you are adding or removing elements from start or end because linked list internally keeps references of those positions and they are accessible in O(1) time.
In other words, you don't need to traverse the linked list to reach the position where you want to add elements. When you do need to traverse, for example when inserting or deleting an element in the middle of a linked list, addition becomes an O(n) operation.
In my opinion, use ArrayList over LinkedList for most of the practical purpose in Java.

I have read the responses, but there is one scenario where I always use a LinkedList over an ArrayList that I want to share to hear opinions:
Every time I had a method that returns a list of data obtained from a DB I always use a LinkedList.
My rationale was that because it is impossible to know exactly how many results I am getting, no memory will be wasted (as in ArrayList, with the difference between the capacity and the actual number of elements), and no time would be wasted growing the capacity.
As far as ArrayList goes, I agree that you should at least always use the constructor with an initial capacity, to minimize duplication of the arrays as much as possible.

ArrayList and LinkedList both implement the List interface and their methods and results are almost identical. However, there are a few differences between them which make one better than the other depending on the requirement.
ArrayList Vs LinkedList
1) Search: ArrayList search operation is pretty fast compared to the LinkedList search operation. get(int index) in ArrayList gives the performance of O(1) while LinkedList performance is O(n).
Reason: ArrayList maintains an index-based system for its elements, as it internally uses an array data structure, which makes it faster for searching an element in the list. On the other hand, LinkedList implements a doubly linked list, which requires traversal through the elements for searching an element.
2) Deletion: LinkedList remove operation gives O(1) performance while ArrayList gives variable performance: O(n) in the worst case (while removing the first element) and O(1) in the best case (while removing the last element).
Conclusion: LinkedList element deletion is faster compared to
ArrayList.
Reason: each element of a LinkedList maintains two pointers (addresses) which point to both neighboring elements in the list. Hence removal only requires changing the pointers in the two neighboring nodes (elements) of the node which is going to be removed. While in ArrayList all the elements need to be shifted to fill out the space created by the removed element.
3) Inserts Performance: LinkedList add method gives O(1) performance while ArrayList gives O(n) in worst case. Reason is same as explained for remove.
4) Memory Overhead: ArrayList maintains indexes and element data while LinkedList maintains element data and two pointers for neighbor nodes,
hence the memory consumption is comparatively higher in LinkedList.
There are few similarities between these classes which are as follows:
Both ArrayList and LinkedList are implementation of List interface.
They both maintain the elements' insertion order, which means that while displaying ArrayList and LinkedList elements the result would have the same order in which the elements were inserted into the List.
Both these classes are non-synchronized and can be made synchronized explicitly by using Collections.synchronizedList method.
The iterator and listIterator returned by these classes are fail-fast (if list is structurally modified at any time after the iterator is created, in any way except through the iterator’s own remove or add methods, the iterator will throw a ConcurrentModificationException).
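A minimal sketch of that fail-fast behaviour (it needs Java 9+ for List.of; the class name is mine):
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class FailFastDemo {
    public static void main(String[] args) {
        List<String> list = new ArrayList<>(List.of("a", "b", "c"));
        Iterator<String> it = list.iterator();
        list.add("d");   // structural modification outside the iterator
        it.next();       // throws ConcurrentModificationException
    }
}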
When to use LinkedList and when to use ArrayList?
As explained above the insert and remove operations give good performance (O(1)) in LinkedList compared to ArrayList(O(n)).
Hence if there is a requirement of frequent addition and deletion in application then LinkedList is a best choice.
Search (get method) operations are fast in ArrayList (O(1)) but not in LinkedList (O(n)),
so if there are fewer add and remove operations and more search operations, ArrayList would be your best bet.

Operation get(i) in ArrayList is faster than LinkedList, because:
ArrayList: Resizable-array implementation of the List interface
LinkedList: Doubly-linked list implementation of the List and Deque interfaces
Operations that index into the list will traverse the list from the beginning or the end, whichever is closer to the specified index.

Both remove() and insert() have a runtime efficiency of O(n) for both ArrayLists and LinkedLists. However, the reason behind the linear processing time comes from two very different reasons:
In an ArrayList, you get to the element in O(1), but actually removing or inserting something makes it O(n) because all the following elements need to be changed.
In a LinkedList, it takes O(n) to actually get to the desired element, because we have to start at the very beginning until we reach the desired index. Actually removing or inserting is constant, because we only have to change 1 reference for remove() and 2 references for insert().
Which of the two is faster for inserting and removing depends on where it happens. If we are closer to the beginning the LinkedList will be faster, because we have to go through relatively few elements. If we are closer to the end an ArrayList will be faster, because we get there in constant time and only have to change the few remaining elements that follow it. When done precisely in the middle the LinkedList will be faster, because going through n/2 elements is quicker than moving n/2 values.
Bonus: While there is no way of making these two methods O(1) for an ArrayList, there actually is a way to do this in LinkedLists. Let's say we want to go through the entire List removing and inserting elements on our way. Usually you would have to start from the very beginning of the LinkedList for each element, but we can instead "save" the current element we're working on with an Iterator. With the help of the Iterator, we get an O(1) efficiency for remove() and insert() when working in a LinkedList, making it the only performance benefit I'm aware of where a LinkedList is always better than an ArrayList.
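A small sketch of that bonus case: editing a LinkedList in a single pass through its ListIterator, with each remove()/add() being O(1) (the example values and class name are mine):
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorEditing {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(List.of(1, 2, 3, 4, 5));
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value % 2 == 0) {
                it.remove();          // O(1): unlink the current node
                it.add(value * 10);   // O(1): relink a replacement at this position
            }
        }
        System.out.println(list);     // [1, 20, 3, 40, 5]
    }
}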

One of the tests I saw on here only conducts the test once. But what I have noticed is that you need to run these tests many times and eventually their times will converge. Basically the JVM needs to warm up. For my particular use case I needed to add/remove items to a list that grows to about 500 items. In my tests LinkedList came out faster, coming in at around 50,000 ns versus around 90,000 ns for ArrayList... give or take. See the code below.
public static void main(String[] args) {
List<Long> times = new ArrayList<>();
for (int i = 0; i < 100; i++) {
times.add(doIt());
}
System.out.println("avg = " + (times.stream().mapToLong(x -> x).average()));
}
static long doIt() {
long start = System.nanoTime();
List<Object> list = new LinkedList<>();
//uncomment line below to test with ArrayList
//list = new ArrayList<>();
for (int i = 0; i < 500; i++) {
list.add(i);
}
Iterator it = list.iterator();
while (it.hasNext()) {
it.next();
it.remove();
}
long end = System.nanoTime();
long diff = end - start;
//uncomment to see the JVM warmup and get faster for the first few iterations
//System.out.println(diff)
return diff;
}

ArrayList extends AbstractList and implements the List interface. ArrayList is a dynamic array. It can be said that it was basically created to overcome the drawbacks of arrays.
The LinkedList class extends AbstractSequentialList and implements the List, Deque, and Queue interfaces.
Performance
arraylist.get() is O(1) whereas linkedlist.get() is O(n)
arraylist.add() is O(1) and linkedlist.add() is O(1)
arraylist.contains() is O(n) and linkedlist.contains() is O(n)
arraylist.next() is O(1) and linkedlist.next() is O(1)
arraylist.remove() is O(n) whereas linkedlist.remove() is O(1)
arraylist iterator.remove() is O(n) whereas linkedlist iterator.remove() is O(1)

Related

Time complexity of LinkedList.subList(int, int)

If I have a linked list of objects and I want the sublist from index 2 to 5. Is this operation o(1)? All you would need to do is null the reference to prev on the node at index 2, and return the node at index 2, correct? Does this require copying the contents of the linked list into another one and returning that or just setting the head to be the node at index 2?
Is this operation o(1)?
In general, getting a sublist of a linked list is O(k), where k is the size of the sublist, not O(1)*.
However, for any specific index value, such as 2, 5, or 5000, any operation is O(1), because a specific index becomes a constant that gets factored out from Big-O notation.
* Construction of the sublist may be optimized such that you pay construction costs on the first navigation of the sublist rather than on construction. In other words, constructing a sublist without iterating it is O(1).
It appears that the sublist method runs in O(1) time. See the source code for the method.
All that this code does is return a new instance of SubList that is initialized with the list that sublist is invoked upon. No iteration is happening here, so the operation runs in constant time.
It's O(n) if you consider the general case of the algorithm. Even if you do what you said, finding the n-th and m-th elements would take a complete traversal of the list.
It's wrong to consider that finding the sublist from 2 to 5 would be O(1). It is O(1) of course, since it needs a constant number of operations, but are you creating an algorithm just for sublist(2,5)? If you do that, then of course it is always O(1).
A better example: sorting 100 numbers has O(1) complexity, and so does sorting 10,000 numbers. But that is not what we are concerned about. We want to know the nature of the algorithm based on the inputs fed to it.
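As a quick illustration of the view semantics discussed above, subList does not copy anything; writes through the view are visible in the backing list (needs Java 9+ for List.of; the class name is mine):
import java.util.LinkedList;
import java.util.List;

public class SubListView {
    public static void main(String[] args) {
        List<Integer> list = new LinkedList<>(List.of(0, 1, 2, 3, 4, 5, 6));
        List<Integer> view = list.subList(2, 5);  // elements at indices 2, 3, 4
        view.set(0, 99);                          // writes through to the backing list
        System.out.println(list);                 // [0, 1, 99, 3, 4, 5, 6]
    }
}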

ArrayList: insertion vs insertion at specified element

Consider an ArrayList. Internally it is not full, and the number of elements inserted so far is known. The elements are not sorted.
Choose the operations listed below that are fast regardless of the number of elements contained in the ArrayList. (In other words, they take only several instructions to implement.)
Insertion
Insertion at a given index
Getting the data from a specified index
Finding the maximum value in an array of integers (not necessarily sorted)
Deletion at the given index
Replacing an element at a specified index
Searching for a specific element
I chose Insertion at a given index, Getting the data from a specified index, and Replacing an element, but the answer key says Insertion. As I usually understand it, in an ArrayList the insert operation requires all of the following elements to shift. If we did this at the beginning of the list, we would have O(n) time complexity. However, if we did it at the end, it would be O(1).
My question comes down to: (1) what, if any, difference is there between insertion and insertion at specified index and (2) given this particular time complexity for insertion why is it considered "fast"
First take a look at these two methods defined in java.util.ArrayList
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
Now if you look at the first method (just adding an element), it just ensures that there is sufficient capacity and appends the element to the end of the list.
So if there's sufficient capacity, this operation requires O(1) time; otherwise it requires copying all elements into a larger array, and the time complexity grows to O(n).
In the second method, when you specify an index, all the elements after that index are shifted, and this definitely takes more time than the former.
For the first question the answer is this:
Insertion at a specified index i takes O(n), since all the elements following i will have to be shifted to the right by one position.
On the other hand, simple insertion as implemented in Java (ArrayList.add()) will only take O(1) because the element is appended to the end of the array, so no shift is required.
For the second question, it is obvious why simple insertion is fast: no extra operation is needed, so time is constant.
ArrayList internally is nothing but an array itself, which uses Arrays.copyOf to create a new array with increased size upon add (when full), but with the original content intact.
So regarding insertion, whether you do a simple add (which will add the data at the end of the array) or add at, say, the first (0th) index, it will still be faster than most data structures, keeping in mind the simplicity of the data structures.
The only difference is that a simple add requires no traversal, but adding at an index requires shifting the following elements over (similarly for delete). It uses System.arraycopy to copy one array region to another with the appropriate index and data adjustments.
So, yeah, simple insertion is faster than indexed insertion.
(1) what, if any, difference is there between insertion and insertion at specified index and
An ArrayList stores its elements consecutively. Adding to the end of the ArrayList does not require the ArrayList to be altered in any way except for adding the new element to the end of itself. Thus, this operation is O(1), taking constant time, which is favorable when wanting to perform an action repetitively on a data structure.
Adding an element at an index, however, requires the ArrayList to make room for the element in some way. How is that done? Every element following the inserted element will have to be moved one step to make room for the new insertion. Your index can be anything in between the first element and the nth element (inclusive). This operation thus is O(1) at best and O(n) at worst, where n is the size of the array. For large lists, O(n) takes significantly longer time than O(1).
(2) given this particular time complexity for insertion why is it considered "fast"
It is considered fast because it is O(1), or constant time. If the time complexity is truly only one operation, it is as fast as it can possibly be; other small constants are also regarded as fast and are often equally denoted by O(1), where the "1" does not strictly mean one single operation, but rather that the amount of operations does not depend on the size of something else, in your example the size of the ArrayList. However, constant time complexity can involve large constants as well, but in general it is regarded as the fastest possible time complexity. To put this into context, an O(1) operation takes roughly 1 * k operations in an ArrayList with 1000 elements, while an O(n) operation takes roughly 1000 * k operations, where k is some constant.
Big-O notation is used as a metric to measure how many operations an action or a whole program will execute when it is run.
For more information about big O-notation:
What is a plain English explanation of "Big O" notation?

Algorithm Complexity: Is it the same to iterate an array from the start than from the end?

In an interview I was asked for the following:
public class Main {
public static void main(String[] args) {
// TODO Auto-generated method stub
int [] array = new int [10000];
for (int i = 0; i < array.length; i++) {
// do calculations
}
for (int x = array.length-1; x >= 0; x--) {
// do calculations
}
}
}
Is it the same to iterate an array either from the end or from the start? To my understanding it would be the same since the complexity is constant, i.e. O(1)? Am I correct?
Also I was asked regarding ArrayList Complexity compared to other collections in java, for example, LinkedList.
Thank you.
There can be a difference due to CPU prefetch characteristics.
There is no difference between looping in either direction as per computational theory. However, depending on the kind of prefetcher that is used by the CPU on which the code runs, there will be some differences in practice.
For example, the Sandy Bridge Intel processor has a prefetcher that goes forward only for data (while instructions could be prefetched in both directions). This will help iteration from the start (as future memory locations are prefetched into the L1 cache), while iterating from the end will cause very little to no prefetching, and hence more accesses to RAM which is much slower than accessing any of the CPU caches.
There is a more detailed discussion about forward and backward prefetching at this link.
It's O(n) in both cases for an array, as there are n iterations and each step takes O(1) (assuming the calculations in the loop take O(1)). In particular, obtaining the length or size is typically an O(1) operation for arrays or ArrayList.
A typical use case for iterating from the end is removing elements in the loop (which may otherwise require more complex accounting to avoid skipping elements or iterating beyond the end).
For a linked list, the first loop would typically be O(n²), as determining the length of a linked list is typically an O(n) operation without additional caching, and it's used every time the exit condition is checked. However, java.util.LinkedList explicitly keeps track of the length, so the total is O(n) for linked lists in Java.
If an element in a linked list is accessed using the index in the calculations, this will be an O(n) operation, yielding a total of O(n²).
Is it the same to iterate an array either from the end or from the start? As my understanding it would be the same since complexity is constant i.e O(1) ? Am I correct?
In theory, yes it's the same to iterate an array from the end and from the start.
The time complexity is O(10,000) which is constant so O(1) assuming that the loop body has a constant time complexity. But it's nice to mention that the constant 10,000 can be promoted to a variable, call it N, and then you can say that the time complexity is O(N).
Also I was asked regarding ArrayList Complexity compared to other collections in java, for example, LinkedList.
Here you can find comparison between ArrayList and LinkedList time complexity. The interesting methods are add, remove and get.
http://www.programcreek.com/2013/03/arraylist-vs-linkedlist-vs-vector/
Also, data in a LinkedList are not stored consecutively. However, data in an ArrayList are stored consecutively, and an ArrayList also uses less space than a LinkedList.
Good luck!

Randomizing list processing faster than Collections.shuffle()?

I am developing an agent-based model in Java. I have used a profiler to reduce any inefficiencies down to the point that the only thing holding it back is Java's Collections.shuffle().
The agents (they're animals) in my model need to be processed in a random order so that no agent is consistently processed before the others.
I am looking for: Either a faster way to shuffle than Java's Collections.shuffle() or an alternative method of processing the elements in an ArrayList in a randomized order that is significantly faster. If you know of a data structure that would be faster than an ArrayList, by all means please answer. I have considered LinkedList and ArrayDeque, but they aren't making much of a difference.
Currently, I have over 1,000,000 elements in the list I am trying to shuffle. Over time, this amount increases and it is becoming increasingly inefficient to shuffle it.
Is there an alternative data structure or way of randomizing the processing of elements that is faster?
I only need to be able to store elements and process them in a randomized order. I do not use contains or anything more complex than storage and iterating over them.
Here is some sample code to better explain what I am trying to achieve:
UPDATE: Sorry for the ConcurrentModificationException, I didn't realize I had done that and I didn't intend to confuse anyone. Fixed it in the code below.
ArrayList<Agent> list = new ArrayList<>();
void process()
{
list.add(new Agent("Zebra"));
Random r = new Random();
for (int i = 0; i < 100000; i++)
{
ArrayList<Agent> newlist = new ArrayList<>();
Collections.shuffle(list);//Something that will allow the order to be random (random quality does not matter to me), yet faster than a shuffle
for (Agent agent : list)
{
newlist.add(agent);
if(r.nextDouble() > 0.99)//1% chance of adding another agent to the list
{
newlist.add(new Agent("Lion"));
}
}
list = newlist;
}
}
ANOTHER UPDATE
I thought about doing list.remove(rando.nextInt(list.size())), but since remove for ArrayLists is O(n) it would be even worse to do that rather than shuffle, for such a large list size.
I would use a simple ArrayList and not shuffle it at all. Instead select random list indices to process. To avoid processing a list element twice, I'd remove the processed elements from the list.
Now if the list is very large, removing a random entry itself would be the bottleneck. This can however be avoided easily by removing the last entry instead and moving it into the place the selected entry occupied before:
public String pullRandomElement(List<String> list, Random random) {
// select a random list index
int index = random.nextInt(list.size());
String result = list.get(index);
// move the last entry into the freed slot
// (skip the set() when the selected entry was itself the last one, otherwise set() would be out of bounds)
String last = list.remove(list.size() - 1);
if (index < list.size()) {
list.set(index, last);
}
return result;
}
Needless to say, you should choose a list implementation where get(index) and remove(lastIndex) are fast O(1), such as ArrayList. You may also want to add edge-case handling (such as an empty list).
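A hypothetical usage sketch of that helper, draining the list so every element is processed exactly once in random order (the class and element names are mine; the helper is repeated so the example is self-contained):
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class RandomOrderProcessing {
    public static void main(String[] args) {
        List<String> agents = new ArrayList<>(List.of("zebra", "lion", "gnu"));
        Random random = new Random();
        while (!agents.isEmpty()) {
            System.out.println("processing " + pullRandomElement(agents, random));
        }
    }

    static String pullRandomElement(List<String> list, Random random) {
        int index = random.nextInt(list.size());
        String result = list.get(index);
        String last = list.remove(list.size() - 1);
        if (index < list.size()) {
            list.set(index, last);   // fill the hole with the former last element
        }
        return result;
    }
}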
You could use this: if you already have the list of items, generate a random index based on its size using nextInt, and remove that element.
ArrayList<String> list = new ArrayList<>();
int sizeOfCollection = list.size();
Random randomGenerator = new Random();
int randomId = randomGenerator.nextInt(sizeOfCollection);
Object x = list.get(randomId);
list.remove(randomId);
Since your code doesn't actually depend on the order of the list, it's enough to shuffle it once at the end of the processing.
void process() {
Random r = new Random();
for (int i = 0; i < 100000; i++) {
for (String str : list) {
if(r.nextDouble() > 0.9) {
list.add(str + str);
}
}
}
Collections.shuffle(list);
}
Though this would still throw a ConcurrentModificationException, like the original code.
Collections.shuffle() uses the modern variant of Fisher-Yates Algorithm:
From https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle
To shuffle an array a of n elements (indices 0..n-1):
for i from n − 1 downto 1 do
j ← random integer such that 0 ≤ j ≤ i
exchange a[j] and a[i]
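For reference, here is roughly what that pseudocode looks like in Java over a plain array (a minimal sketch; the method name is my own):

import java.util.Random;

static <T> void fisherYatesShuffle(T[] a, Random random) {
    // Walk backwards; at each position i, swap in a uniformly chosen
    // element from the not-yet-fixed prefix a[0..i].
    for (int i = a.length - 1; i >= 1; i--) {
        int j = random.nextInt(i + 1); // 0 <= j <= i
        T tmp = a[j];
        a[j] = a[i];
        a[i] = tmp;
    }
}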
Collections.shuffle() converts the list to an array, does the shuffle using random.nextInt(), and then copies everything back (see http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Collections.java#Collections.shuffle%28java.util.List%29).
You can only make this faster by avoiding the overhead of copying the array out and writing it back:
Either write your own implementation of ArrayList where you can directly access the backing array, or access the field "elementData" of your ArrayList via reflection.
Then apply the same algorithm as Collections.shuffle directly to that array, taking care to shuffle only the first size() elements.
This is faster because it avoids copying the whole array, which Collections.shuffle() does.
The reflective access itself takes some time, so this approach only pays off for larger numbers of elements.
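A rough sketch of the reflection variant, assuming the OpenJDK field name elementData mentioned above (on recent JDKs this additionally needs --add-opens java.base/java.util=ALL-UNNAMED, and it ties you to an implementation detail):

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Random;

static void shuffleBackingArray(ArrayList<?> list, Random random) throws ReflectiveOperationException {
    // Grab the backing array of the ArrayList directly (implementation detail!).
    Field field = ArrayList.class.getDeclaredField("elementData");
    field.setAccessible(true);
    Object[] data = (Object[]) field.get(list);

    // Fisher-Yates over the first size() slots only; the rest is unused capacity.
    for (int i = list.size() - 1; i >= 1; i--) {
        int j = random.nextInt(i + 1);
        Object tmp = data[j];
        data[j] = data[i];
        data[i] = tmp;
    }
}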
I would not recommend this solution unless you really need the fastest possible shuffle in terms of raw execution time.
And as always when comparing speeds, make sure you warm up the JVM by running the algorithm to be measured a thousand times or so before you start measuring.
According to the documentation, Collections.shuffle() runs in O(N) time.
This method runs in linear time. If the specified list does not implement the RandomAccess interface and is large, this implementation dumps the specified list into an array before shuffling it, and dumps the shuffled array back into the list. This avoids the quadratic behavior that would result from shuffling a "sequential access" list in place.
I recommend you use the public static void shuffle(List<?> list, Random rnd) overload, although the performance benefit will probably be negligible.
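For example (just reusing one Random instance across calls instead of letting shuffle() create its own):

Random rnd = new Random();        // create once, reuse for every iteration
Collections.shuffle(list, rnd);   // instead of Collections.shuffle(list)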
Improving the performance will be difficult unless you allow some bias, such as with partial shuffling (only a segment of the list gets re-shuffled each time) or under-shuffling. Under-shuffling means writing your own Fisher-Yates routine and skipping certain list indices during the reverse traversal; for example, you could skip all odd indices. However, the end of your list would then receive less shuffling than the front, which is another form of bias.
If you had a fixed list size M, you might consider caching some large number N of different fixed index permutations (0 to M-1 in random order) in memory at application startup. Then you could just randomly select one of these pre-orderings whenever you iterate the collection and just iterate according to that particular previously defined permutation. If N were large (say 1000 or more), the overall bias would be small (and also relatively uniform) and would be very fast. However you noted your list slowly grows, so this approach wouldn't be viable.
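For completeness, a rough sketch of that caching idea for a fixed list size M (the sizes below are arbitrary assumptions):

int M = 1_000;   // fixed list size (assumption)
int N = 1_000;   // number of cached index permutations
Random random = new Random();

// Build N random orderings of the indices 0..M-1 once, at startup.
List<int[]> permutations = new ArrayList<>(N);
for (int p = 0; p < N; p++) {
    List<Integer> order = new ArrayList<>(M);
    for (int i = 0; i < M; i++) order.add(i);
    Collections.shuffle(order, random);
    permutations.add(order.stream().mapToInt(Integer::intValue).toArray());
}

// Per iteration: pick one cached ordering and walk the list with it.
int[] order = permutations.get(random.nextInt(N));
for (int index : order) {
    // process list.get(index)
}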

picking without replacement in java

I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1), after this operation the picked
element is removed from the structure.
(without replacement)
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in
O(n).
the complexity (or even possibility) of other actions (e.g. insert) does not matter. Besides the asymptotic complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I currently implemented a structure having all above properties, except the picking of the element takes O(d) with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can figure out structures allowing picking in O(1), but these have higher complexities on at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent from #earlier picked elements and independent from total #elements.
*in monte carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You list a lot of operations, but all of them boil down to insertion, removal, and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertions: HashMap is fine.
one can obtain a random element in O(1), after this operation the picked element is removed from the structure (without replacement).
This is the only operation that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
It's an insertion operation: O(1).
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
You can traverse a HashMap in O(n).
EDIT:
Example of picking a random element in O(n):
HashMap<Integer, String> map = new HashMap<>();
// ... fill the map ...
Random random = new Random();
Object[] values = map.values().toArray();                   // O(n) copy of the values
Object randomValue = values[random.nextInt(values.length)]; // random index into that copy
Ok, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs (object, its index in the array). For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap<String, Integer> map with the following values:
Map<String, Integer> map = new HashMap<>();
for (int i = 0; i < array.size(); i++)
{
    map.put(array.get(i), i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in the map and get its array index; let's call this index i (i = map.get(object)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array). Reduce the size of the array by 1 (since there is one less element now) - O(1)
Update the index of the moved object, now in position i of the array, in the map (map.put(array[i], i)) - O(1)
I apologize for the mix of Java and C++ notation, hope this helps.
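Putting those steps together, a minimal Java sketch of such a structure (class and method names are my own, and distinct elements are assumed):

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical sketch: array + index map for picking without replacement. */
class RandomPickSet<T> {
    private final List<T> items = new ArrayList<>();          // current members, in arbitrary order
    private final Map<T, Integer> indexOf = new HashMap<>();  // member -> its position in items
    private final Deque<T> pickHistory = new ArrayDeque<>();  // picked elements, for undo
    private final Random random = new Random();

    RandomPickSet(List<T> initial) {                          // O(n) initialization
        for (T t : initial) add(t);
    }

    private void add(T t) {                                   // O(1) amortized
        indexOf.put(t, items.size());
        items.add(t);
    }

    /** Remove and return a uniformly random element, O(1). */
    T pick() {
        T result = items.get(random.nextInt(items.size()));
        removeInternal(result);
        pickHistory.push(result);
        return result;
    }

    /** Undo the last p picks in O(p): each undone element is simply re-inserted. */
    void undoPicks(int p) {
        for (int k = 0; k < p; k++) add(pickHistory.pop());
    }

    /** Remove a specific element, O(1) expected (better than the asked-for O(log n)). */
    void remove(T t) {
        removeInternal(t);
    }

    /** Snapshot of the current members, O(n). */
    List<T> toList() {
        return new ArrayList<>(items);
    }

    private void removeInternal(T t) {
        int i = indexOf.remove(t);                            // position of the element to delete
        T last = items.remove(items.size() - 1);              // O(1): drop the last slot
        if (i < items.size()) {                               // unless t itself was the last element,
            items.set(i, last);                               // move the former last element into the gap
            indexOf.put(last, i);
        }
    }
}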
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
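A tiny usage sketch of that pattern (here simply removing the last element rather than taking a subList(), which is also O(1) on an ArrayList):

List<String> pool = new ArrayList<>(Arrays.asList("a", "b", "c", "d"));
Collections.shuffle(pool);                      // O(n), once up front

String picked = pool.remove(pool.size() - 1);   // O(1) pick without replacement
// ... use picked ...
pool.add(picked);                               // O(1) undo: put it back at the end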
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around online you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
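A minimal sketch of that picked/unpicked boundary idea (names are my own; the by-id removal via a HashMap is left out):

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: "unpicked" elements live at indices [0, boundary). */
class BoundaryPicker<T> {
    private final List<T> items;
    private int boundary;                  // everything below this index is still unpicked
    private final Random random = new Random();

    BoundaryPicker(List<T> initial) {
        items = new ArrayList<>(initial);
        boundary = items.size();
    }

    /** Pick a random unpicked element, O(1). */
    T pick() {
        int i = random.nextInt(boundary);
        T result = items.get(i);
        boundary--;
        items.set(i, items.get(boundary)); // swap the picked element just past the boundary
        items.set(boundary, result);
        return result;
    }

    /** Un-pick the most recently picked element, O(1). */
    void unpick() {
        boundary++;
    }
}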
