I thought linked lists were supposed to be faster than an ArrayList when adding elements? I just ran a test of how long it takes to add, sort, and search for elements (ArrayList vs LinkedList vs HashSet). I was just using the java.util classes for ArrayList and LinkedList, calling the add(Object) method available on each class.
ArrayList outperformed LinkedList in filling the list, and in a linear search of the list.
Is this right? Did I do something wrong in the implementation, maybe?
***************EDIT*****************
i just want to make sure i'm using these things right. here's what i'm doing:
public class LinkedListTest {
    private List<String> Names;
    public LinkedListTest() {
        Names = new LinkedList<String>();
    }
}
Then I just use LinkedList methods, i.e. "Names.add(strings)". When I tested ArrayList, the code was nearly identical:
public class ArrayListTest {
    private List<String> Names;
    public ArrayListTest() {
        Names = new ArrayList<String>();
    }
}
Am I doing it right?
Yes, that's right. LinkedList has to do a memory allocation on each insertion, while ArrayList is permitted to do fewer of them, giving it amortized O(1) insertion. Memory allocation looks cheap, but may actually be very expensive.
The linear search time is likely slower in LinkedList due to locality of reference: the ArrayList elements are closer together, so there are fewer cache misses.
When you plan to insert only at the end of a List, ArrayList is the implementation of choice.
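For anyone who wants to reproduce the comparison, here is a rough timing sketch. The class name AppendTiming, the helper timeAppend and the list size are made up for illustration, and a hand-rolled timer like this is only indicative (the string concatenation inside the loop also costs time); a harness such as JMH would give more trustworthy numbers.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class AppendTiming {
    // Appends n strings to the given list and returns the elapsed nanoseconds.
    static long timeAppend(List<String> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add("name" + i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // arbitrary size chosen for illustration

        // Warm-up pass so the JIT has had a chance to compile the loop.
        timeAppend(new ArrayList<>(), n);
        timeAppend(new LinkedList<>(), n);

        long arrayNanos = timeAppend(new ArrayList<>(), n);
        long linkedNanos = timeAppend(new LinkedList<>(), n);
        System.out.println("ArrayList:  " + arrayNanos / 1_000_000 + " ms");
        System.out.println("LinkedList: " + linkedNanos / 1_000_000 + " ms");
    }
}
```

The absolute numbers will vary by machine and JVM, so treat them as a sanity check rather than a benchmark result.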
Remember that:
there's a difference in "raw" performance for a given number of elements, and in how different structures scale;
different structures perform differently at different operations, and that's essentially part of what you need to take into account in choosing which structure to use.
So, for example, a linked list has more to do when adding to the end, because it has an additional object to allocate and initialise per item added. Whatever that "intrinsic" cost per item, both structures have O(1) performance for adding to the end of the list (amortized, in the case of ArrayList), i.e. an effectively "constant" time per addition whatever the size of the list. That constant will differ between ArrayList and LinkedList, and is likely to be greater for the latter.
On the other hand, a linked list has constant time for adding to the beginning of the list, whereas in the case of an ArrayList, the elements must be "shuftied" along, an operation that takes some time proportional to the number of elements. But, for a given list size, say, 100 elements, it may still be quicker to "shufty" 100 elements than it is to allocate and initialise a single placeholder object of the linked list (but by the time you get to, say, a thousand or a million objects or whatever the threshold is, it won't be).
So in your testing, you probably want to consider both the "raw" time of the operations at a given size and how these operations scale as the list size grows.
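A minimal sketch of how one might look at scaling rather than a single size, using insertion at the front as the operation. The class name FrontInsert, the helper prepend and the chosen sizes are illustrative assumptions, and the timings are only rough.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

class FrontInsert {
    // Inserts 0..n-1 at index 0, so the final list is in descending order.
    static List<Integer> prepend(List<Integer> list, int n) {
        for (int i = 0; i < n; i++) {
            // ArrayList shifts every existing element; LinkedList relinks the head.
            list.add(0, i);
        }
        return list;
    }

    public static void main(String[] args) {
        for (int n : new int[] {100, 10_000, 100_000}) {
            long t0 = System.nanoTime();
            prepend(new ArrayList<>(), n);
            long t1 = System.nanoTime();
            prepend(new LinkedList<>(), n);
            long t2 = System.nanoTime();
            System.out.printf("n=%d ArrayList=%d us LinkedList=%d us%n",
                    n, (t1 - t0) / 1000, (t2 - t1) / 1000);
        }
    }
}
```

Where the crossover falls depends on the machine, so the point is to watch how the two columns grow as n grows.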
Why did you think LinkedList would be faster? In the general case, appending to an array list is simply a case of updating the pointer in a single array cell (with O(1) random access). Appending to a LinkedList is also constant time, but it must allocate a "cell" object to hold the entry and update a pair of pointers, as well as ultimately setting the reference to the object being inserted.
Of course, periodically the ArrayList's backing array may need to be resized (which won't be the case if it was created with a large enough initial capacity), but since the array grows geometrically, only O(lg n) resizes are needed for n inserts, and the amortized cost per insert stays O(1).
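To make the resize cost concrete, here is a small simulation that counts resizes and copied elements for an array that doubles when full. The doubling factor and the names GrowthCount and simulate are assumptions for illustration; the real ArrayList uses its own growth policy, but the shape of the result is the same: few resizes, and total copy work that stays proportional to n.

```java
class GrowthCount {
    // Simulates appending n elements to an array that doubles when full.
    // Returns {number of resizes, total elements copied during resizes}.
    static long[] simulate(int n, int initialCapacity) {
        long capacity = initialCapacity;
        long resizes = 0;
        long copied = 0;
        for (long size = 0; size < n; size++) {
            if (size == capacity) {
                copied += size;   // a resize copies every existing element
                capacity *= 2;
                resizes++;
            }
            // the append itself is a single array write
        }
        return new long[] {resizes, copied};
    }

    public static void main(String[] args) {
        long[] r = simulate(1_000_000, 1);
        System.out.println("resizes=" + r[0] + " copied=" + r[1]);
        // prints "resizes=20 copied=1048575"
    }
}
```

A million appends trigger only 20 resizes and about one extra copy per element overall, which is why the per-append cost averages out to a constant.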
Simply put - inserts into array lists are much simpler and therefore much faster overall.
A linked list may be slower than an array list in these cases for a few reasons. If you are inserting at the end of the list, it is likely that the array list has this space already allocated. The underlying array is usually grown in large chunks, because resizing is a very time-consuming process. So, in most cases, adding an element at the back requires only sticking in a reference, whereas the linked list needs the creation of a node. Adding at the front and in the middle should give different performance for both types of list.
Linear traversal of the list will generally be faster in an array-based list because it only has to walk the array. This requires one dereferencing operation per cell. In the linked list, the nodes of the list must also be dereferenced, roughly doubling the number of pointer hops.
When adding an element to the back of a LinkedList (in Java LinkedList is actually a doubly linked list) it is an O(1) operation as is adding an element to the front of it. Adding an element on the ith position is roughly an O(i) operation.
So, if you were adding to the front of the list, a LinkedList would be significantly faster.
ArrayList is faster at accessing data by index, but slower when inserting elements in the middle of the list: with a linked list you just have to change reference values, whereas with an array list you have to shift every element after the insertion index one position along.
EDIT: Isn't there a linked list implementation that keeps a reference to the last element? That would speed up inserting at the end of a linked list.
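On that last question: java.util.LinkedList is doubly linked and exposes both ends directly, so appending does not walk the list. A small sketch (the class name TailAccess and the sample values are made up):

```java
import java.util.LinkedList;

class TailAccess {
    static LinkedList<String> build() {
        LinkedList<String> names = new LinkedList<>();
        names.add("b");        // add(E) appends at the tail
        names.addLast("c");    // same effect, with an explicit name
        names.addFirst("a");   // constant-time insert at the head
        return names;
    }

    public static void main(String[] args) {
        LinkedList<String> names = build();
        System.out.println(names + " first=" + names.getFirst() + " last=" + names.getLast());
        // prints "[a, b, c] first=a last=c"
    }
}
```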
The StringBuffer/StringBuilder classes in Java are primarily used to modify String values without having to initialize a new String object every time.
Is there a specific reason it doesn't use a LinkedList instead of a char array as its underlying data structure?
Inserting a char into an array always takes O(n) time to copy all the following elements to the next index, whereas that can be done in O(1) time with a LinkedList.
Random access
StringBuilder has random access operations such as charAt or substring, which would be extremely slow with a linked list.
Insertions
In fact, array lists aren't particularly slower than linked lists even when it comes to other operations like insertion. Typically StringBuilder won't be used to create strings of a million characters so it's unlikely that we need to resize the buffer too many times.
At the end
I have to correct the claim that insertion at the end always requires an O(n) copy of elements. The worst case is indeed O(n), but the amortized complexity is O(1) because we don't just allocate one element at a time. When the array isn't big enough for another insertion, most implementations double the size of the array.
In the middle
Insertions in the middle always require the copy of elements at the right of the insertion, so yes it is pretty slow but it's not a typical use-case for a StringBuilder where most insertions happen at the end. Also, linked lists have the same average complexity on insertions in the middle since they first have to reach the right node by iterating the list.
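As a quick illustration of the three cases, here is a sketch using the standard StringBuilder methods append, insert and charAt. The class name BuilderOps and the sample strings are invented for the example.

```java
class BuilderOps {
    static String demo() {
        StringBuilder sb = new StringBuilder();
        sb.append("world");       // append at the end: amortized O(1) per char
        sb.insert(0, "hello ");   // insert at the front: shifts all existing chars right
        sb.insert(5, ",");        // insert in the middle: shifts only the chars after index 5
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = demo();
        System.out.println(s);            // prints "hello, world"
        System.out.println(s.charAt(7));  // prints "w"
    }
}
```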
Data locality
Another advantage of arrays compared to linked list is data locality. Array lists are faster to iterate than linked lists because when the processor loads a piece of memory around an element of an array, it will also cache some of the neighbours of this element which will therefore be returned faster. On the other hand, all elements of a linked list may live in very distant memory locations, which is not cache-friendly.
Memory footprint
Because we double the size of the array on every resize, dynamic arrays can have a pretty big memory footprint (but at least we won't need to copy elements too often). Linked lists also have a fairly large memory footprint since they require one additional reference and a pointer for each element while elements are compactly stored in an array. On average, I'd say a typical array list will have a smaller memory footprint than a linked list but I may be wrong. This is particularly the case for primitive types - such as char - because linked lists require wrapper objects (at least in Java since there are no pointers) whereas we can use much more compact primitive arrays.
Last notes
Finally, I used StringBuilder in my answer instead of StringBuffer because this is the recommended implementation for most use-cases. StringBuffer is only preferable when thread-safety is a hard requirement. Otherwise, StringBuilder will have better performance.
PS: Python's most prominent data structure is list and guess what... it's implemented with a dynamic array! Resizable arrays are very often a better choice than linked lists. The only case in which a linked list is notably more performant is when the application focuses on elements close to the head of the list and makes frequent insertions or deletions in this area.
I have had this question for a while but I have been unsatisfied with the answers because the distinctions appear to be arbitrary and more like conventional wisdom that is sort of blindly accepted rather than assessed critically.
In an ArrayList it is said that insertion cost (for a single element) is linear. If we are inserting at index p for 0 <= p < n where n is the size of the list, then the remaining n-p elements are shifted over first before the new element is copied into position p.
In a LinkedList, it is said that insertion cost (for a single element) is constant. For instance if we already have a node and we want to insert after it, we rearrange some pointers and it's done quickly. But getting this node in the first place, I don't see how it can be done other than a linear search first (assuming it isn't a trivial case like prepending at the start of the list or appending at the end).
And yet in the case of the LinkedList, we don't count that initial search time. To me this is confusing because it's sort of like saying "The ice cream is free... after you pay for it." It's like, well, of course it is... but that sort of skips the hard part of paying for it. Of course inserting in a LinkedList is going to be constant time if you already have the node you want, but getting that node in the first place may take some extra time! I could easily say that inserting in an ArrayList is constant time... after I move the remaining n-p elements.
So I don't understand why this distinction is made for one but not the other. You could argue that insertion is considered constant for LinkedLists because of the cases where you insert at the front or back where linear time operations are not required, whereas in an ArrayList, insertion requires copying of the suffix array after position p, but I could easily counter that by saying if we insert at the back of an ArrayList, it is amortized constant time and doesn't require extra copying in most cases unless we reach capacity.
In other words, we separate the linear stuff from the constant stuff for LinkedList, but we don't separate them for the ArrayList, even though in both cases the linear operations may or may not be invoked.
So why do we consider them separate for LinkedList and not for ArrayList? Or are they only being defined here in the context where LinkedList is overwhelmingly used for head/tail appends and prepends as opposed to elements in the middle?
This is basically a limitation of the Java interface for List and LinkedList, rather than a fundamental limitation of linked lists. That is, in Java there is no convenient concept of "a pointer to a list node".
Every type of list has a few different concepts loosely associated with the idea of pointing to a particular item:
The idea of a "reference" to a specific item in a list
The integer position of an item in the list
The value of an item that may be in the list (possibly multiple times)
The most general concept is the first one, and is usually encapsulated in the idea of an iterator. As it happens, the simple way to implement an iterator for an array backed list is simply to wrap an integer which refers to the position of the item in a list. So for array lists only, the first and second ways of referring to items are pretty tightly bound.
For other list types, however, and even for most other container types (trees, hashes, etc) that is not the case. The generic reference to an item is usually something like a pointer to the wrapper structure around one item (e.g., HashMap.Entry or LinkedList.Entry). For these structures the idea of accessing the nth element isn't necessarily natural or even possible (e.g., unordered collections like sets and many hash maps).
Perhaps unfortunately, Java made the idea of getting an item by its index a first-class operation. Many of the operations directly on List objects are implemented in terms of list indexes: remove(int index), add(int index, ...), get(int index), etc. So it's kind of natural to think of those operations as being the fundamental ones.
For LinkedList though it's more fundamental to use a pointer to a node to refer to an object. Rather than passing around a list index, you'd pass around the pointer. After inserting an element, you'd get a pointer to the element.
In C++ this concept is embodied in the concept of the iterator, which is the first class way to refer to items in collections, including lists. So does such a "pointer" exist in Java? It sure does - it's the Iterator object! Usually you think of an Iterator as being for iteration, but you can also think of it as pointing to a particular object.
So the key observation is: given a pointer (iterator) to an object, you can remove and add from linked lists in constant time, but from an array-like list this takes linear time in general. There is no inherent need to search for an object before deleting it: there are plenty of scenarios where you can maintain or take as input such a reference, or where you are processing the entire list, and here the constant time deletion of linked lists does change the algorithmic complexity.
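A minimal sketch of the "processing the entire list" case, using the standard ListIterator methods remove() and add(). The class name IteratorEdit and the values "foo", "bar" and "after-bar" are made up for illustration.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

class IteratorEdit {
    // Removes every "foo" and inserts "after-bar" right after each "bar",
    // in a single pass with no index lookups.
    static List<String> edit(List<String> input) {
        LinkedList<String> list = new LinkedList<>(input);
        ListIterator<String> it = list.listIterator();
        while (it.hasNext()) {
            String s = it.next();
            if (s.equals("foo")) {
                it.remove();          // unlinks the node just returned
            } else if (s.equals("bar")) {
                it.add("after-bar");  // links a new node at the cursor position
            }
        }
        return list;
    }

    public static void main(String[] args) {
        System.out.println(edit(Arrays.asList("foo", "bar", "baz", "foo")));
        // prints "[bar, after-bar, baz]"
    }
}
```

With a LinkedList each remove() and add() here only relinks neighbouring nodes; the same loop on an ArrayList works too, but every edit shifts the tail of the array.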
Of course, if you need to do something like delete the first entry containing the value "foo" that implies both a search and a delete operation. Both array-based and linked lists take O(n) for search, so they don't vary here - but you can meaningfully separate the search and delete operations.
So you could, in principle, pass around Iterator objects rather than list indexes or object values - at least if your use case supports it. However, at the top I said that "Java has no convenient notion of a pointer to a list node". Why?
Well, because using Iterator is actually very inconvenient. First of all, it's tough to get an Iterator to an object in the first place: for example, and unlike C++, the add() methods don't return an Iterator - so to get a pointer to the item you just added, you need to go ahead and iterate over the list or use the listIterator(int index) call, which is inherently inefficient for linked lists. Many methods (e.g., subList()) support only a version that takes indexes, but not Iterators - even when such a method could be efficiently supported.
Add to that the restrictions around iterator invalidation when the list is modified, and they actually become pretty useless for referring to elements except in immutable lists.
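The invalidation problem is easy to trigger. In this sketch (class and method names are made up), an iterator is held as a "pointer" into the list, the list is then modified through its own add() method, and the iterator refuses to continue:

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class Invalidation {
    // Returns true if the iterator rejects use after the list was changed behind its back.
    static boolean iteratorInvalidated() {
        List<String> list = new LinkedList<>();
        list.add("a");
        list.add("b");
        Iterator<String> it = list.iterator();
        it.next();                 // iterator now sits just past "a"
        list.add("c");             // structural change made through the list itself
        try {
            it.next();             // the fail-fast check happens here
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(iteratorInvalidated()); // prints "true"
    }
}
```

Changes made through the iterator itself (its own remove(), or add() on a ListIterator) do not invalidate it, which is why editing during a single pass still works.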
So Java's support of pointers to list elements is pretty half-hearted, and so it's tough to leverage the constant time operations that linked lists offer, except in cases such as adding to the front of a list, or deleting items during iteration.
It's not limited to lists, either - the ConcurrentLinkedQueue is also a linked structure which supports constant time deletes, but you can't reliably use that ability from Java.
If you're using a LinkedList, chances are you're not going to use it for a random access insert. LinkedList offers constant time for push (insert at the beginning) or add (because it has a ref to the final element IIRC). You are correct in your suspicion that an insert into a random index (e.g. insert sorted) will take linear time - not constant.
ArrayList, by contrast, is worst case linear. An insert in the middle does an arraycopy to shift the following elements (a fast low-level copy, but still proportional to the number of elements moved). An append is constant time except when the backing array has to be resized, which takes linear time.
Let's assume I call a third-party API and get back a mutable list of N objects. This list could be as small as 10 objects or as large as a few thousand. I then always want to insert just one object at index 0 of that returned List. I know I can easily call add at index 0, but this is going to be O(n) as it shifts every object for the insert. My question is, would it be faster on average (processing-wise) to create a new List with the item I plan on inserting at the beginning and then call addAll on that new List, passing in the returned third-party list of N objects?
It depends on the list implementation. If you truly have no visibility of what list implementation your third-party has given you, all you can do is empirical testing and benchmarking.
More likely, they're returning you one of the standard Java list types, and indeed you've tagged your question arraylist -- is that what you're given?
ArrayList.add(index, element) uses System.arraycopy() to shift each element after the insertion point from index n to n+1, then writes the new element to its slot. That's O(n); however, it's likely to be very fast indeed, since it will use the highly optimised system memmove routine to move whole chunks of RAM at a time. (See "Why are memcpy() and memmove() faster than pointer increments?")
In addition, if your extra element nudges the size of the list past the size of the allocated backing array, Java will create a new array and arraycopy the whole lot into there.
Bear in mind that you're only copying object references, not whole objects, so for 1000 elements, you're copying (worst case on a 64 bit machine) 64 bits * 1000 == 8 kilobytes of RAM.
Still, for really huge lists, the time it takes might become significant. Inserting into a linked list is cheaper (should be O(1) at the start or end).
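To show the shift itself, here is a sketch of an in-place insert on a plain array with spare capacity, using System.arraycopy the way described above. This is not ArrayList's actual source; the class name ShiftInsert, the method insertInPlace and its parameters are made up for illustration.

```java
import java.util.Arrays;

class ShiftInsert {
    // Inserts value at index in a backing array that currently holds `size`
    // elements and has at least one free slot at the end.
    static void insertInPlace(Object[] backing, int size, int index, Object value) {
        // Move elements [index, size) one slot to the right; arraycopy handles
        // the overlapping source and destination ranges correctly.
        System.arraycopy(backing, index, backing, index + 1, size - index);
        backing[index] = value;
    }

    public static void main(String[] args) {
        Object[] backing = {"b", "c", null}; // two elements, capacity three
        insertInPlace(backing, 2, 0, "a");
        System.out.println(Arrays.toString(backing)); // prints "[a, b, c]"
    }
}
```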
You can make it an O(1) operation on an arbitrary List implementation by writing/finding a List implementation that is just a wrapper around the existing list. For example:
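import java.util.AbstractList;
import java.util.List;

// Read-only view that presents `head` followed by the elements of `tail`.
public class HeadedList<T> extends AbstractList<T> {
    private final T head;
    private final List<? extends T> tail;

    public HeadedList(T head, List<? extends T> tail) {
        this.head = head;
        this.tail = tail;
    }

    @Override
    public T get(int index) {
        return index == 0 ? head : tail.get(index - 1);
    }

    @Override
    public int size() {
        return tail.size() + 1;
    }
}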
(NB if you work in languages like Lisp/Clojure/etc you get very used to thinking of lists in this way)
But, only bother with this if benchmarking reveals that real performance problems are being caused by list building.
If the returned List impl is ArrayList, both options are the same: O(n).
If the returned impl is LinkedList, inserting at head is O(1).
There is an always O(1) option: Create a List wrapper class that is backed by the returned list but allows insertion at head by storing the inserted element internally. You would have to create a custom iterator to iterate over the inserted element then delegate to the list. Most methods would need similar customisation.
If it's only a 1000 or so elements I wouldn't bother, unless your application is complete and you've determined there is a measurable and severe enough performance problem at this operation.
If you were inserting multiple elements at head, then you would take a hit once to create a LinkedList, then each insertion would be O(1), but since you only have 1 to insert, don't bother.
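For the multiple-insert case, a sketch of what "take the hit once" could look like. The class name ManyHeadInserts and the helper prependAll are hypothetical; the LinkedList copy constructor and addFirst are standard.

```java
import java.util.Arrays;
import java.util.LinkedList;
import java.util.List;

class ManyHeadInserts {
    // Copies the source once (O(n)), then prepends each extra element in O(1).
    // Extras are prepended in the order given, so the last one ends up first.
    static List<String> prependAll(List<String> source, String... extras) {
        LinkedList<String> result = new LinkedList<>(source);
        for (String extra : extras) {
            result.addFirst(extra);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(prependAll(Arrays.asList("c", "d"), "b", "a"));
        // prints "[a, b, c, d]"
    }
}
```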
KISS: Just insert the element into the returned list. I'm sure it will be fast enough, and most likely way faster than the library anyway.
HashMap uses a linked list when two different keys produce the same hashCode. But I was wondering what makes LinkedList a better candidate here than other implementations of List. Why not ArrayList? ArrayList uses an array internally, and arrays have faster iteration than a LinkedList.
Collisions in hash maps are an exception, rather than a rule. When your hash function is reasonably good, as it should be, there should be very few collisions.
If we used ArrayList for the buckets, with most lists being empty or having exactly one element, this would be a rather big waste of resources. With array lists allocating multiple members upfront, you would end up paying forward for multiple collisions that you may not have in the future.
Moreover, removing from array lists is cheap only when the last element gets deleted. When the first one gets deleted, you end up paying for the move of all elements.
Linked lists are free from these problems. Insertion is O(1), deletion is O(1), and they use exactly as many nodes as you insert. The memory overhead of the next/prior links is not too big a price to pay for this convenience.
The problem with an ArrayList is that you can't remove an element quickly: you have to move all the elements after the one you remove.
With a LinkedList, removing an element is merely changing a reference from one node to the new next one, skipping the removed one.
The difference is huge. When you want to have a list and be able to remove elements quickly, don't use an ArrayList; the usual choice is the linked list.
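A toy sketch of what "changing a reference to skip the removed node" means for a collision chain. This is not HashMap's real code; the Chain and Node classes and the render helper are invented for the example, and note that finding the node still means walking the chain.

```java
class Chain {
    static final class Node {
        final String key;
        Node next;
        Node(String key, Node next) { this.key = key; this.next = next; }
    }

    // Unlinks the first node with the given key and returns the (possibly new) head.
    static Node remove(Node head, String key) {
        if (head == null) return null;
        if (head.key.equals(key)) return head.next;
        for (Node prev = head; prev.next != null; prev = prev.next) {
            if (prev.next.key.equals(key)) {
                prev.next = prev.next.next; // skip the removed node; nothing is shifted
                break;
            }
        }
        return head;
    }

    // Renders the chain as "a->b->c" for inspection.
    static String render(Node head) {
        StringBuilder sb = new StringBuilder();
        for (Node n = head; n != null; n = n.next) {
            if (sb.length() > 0) sb.append("->");
            sb.append(n.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Node head = new Node("a", new Node("b", new Node("c", null)));
        System.out.println(render(remove(head, "b"))); // prints "a->c"
    }
}
```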
Why not ArrayList because ArrayList uses Array internally and arrays have a faster iteration compared to a LinkedList.
And ArrayList is much slower to modify. So they made a judgement call and went with LinkedList.
I am wondering what the time complexity (in big-O notation) of ArrayList to array conversion is:
ArrayList assetTradingList = new ArrayList();
assetTradingList.add("Stocks trading");
assetTradingList.add("futures and option trading");
assetTradingList.add("electronic trading");
assetTradingList.add("forex trading");
assetTradingList.add("gold trading");
assetTradingList.add("fixed income bond trading");
String [] assetTradingArray = new String[assetTradingList.size()];
assetTradingList.toArray(assetTradingArray);
similarly, what is the time complexity for arrays to list in the following ways:
method 1 using Arrays.asList:
String[] asset = {"equity", "stocks", "gold", "foreign exchange", "fixed income", "futures", "options"};
List assetList = Arrays.asList(asset);
method 2 using Collections.addAll:
List assetList = new ArrayList();
String[] asset = {"equity", "stocks", "gold", "foreign exchange", "fixed income", "futures", "options"};
Collections.addAll(assetList, asset);
method 3 addAll:
ArrayList newAssetList = new ArrayList();
newAssetList.addAll(Arrays.asList(asset));
The reason I am interested in the overhead of copying back and forth is because in typical interviews, questions come such as given an array of pre-order traversal elements, convert to binary search tree and so on, involving arrays. With List offering a whole bunch of operations such as remove etc, it would make it simple to code using List than Array.
In which case, I would like to defend my use of a list instead of arrays by saying "I would first convert the Array to List because the overhead of this operation is not much (hopefully)".
Any recommended faster methods for copying the elements back and forth between array and list would be good to know too.
Thanks
It would seem that Arrays.asList(T[]) is the fastest, with O(1).
Because the method returns a fixed-size List backed by the array, there is no need to copy the references over to a new data structure. The method simply uses the given array as the backing array for the List implementation that it returns.
The other methods seem like they copy each element, one by one to an underlying data structure. ArrayList#toArray(..) uses System.arraycopy(..) deep down (O(n) but faster because it's done natively). Collections.addAll(..) loops through the array elements (O(n)).
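A small sketch of what "uses the given array as a backing array" implies in practice: writes go through in both directions, and the size cannot change. The class name AsListView and the sample values are made up.

```java
import java.util.Arrays;
import java.util.List;

class AsListView {
    static String demo() {
        String[] asset = {"equity", "stocks", "gold"};
        List<String> view = Arrays.asList(asset);  // no copy: wraps the array
        view.set(0, "bonds");                      // writes through to the array
        asset[1] = "futures";                      // array writes show up in the list
        boolean fixedSize;
        try {
            view.add("options");                   // the size cannot change
            fixedSize = false;
        } catch (UnsupportedOperationException e) {
            fixedSize = true;
        }
        return Arrays.toString(asset) + " " + view + " fixedSize=" + fixedSize;
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // prints "[bonds, futures, gold] [bonds, futures, gold] fixedSize=true"
    }
}
```

If you need to call add or remove, wrapping the view in new ArrayList<>(...) makes an independent O(n) copy.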
Careful when using ArrayList. The backing array is reallocated at a larger size (by about half its current size in OpenJDK) when its capacity is reached, i.e. when it's full. This takes O(n) time. Adding to an ArrayList might not be the best idea unless you know how many elements you are adding from the beginning and create it with that size.
Since the backing data structure of ArrayList is an array, and copying of an array elements is a O(n), it is O(n).
The only overhead I see is pollution of heap with those intermediate objects. Most of the time developers (especially, beginners) don't care about that and treat Java GC as a magic wand that cleans everything after them. My personal opinion is, if you can avoid unwanted transformation of array to list and vice versa, do that.
If you know beforehand the foreseeable (e.g. defined) size of a list, preallocate its capacity with the ArrayList(int initialCapacity) constructor to avoid the internal array copying that takes place inside ArrayList when capacity is exhausted. Depending on the use case, consider other implementations, e.g. LinkedList, if you're only interested in sequential addition to the list and iterative reading.
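A minimal sketch of presizing, assuming the element count is known from the source array. The class name Presize and the helper copyOf are made up; the ArrayList(int) constructor and ensureCapacity(int) are standard.

```java
import java.util.ArrayList;
import java.util.List;

class Presize {
    // Copies an array into a list whose backing array is allocated once.
    static List<String> copyOf(String[] source) {
        List<String> list = new ArrayList<>(source.length); // capacity known up front
        for (String s : source) {
            list.add(s); // never triggers a resize
        }
        return list;
    }

    public static void main(String[] args) {
        List<String> assets = copyOf(new String[] {"equity", "stocks", "gold"});
        System.out.println(assets); // prints "[equity, stocks, gold]"

        // For a list that already exists, ensureCapacity does the same job.
        ArrayList<String> more = new ArrayList<>();
        more.ensureCapacity(10_000);
    }
}
```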
An ArrayList is fundamentally just a wrapper around an Object[] array. It has a lot of helpful methods for doing things like finding items, adding and removing items, and so on, but from a time complexity perspective these are no better than doing the operations yourself on a plain array.
If your hope is that an ArrayList is fundamentally more efficient than a manually managed array, it's not. Sorry!
Converting an array to an ArrayList takes O(n) time. Every element must be copied.
Inserting or removing an element takes O(m) amortized time, where m is the number of elements following the insertion/removal index. These elements have to be moved to new indices whether you use an array or ArrayList.
"Amortized" means average -- sometimes the backing array will need to be grown or shrunk, which takes additional time on the order of O(n). This doesn't happen every time, so on the whole the additional time averages out to an O(1) additional cost.
Accessing an element at an arbitrary index takes O(1) time. Both provide constant time random access.