Insertion in the middle of ArrayList vs LinkedList [duplicate] - java

This question already has answers here:
When to use LinkedList over ArrayList in Java?
(33 answers)
Closed 9 years ago.
Talking in Java's context: if I want to insert into the middle of either an ArrayList or a LinkedList, I've been told that ArrayList will perform terribly.
I understand that this is because we need to shift all the elements and then do the insertion. That should be on the order of n/2, i.e. O(n).
But isn't it the same for LinkedList? For a linked list, we need to traverse until we find the middle and then do the pointer manipulation. That also takes O(n) time, doesn't it?
Thanks

The reason here is that there's no actual shifting of elements in the linked list. A linked list is built up from nodes, each of which holds an element and a pointer to the next node. To insert an element into a list requires only a few things:
create a new node to hold the element;
set the next pointer of the previous node to the new node;
set the next pointer of the new node to the next element in the list.
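Those three steps can be sketched with a hypothetical Node class (java.util.LinkedList keeps its nodes private, so this is an illustration, not the library's actual code):

```java
// Hypothetical singly linked node for illustration only.
class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class Sketch {
    // Insert newValue immediately after prev: a constant number of
    // pointer operations, O(1), no matter how long the list is.
    static Node insertAfter(Node prev, int newValue) {
        Node node = new Node(newValue);  // 1. create a new node
        node.next = prev.next;           // 2. point it at the rest of the list
        prev.next = node;                // 3. splice it in after prev
        return node;
    }
}
```

Note that nothing after the insertion point moves in memory; only two `next` pointers change.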
If you've ever made a chain of paper clips, you can think of each paper clip as the head of the chain formed by it and all the paper clips that come after it. To stick a new paper clip into the chain, you only need to disconnect the paper clips at the spot where the new one will go, and insert the new one. A LinkedList is like a paper clip chain.
An ArrayList is kind of like a pillbox or a mancala board where each compartment can hold only a single item. If you want to insert a new one in the middle (and keep all the elements in the same order), you're going to have to shift everything after that spot.
The insertion after a given node in a linked list is constant time, as long as you already have a reference to that node (with a ListIterator in Java), and getting to that position will typically require time linear in the position of the node. That is, to get to the _n_th node takes n steps. In an array list (or array, or any structure that's based on contiguous memory, really) the address of the _n_th element in the list is just (address of 1st element)+n×(size of element), a trivial bit of arithmetic, and our computing devices support quick access to arbitrary memory addresses.

I think, when analysing the complexity, you need to take into account the operation you are counting. In the ArrayList, the dominant cost is shifting elements; each shift is just an assignment, but doing n/2 of them is a lot of memory traffic.
With a LinkedList, on the other hand, you are simply following references, and you perform only one insertion. So while the asymptotic complexity winds up the same, the work actually being done in that O(n) time is different. In the case of an ArrayList, it is performing a lot of memory writes. In the case of a LinkedList, it is mostly just reading.
For those saying he doesn't understand LinkedLists:
A LinkedList only has a pointer at the start and a pointer at the end. It does not automatically know the node before the one you want to delete (unless it's a doubly linked list), so you need to traverse the list from the start with a temporary pointer until you reach the node before the one you want to delete, and I believe it's this that the OP is discussing.

Related

ArrayList & LinkedList Big-O Complexities

I'm having trouble determining the Big-O notation of the following operations for both LinkedLists and ArrayLists:
traversal to middle of list
modification at middle of list
For ArrayLists, I think that traversal is an O(n) operation while modification is an O(1) operation, given the index. For LinkedLists, I think that both traversal and modification should be O(n).
I'm not sure if I'm right on either one of these structures, since the definition of "traversal" is a bit unclear to me. Does "traversal" mean "iteration"?
Thanks for your help in advance!
In general, traversal means iterating over the list. But if all you need is to get access to the middle element, you don't have to traverse.
Since an ArrayList is backed by an array, and you know the size of the list (array), and an array allows random access by index, getting the middle element is O(1).
traversing to the middle of the list for an arraylist is O(1) - they're indexed. Given an arraylist of 5 million items, fetching the 2.5millionth item takes about as long as getting the 5th item from an arraylist of 10 items.
For linkedlist, it's O(n) - you aren't fetching the 2.5millionth item except by starting from the front and iterating one-by-one through each and every item in the list.
Insertion for arraylists is O(n) - to insert, you need to take all elements that are after it, and shift them up by one. Inserting an element in the middle of a 5-million large arraylist is going to take a lot longer than in the middle of a 10-item large arraylist; one requires moving 2.5 million items up by 1, the other only 5.
Insertion in the middle for linkedlists is O(1), you just tell the 'before' item that the 'next' is now the new one, and what used to be the 'before' item's 'next' now gets configured so that its 'prev' item is the newly added item; then the newly added item's 'prev' and 'next' point at those 2 nodes.
Of course, traversing to the right node in your linkedlist to this.. is O(n).
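The four pointer updates described above can be sketched with a hypothetical doubly linked node class (invented here for illustration; Java's own LinkedList node is not public):

```java
// Hypothetical doubly linked node, for illustrating O(1) insertion.
class DNode {
    int value;
    DNode prev, next;
    DNode(int value) { this.value = value; }

    // Insert newValue after this node: four pointer updates, O(1).
    DNode insertAfter(int newValue) {
        DNode node = new DNode(newValue);
        node.prev = this;              // new node's 'prev' is the 'before' item
        node.next = this.next;         // new node's 'next' is the old successor
        if (this.next != null) {
            this.next.prev = node;     // old successor points back at new node
        }
        this.next = node;              // 'before' item now points at new node
        return node;
    }
}
```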

Why do we say linked list inserts are constant time?

I understand that linked list insertions are constant time due to simple rearrangement of pointers but doesn't this require knowing the element from which you're doing the insert?
And getting access to that element requires a linear search. So why don't we say that inserts are still bound by a linear search bottleneck first?
Edit: I am not talking about head or tail appends, but rather insertions anywhere in between.
Yes, it requires already having a node where you're going to insert next to.
So why don't we say that inserts are still bound by a linear search bottleneck first?
Because that isn't necessarily the case, if you can arrange things such that you actually do know the insertion point (not just the index, but the node).
Obviously you can "insert" at the front or end; that seems like a bit of a cheat perhaps, since it stretches the meaning of the word "insert" a bit. But consider another case: while you're appending to the list, at some point you remember a node. Just any node of your choosing, using any criterion to select it that you want. Then you could easily insert after or before that node later.
That sounds like a very "constructed" situation, because it is. For a more practical case that is a lot like this (but not exactly), you could look at the Dancing Links algorithm.
Why do we say linked list inserts are constant time?
Because the insert operation is constant time.
Note that locating the position of the insert is not considered part of the insert operation itself. That would be a different operation, which may or may not be constant time, e.g. if including search time, you get:
Insert at head: Constant
Insert at tail: Constant
Insert before current element while iterating1: Constant
Insert at index position: Linear
1) Assuming you're iterating anyway.
By contrast, ArrayList insert operation is linear time. If including search time, you get:
Insert at head: Linear
Insert at tail: Constant (amortized)
Insert before current element while iterating1: Linear
Insert at index position: Linear
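The constant-time "insert before current element while iterating" row can be seen with a ListIterator, which inserts at the cursor without re-traversing the list (a small sketch; the method name is invented):

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

public class IteratorInsert {
    // Insert a 0 before every negative number in one pass.
    // Each ListIterator.add is O(1); the whole pass is O(n).
    static List<Integer> insertZeroBeforeNegatives(List<Integer> list) {
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int value = it.next();
            if (value < 0) {
                it.previous();   // step back to just before the negative value
                it.add(0);       // O(1) insert at the cursor position
                it.next();       // move past the negative value again
            }
        }
        return list;
    }
}
```

Doing the same with index-based `list.add(i, x)` on a LinkedList would re-traverse from the head on every call, turning the pass into O(n²).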
The following two operations are different:
Operation A: Insert anywhere in the linked list
Operation B: Insert at a specific position in the linked list
Operation A can be achieved in O(1). The new element can be inserted at the head (or at the tail, if a tail pointer is maintained and that position is desired).
Operation B involves finding followed by inserting. The finding part is O(n). The inserting is as above, i.e. O(1). If, however, the result of the finding is provided as input, for example if there are APIs like
Node * Find(Node * head, int find_property);
Node * InsertAfter(Node * head, Node * existing_node, Node * new_node);
then the insert part of the operation is O(1).

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want to have a data structure that supports searching for and removing the first occurrence of some value faster than O(n). Say the value is 4; then the array becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from head according to the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data (say it's an int[]) in your data structure; I just need the later search and remove to be faster than O(n).
"Search and remove" is considered as ONE operation as shown above.
If I have to make it myself, I would use a LinkedList to store the data, and HashMap to map every key to a list of all occurrence of nodes together with their previous and next ones.
Is it a right approach? Are there any better choices already there in Java?
The data structure you describe, essentially a hybrid linked list and map, I think is the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. The AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
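A rough sketch of the hybrid structure described above, with all class and method names invented for illustration: a hand-rolled doubly linked list plus a HashMap from each value to a queue of its nodes in list order. Since elements are only appended, each queue stays in list order, so search-and-remove of the first occurrence is O(1) on average (subject to the hash-collision caveat above):

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch only: doubly linked list of values + value-to-nodes index.
class IndexedList {
    static class Node {
        int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;
    private final Map<Integer, ArrayDeque<Node>> index = new HashMap<>();

    // O(1) append; also records the node in the index.
    void add(int value) {
        Node node = new Node(value);
        if (tail == null) { head = tail = node; }
        else { tail.next = node; node.prev = tail; tail = node; }
        index.computeIfAbsent(value, k -> new ArrayDeque<>()).addLast(node);
    }

    // O(1) average: locate the first occurrence via the index, unlink it.
    boolean removeFirstOccurrence(int value) {
        ArrayDeque<Node> nodes = index.get(value);
        if (nodes == null || nodes.isEmpty()) return false;
        Node node = nodes.pollFirst();  // first occurrence in list order
        if (node.prev != null) node.prev.next = node.next; else head = node.next;
        if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
        return true;
    }

    // Iterate from the head in insertion order.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.value);
        return out;
    }
}
```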
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n) as you would definitely have to move through to the end of the list to find out if there is another occurrence. There is no standard api from Oracle in the java package that does this.

How can we reduce search time in linked list?

I read this question on CareerCup but didn't find any good answer other than 'SkipList'. The description of SkipList that I found on Wikipedia was interesting; however, I didn't understand some terms like 'geometric/binomial distribution'. I read up on what they mean, and it goes deep into probability theory. I simply wanted to implement a way to make searching quicker. So here's what I did:
1. Created indexes.
- I wrote a function to create, say, 1000 nodes. Then I created an array of linked-list nodes and looped through the 1000 nodes, picking every 23rd element (a random number that came to mind) and adding it to the array, which I call the 'index'.
SLL[] index = new SLL[50];
Now the function to create the index:
private static void createIndex(SLL[] index, SLL head) {
    int count = 0;
    int i = 0;                 // next free slot in the index array
    SLL temp = head;
    while (temp != null && i < index.length) {
        count++;
        temp = temp.next;
        if (count == 23) {     // record every 23rd node
            index[i] = temp;
            i++;
            count = 0;
        }
    }
}
Now finally the 'find' function. In that function, I first take the input element, say 769. I go through the 'index' array until I find index[i] > 769, then pass head = index[i-1] and tail = index[i] to the 'find' function. It then searches within a short range of 23 elements for 769. I calculated that it takes a total of 43 jumps (including the array jumps and the node = node.next jumps) to find the element I wanted, which otherwise would have taken 769 jumps.
Please note: I do not consider the code that creates the index array part of the search, so I do not add its time complexity (which is terrible) to the 'find' function's complexity. I assume this index would be built as a separate function after the list has been created, or rebuilt periodically, just like it takes time for a webpage to show up in Google searches.
Also, this question was asked in a Microsoft interview and I wonder if the solution I provided would be any good or would I look like a fool for providing such kind of a solution. The solution has been written in Java.
Waiting for your feedback.
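For reference, here is one possible sketch of the 'find' step described in the question, assuming an SLL node class like the one above (with an int data field) and, importantly, assuming the list is sorted, since the index shortcut only works if the index entries are in increasing key order:

```java
// Sketch of the two-level search described in the question.
// Assumes a sorted singly linked list; SLL is the question's node class.
class SLL {
    int data;
    SLL next;
    SLL(int data) { this.data = data; }

    // Walk the index first, then scan at most one short segment.
    static SLL find(SLL head, SLL[] index, int key) {
        SLL start = head;
        for (SLL entry : index) {
            if (entry == null || entry.data > key) break;
            start = entry;   // last index entry whose key is <= the target
        }
        for (SLL n = start; n != null; n = n.next) {
            if (n.data == key) return n;
            if (n.data > key) break;  // sorted list: key is absent
        }
        return null;
    }
}
```

This is essentially a one-level skip list: one layer of shortcut pointers over the base list.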
It is difficult to make out what problem it is that you are trying to solve here, or how your solution is supposed to work. (Hint: complete working code would help with both!)
However there are a couple of general things we can say:
You can't search a list data structure (e.g. find i in the list) in better than O(N) unless some kind of ordering has been placed on it. For example, sorting the elements.
If the elements of the list are sorted and your list is indexable (i.e. getting the element at position i is O(1)), then you can use binary search and find an element in O(logN).
You can't get the element at position i of a linked list in better than O(N).
If you add secondary data (indexes, whatever), you can potentially get better performance for certain operations ... at the expense of more storage space, and making certain other operations more expensive. However, you no longer have a list / linked list. The entire data structure is "something else".
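The second point above can be seen with Collections.binarySearch, which does O(log N) comparisons on a sorted, random-access list such as an ArrayList (a small sketch; the wrapper method name is invented):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BinarySearchDemo {
    // Collections.binarySearch performs O(log N) comparisons on a sorted
    // list implementing RandomAccess, such as ArrayList. On a large
    // sequential-access list it falls back to an iterator-based search.
    static int indexOf(List<Integer> sorted, int key) {
        return Collections.binarySearch(sorted, key);
    }
}
```

A missing key yields a negative value encoding the insertion point, per the method's contract.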

list or container O(1)-ish insertion/deletion performance, with array semantics

I'm looking for a collection that offers list semantics, but also allows array semantics. Say I have a list with the following items:
apple orange carrot pear
then my container array would:
container[0] == apple
container[1] == orange
container[2] == carrot
Then say I delete the orange element:
container[0] == apple
container[1] == carrot
I want to collapse gaps in the array without having to do an explicit resizing, i.e. if I delete container[0], then the container collapses, so that container[1] is now mapped as container[0], and container[2] as container[1], etc. I still need to access the list with array semantics, and null values aren't allowed (in my particular use case).
EDIT:
To answer some questions: I know O(1) is impossible, but I don't want a container whose array semantics approach O(log N) either. That sort of defeats the purpose; I could just iterate the list.
I originally had some verbiage here on sort order; I'm not sure what I was thinking at the time (Friday beer-o-clock, most likely). One of the use cases is a Qt list that contains images: deleting an image from the list should collapse the list, not necessarily take the last item from the list and throw it in its place. In this case, yes, I do want to preserve list semantics.
The key differences I see as separating list and array:
Array - constant-time access
List - arbitrary insertion
I'm also not overly concerned if rebalancing invalidates iterators.
You could do an ArrayList/Vector (Java/C++) and when you delete, instead swap the last element with the deleted element first. So if you have A B C D E, and you delete C, you'll end up with A B E D. Note that references to E will have to look at 2 instead of 4 now (assuming 0 indexed) but you said sort order isn't a problem.
I don't know if it handles this automatically (optimized for removing from the end easily) but if it's not you could easily write your own array-wrapper class.
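That swap-with-last trick can be sketched on a Java ArrayList; the remove is O(1), at the cost of changing element order (method name invented here):

```java
import java.util.ArrayList;
import java.util.List;

public class SwapRemove {
    // Remove index i in O(1) by overwriting it with the last element,
    // then dropping the last slot. Removing the tail shifts nothing,
    // but the element order changes.
    static <T> void swapRemove(List<T> list, int i) {
        int last = list.size() - 1;
        list.set(i, list.get(last));  // move last element into the gap
        list.remove(last);            // O(1): no shifting at the tail
    }
}
```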
O(1) might be too much to ask for.
Is O(logn) insert/delete/access time ok? Then you can have a balanced red-black tree with order statistics: http://www.catonmat.net/blog/mit-introduction-to-algorithms-part-seven/
It allows you to insert/delete/access elements by position.
As Micheal was kind enough to point out, Java's TreeMap supports it: http://java.sun.com/j2se/1.5.0/docs/api/java/util/TreeMap.html
Also, not sure why you think O(logN) will be as bad as iterating the list!
From my comments to you on some other answer:
For 1 million items, using balanced red-black trees, the worst case is 2log(n+1), i.e. ~40. You need to do no more than 40 compares to find your element, and that is the absolute worst case. Red-black trees also cater to the hole/gap disappearing. This is miles ahead of iterating the list (~1/2 million steps on average!).
With AVL trees instead of red-black trees, the worst-case guarantee is even better: 1.44 log(n+1), which is ~29 for a million items.
You could use a HashMap; then you will have O(1) expected insertion time. Just map from integers to whatever.
If the order isn't important, then a vector will be fine. Access is O(1), as is insertion using push_back, and removal like this:
swap(container[victim], container.back());
container.pop_back();
EDIT: just noticed the question is tagged C++ and Java. This answer is for C++ only.
I'm not aware of any data structure that provides O(1) random access, insertion, and deletion, so I suspect you'll have to accept some tradeoffs.
LinkedList in Java provides O(1) insertion/deletion at the head or tail of the list, but random access is O(n).
ArrayList provides O(1) random access, but insertion/deletion is only O(1) at the tail of the list. If you insert/delete in the middle of the list, it has to move the remaining elements around. On the bright side, it uses System.arraycopy to move elements, which copies blocks of memory rather than processing each element individually; that is still O(n) work, but with a very small constant factor on modern architectures.
Since you seem to want to insert at arbitrary positions in (near) constant time, I think using a std::deque is your best bet in C++. Unlike std::vector, a deque (double-ended queue) is implemented as a list of memory pages, i.e. a chunked vector. This makes insertion and deletion at either end a constant-time operation, and makes insertion and deletion at arbitrary positions cheaper in practice than in a vector, though still linear in the worst case. The data structure also provides random access ("array access") in near-constant time: it does have to locate the correct page, but this is a very fast operation in practice.
Java’s standard container library doesn’t offer anything similar but the implementation is straightforward.
Does the data structure described at http://research.swtch.com/2008/03/using-uninitialized-memory-for-fun-and.html do anything like what you want?
What about ConcurrentSkipListMap? It does O(log N).
