Why do we say linked list inserts are constant time? - java

I understand that linked list insertions are constant time due to simple rearrangement of pointers but doesn't this require knowing the element from which you're doing the insert?
And getting access to that element requires a linear search. So why don't we say that inserts are still bound by a linear search bottleneck first?
Edit: I am not talking about head or tail appends, but rather insertions anywhere in between.

Yes, it requires already having a node where you're going to insert next to.
So why don't we say that inserts are still bound by a linear search bottleneck first?
Because that isn't necessarily the case, if you can arrange things such that you actually do know the insertion point (not just the index, but the node).
Obviously you can "insert" at the front or end, though that may seem like a bit of a cheat, since it stretches the meaning of the word "insert". But consider another case: while you're appending to the list, at some point you remember a node. Any node of your choosing, selected by whatever criterion you want. Then you could easily insert after or before that node later.
That sounds like a very "constructed" situation, because it is. For a more practical case that is a lot like this (but not exactly), you could look at the Dancing Links algorithm.

Why do we say linked list inserts are constant time?
Because the insert operation is constant time.
Note that locating the position of the insert is not considered part of the insert operation itself. That would be a different operation, which may or may not be constant time, e.g. if including search time, you get:
Insert at head: Constant
Insert at tail: Constant
Insert before current element while iterating (1): Constant
Insert at index position: Linear
(1) Assuming you're iterating anyway.
By contrast, ArrayList insert operation is linear time. If including search time, you get:
Insert at head: Linear
Insert at tail: Constant (amortized)
Insert before current element while iterating (1): Linear
Insert at index position: Linear
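The "insert before current element while iterating" case for a linked list can be sketched with java.util.ListIterator, whose add method inserts at the cursor without any further search. The class and method names (InsertWhileIterating, insertBefore) are made up for illustration.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.ListIterator;

final class InsertWhileIterating {
    // Inserts `marker` before every element equal to `target`.
    // The traversal is O(n) overall, but each individual insert is O(1)
    // on a LinkedList because the iterator already holds the position.
    static List<Integer> insertBefore(List<Integer> source, int target, int marker) {
        LinkedList<Integer> list = new LinkedList<>(source);
        ListIterator<Integer> it = list.listIterator();
        while (it.hasNext()) {
            int current = it.next();
            if (current == target) {
                it.previous();   // move the cursor back in front of `current`
                it.add(marker);  // insert at the cursor
                it.next();       // step over `current` again
            }
        }
        return list;
    }
}
```

The same loop on an ArrayList would still work, but each it.add would shift the following elements, which is the linear cost listed above.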

The following two operations are different:
Operation A: Insert anywhere in the linked list
Operation B: Insert at a specific position in the linked list
Operation A can be achieved in O(1). The new element can be inserted at the head (or at the tail, if a tail pointer is maintained and that is desired).
Operation B involves finding followed by inserting. The finding part is O(n). The inserting is as above, i.e. O(1). If, however, the result of the finding is provided as input, for example if there are APIs like
Node * Find(Node * head, int find_property);
Node * InsertAfter(Node * head, Node * existing_node, Node * new_node);
then the insert part of the operation is O(1).
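For a Java reader, the split between finding and inserting might look roughly like the sketch below. ListNode, LinkedOps, find and insertAfter are hypothetical names that mirror the two API signatures above; they are not part of java.util.

```java
// Hypothetical singly linked node; not a java.util type.
final class ListNode {
    final int value;
    ListNode next;
    ListNode(int value) { this.value = value; }
}

final class LinkedOps {
    // O(n): walks the list until a node with the given value is found.
    static ListNode find(ListNode head, int value) {
        for (ListNode n = head; n != null; n = n.next) {
            if (n.value == value) {
                return n;
            }
        }
        return null;
    }

    // O(1): two references change, regardless of the list length.
    static void insertAfter(ListNode existing, ListNode fresh) {
        fresh.next = existing.next;
        existing.next = fresh;
    }

    // Helper for inspection only: renders the list as "1 2 3".
    static String render(ListNode head) {
        StringBuilder sb = new StringBuilder();
        for (ListNode n = head; n != null; n = n.next) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(n.value);
        }
        return sb.toString();
    }
}
```

A caller that already holds the node (for example because it remembered it while building the list) pays only for insertAfter; a caller that has to call find first pays O(n) for the search.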

Related

Java interview question: get entry by two fields in O(log(n)) time

I had an interview task. The idea is to store elements with the fields id, name, and updateTime.
There should be methods add(Element), getElement(id), getLastUpdatedElements()
Requirements:
Code should be in Java
Should be thread safe
Upper bound of computational complexity for all these methods should be O(log(n))
Notes
Update time of any element can be changed in runtime
getLastUpdatedElements - returns updated last minute elements
My thoughts
I cannot use CopyOnWriteArrayList because finding the last updated elements would take O(N) if the key is id, which breaks the requirement.
To fit O(log(N)) complexity with getLastUpdatedElements() I can use ConcurrentSkipListSet with a comparator by updateTime, but in that case it will take O(N) to get an element by ID. (Please note that in this case add(Element) is O(log(N)) since we know updateTime for newly created elements.)
I can use two trees, the first with a comparator by id and the second with a comparator by updateTime, but then I have to make all access methods synchronized, which makes my program effectively single-threaded.
I think I'm close, just need to find how to get element with O(log(N)) but my thoughts are running out.
I hope I understood you correctly.
If you need to store the elements with "add" and "get" times as low as O(log(N)), that sounds like a classic hash map. Since Java 8, I believe, HashMap keeps colliding entries in a linked list per bucket and converts a bucket to a balanced tree once it grows past a certain threshold,
so in the worst case it's O(log(N)).
For the "get last updated" function: you can append each updated element to a list (a stack of sorts, since you only ever add at the end). When the function is called, perform a binary search on that list; when you reach the first item that was updated in the last minute, return the index of that item.
That way you only perform a binary search, which is O(log(N)).
Oh, and of course have a lock guarding those two data structures.
If you really want to dig into it performance-wise, you can implement two locks: one for inserting/updating entries, and one just for reading them,
similar to the "readers-writers problem": https://www.tutorialspoint.com/readers-writers-problem
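One way to combine the ideas above is sketched below: one index by id, one ordered index by updateTime, and a single read-write lock guarding both. This is an assumption-laden sketch rather than a reference solution: Element and ElementStore are hypothetical names, elements are treated as immutable (changing the update time means calling add again with the same id), and the current time is passed in as a parameter so the behavior is easy to check. Returning k elements costs O(log n + k).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical immutable element; updateTime is in epoch milliseconds.
final class Element {
    final long id;
    final String name;
    final long updateTime;
    Element(long id, String name, long updateTime) {
        this.id = id;
        this.name = name;
        this.updateTime = updateTime;
    }
}

final class ElementStore {
    private final Map<Long, Element> byId = new HashMap<>();
    // Ordered by updateTime, with id as a tie-breaker so equal times do not collide.
    private final TreeSet<Element> byTime = new TreeSet<>(
            Comparator.comparingLong((Element e) -> e.updateTime)
                      .thenComparingLong(e -> e.id));
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    // Adds a new element or replaces the one with the same id: O(log n).
    void add(Element e) {
        lock.writeLock().lock();
        try {
            Element old = byId.put(e.id, e);
            if (old != null) {
                byTime.remove(old); // drop the stale time-index entry
            }
            byTime.add(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    Element getElement(long id) {
        lock.readLock().lock();
        try {
            return byId.get(id);
        } finally {
            lock.readLock().unlock();
        }
    }

    // Elements whose updateTime lies within the 60 seconds before `nowMillis`.
    List<Element> getLastUpdatedElements(long nowMillis) {
        lock.readLock().lock();
        try {
            Element probe = new Element(Long.MIN_VALUE, "", nowMillis - 60_000L);
            return new ArrayList<>(byTime.tailSet(probe, true));
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

Readers do not block each other under the read lock, so the structure is not fully single-threaded even though both indexes share one lock. Replacing the HashMap with a TreeMap would make the O(log n) bound for getElement independent of hash quality.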

ArrayList: insertion vs insertion at specified element

Consider an Arraylist. Internally it is not full, and the number of elements inserted so far is known. The elements are not sorted.
Choose the operations listed below that are fast regardless of the number of elements contained in the ArrayList. (In other words, takes only several instructions to implement).
Insertion
Insertion at a given index
Getting the data from a specified index
Finding the maximum value in an array of integers (not necessarily sorted)
Deletion at the given index
Replacing an element at a specified index
Searching for a specific element
I chose Insertion at a given index, Getting the data from a specified index, and Replacing an element, but the answer key says Insertion. As I usually understand it, in an ArrayList, the insert operation requires the elements after the insertion point to shift by one position. If we did this at the beginning of the list, we would have $O(n)$ time complexity. However, if we did it at the end, it would be $O(1)$.
My question comes down to: (1) what, if any, difference is there between insertion and insertion at a specified index, and (2) given this particular time complexity for insertion, why is it considered "fast"?
First take a look at these two methods defined in java.util.ArrayList
public boolean add(E e) {
    ensureCapacityInternal(size + 1); // Increments modCount!!
    elementData[size++] = e;
    return true;
}
public void add(int index, E element) {
    rangeCheckForAdd(index);
    ensureCapacityInternal(size + 1); // Increments modCount!!
    System.arraycopy(elementData, index, elementData, index + 1,
                     size - index);
    elementData[index] = element;
    size++;
}
In the first method (plain add), the code just ensures that there is sufficient capacity and appends the element to the end of the list.
So if there is sufficient capacity, this operation takes O(1) time; otherwise the backing array has to be grown, which copies all elements into a larger array and costs O(n) for that one call. Because growth happens rarely, appending is O(1) amortized.
In the second method, where you specify an index, all the elements from that index onward are shifted by one position, which definitely takes more time than the former.
For the first question the answer is this:
Insertion at a specified index i takes O(n), since all the elements following i have to be shifted right by one position.
On the other hand, simple insertion as implemented in Java (ArrayList.add()) takes only O(1) (amortized) because the element is appended to the end of the array, so no shift is required.
For the second question, it is obvious why simple insertion is fast: no extra operation is needed, so time is constant.
Internally, an ArrayList is nothing but an array. When the array is full, add uses Arrays.copyOf to create a new, larger array with the original content intact.
So about insertion: whether you do a simple add (which adds the data at the end of the array) or an add at, say, the first (0th) index, it will still be faster than in most data structures, keeping in mind the simplicity of the data structure.
The only difference is that a simple add requires no shifting, whereas adding at an index requires shifting the following elements one position to the right; deletion similarly shifts them to the left. That is done with System.arraycopy, which copies a range of the array to its new position.
So, yes, simple insertion is faster than indexed insertion.
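The difference can be made visible by counting element copies. The class below is a deliberately simplified stand-in for ArrayList, not its real implementation: IntArrayList and its moves counter are hypothetical, and the doubling growth policy is an assumption chosen for simplicity.

```java
import java.util.Arrays;

// Simplified array-backed list of ints that counts how many element
// copies it performs (illustration only; not java.util.ArrayList).
final class IntArrayList {
    private int[] data = new int[4];
    private int size;
    long moves; // total number of element copies so far

    // Append at the end: no shifting, only an occasional growth copy.
    void add(int value) {
        ensureCapacity();
        data[size++] = value;
    }

    // Insert at an index: every element from `index` on moves one slot right.
    void add(int index, int value) {
        if (index < 0 || index > size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        ensureCapacity();
        System.arraycopy(data, index, data, index + 1, size - index);
        moves += size - index;
        data[index] = value;
        size++;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    // Doubles the backing array when it is full (assumed growth policy).
    private void ensureCapacity() {
        if (size == data.length) {
            data = Arrays.copyOf(data, size * 2);
            moves += size;
        }
    }
}
```

Appending n elements keeps moves below 2n under this growth policy, which is what "amortized O(1)" means here, while inserting n elements at index 0 drives moves to roughly n*n/2.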
(1) what, if any, difference is there between insertion and insertion at specified index and
An ArrayList stores its elements consecutively. Adding to the end of the ArrayList does not require the ArrayList to be altered in any way except for adding the new element to the end of itself. Thus, this operation is O(1), taking constant time, which is favorable when you want to perform an action repeatedly on a data structure.
Adding an element at an index, however, requires the ArrayList to make room for the element in some way. How is that done? Every element following the inserted element has to be moved one step to make room for the new insertion. Your index is anything between the first element and the nth element (inclusive). This operation is thus O(1) at best and O(n) at worst, where n is the size of the array. For large lists, O(n) takes significantly longer than O(1).
(2) given this particular time complexity for insertion why is it considered "fast"
It is considered fast because it is O(1), or constant time. The "1" does not strictly mean one single operation; it means that the number of operations does not depend on the size of the input, which in your example is the size of the ArrayList. Constant time can still hide a large constant, but in general it is regarded as the fastest possible time complexity. To put this into context, an O(1) operation takes roughly 1 * k operations on an ArrayList with 1000 elements, while an O(n) operation takes roughly 1000 * k operations, where k is some constant.
Big-O notation describes how the number of operations that an action or a whole program executes grows with the size of its input.
For more information about big O-notation:
What is a plain English explanation of "Big O" notation?

Data structure in Java that supports quick search and remove in array with duplicates

More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want to have a data structure that supports search and remove the first occurrence of some value faster than O(n), say if the value is 4, then it becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from head according to the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data(say it's an int[]) in your data structure, I just need the later search and remove faster than O(n).
"Search and remove" is considered as ONE operation as shown above.
If I have to make it myself, I would use a LinkedList to store the data, and HashMap to map every key to a list of all occurrence of nodes together with their previous and next ones.
Is it a right approach? Are there any better choices already there in Java?
The data structure you describe, essentially a hybrid linked list and map, I think is the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. The AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
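A minimal sketch of the hybrid structure, assuming int values and a hand-rolled doubly linked list (since java.util.LinkedList does not expose its nodes). OccurrenceList and its methods are hypothetical names. Each value maps to a queue of its nodes in list order, so removing the first occurrence is a hash lookup plus an O(1) unlink.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class OccurrenceList {
    private static final class Node {
        final int value;
        Node prev;
        Node next;
        Node(int value) { this.value = value; }
    }

    private Node head;
    private Node tail;
    // value -> nodes holding that value, in list order
    private final Map<Integer, Deque<Node>> occurrences = new HashMap<>();

    // O(n) construction from the original data.
    OccurrenceList(int[] values) {
        for (int v : values) {
            Node n = new Node(v);
            if (tail == null) {
                head = n;
            } else {
                tail.next = n;
                n.prev = tail;
            }
            tail = n;
            occurrences.computeIfAbsent(v, k -> new ArrayDeque<>()).addLast(n);
        }
    }

    // Removes the first occurrence of `value`; expected O(1).
    boolean removeFirst(int value) {
        Deque<Node> nodes = occurrences.get(value);
        if (nodes == null) {
            return false;
        }
        Node n = nodes.pollFirst();
        if (nodes.isEmpty()) {
            occurrences.remove(value);
        }
        // Unlink the node from the main list.
        if (n.prev == null) { head = n.next; } else { n.prev.next = n.next; }
        if (n.next == null) { tail = n.prev; } else { n.next.prev = n.prev; }
        return true;
    }

    // Iterates from the head in the original order.
    List<Integer> toList() {
        List<Integer> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

With the array from the question, removeFirst(4) leaves 3, 2, 3, 2, 2, 1, 4 in iteration order. The sketch leans on java.util.HashMap instead of the hand-built hash table suggested above, and the bookkeeping costs one extra node reference per element.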
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n) as you would definitely have to move through to the end of the list to find out if there is another occurrence. There is no standard api from Oracle in the java package that does this.

Queue data structure supporting fast k-th largest element finding

I'm faced with a problem which requires a Queue data structure supporting fast k-th largest element finding.
The requirements of this data structure are as follows:
The elements in the queue are not necessarily integers, but they must be comparable to each other, i.e. we can tell which one is greater when we compare two elements (they can be equal as well).
The data structure must support enqueue (adds the element at the tail) and dequeue (removes the element at the head).
It can quickly find the k-th largest element in the queue; please note that k is not a constant.
You can assume that the operations enqueue, dequeue and k-th largest element finding all occur with the same frequency.
My idea is to use a modified balanced binary search tree. The tree is the same as an ordinary balanced binary search tree except that every node node_i is augmented with another field n_i, which denotes the number of nodes contained in the subtree rooted at node_i. The aforementioned operations are supported as follows:
For simplicity assume that all elements are distinct.
Enqueue(x): x is first inserted into the tree; suppose the corresponding node is node_t. We append the pair (x, pointer to node_t) to the queue.
Dequeue: suppose (e_1, node_1) is the element at the head, where node_1 is the pointer into the tree corresponding to e_1. We delete node_1 from the tree and remove (e_1, node_1) from the queue.
K-th largest element finding: suppose the root node is node_root and its two children are node_left and node_right (suppose they all exist). We compare K with n_root, and three cases may happen:
if K <= n_left, we find the K-th largest element in the left subtree of node_root;
if K > n_root - n_right, we find the (K - n_root + n_right)-th largest element in the right subtree of node_root;
otherwise node_root is the node we want.
The time complexity of all three operations is O(log N), where N is the number of elements currently in the queue.
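The size-augmented selection can be sketched as follows. To stay short, the sketch uses a plain unbalanced BST with int keys in ascending order, so it shows the rank logic only; a balanced tree would be needed to actually guarantee O(log N). SizeTree and its methods are hypothetical names. With ascending order, the three cases above select the k-th smallest element, and the k-th largest is derived from it.

```java
final class SizeTree {
    private static final class Node {
        final int key;
        Node left;
        Node right;
        int size = 1; // number of nodes in the subtree rooted here
        Node(int key) { this.key = key; }
    }

    private Node root;

    private static int size(Node n) {
        return n == null ? 0 : n.size;
    }

    void insert(int key) {
        root = insert(root, key);
    }

    private static Node insert(Node n, int key) {
        if (n == null) {
            return new Node(key);
        }
        if (key < n.key) {
            n.left = insert(n.left, key);
        } else if (key > n.key) {
            n.right = insert(n.right, key);
        } else {
            return n; // distinct keys assumed; duplicates are ignored
        }
        n.size = 1 + size(n.left) + size(n.right);
        return n;
    }

    // 1-based rank in ascending order.
    int kthSmallest(int k) {
        if (k < 1 || k > size(root)) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        Node n = root;
        while (true) {
            int leftSize = size(n.left);
            if (k <= leftSize) {
                n = n.left;            // answer lies in the left subtree
            } else if (k == leftSize + 1) {
                return n.key;          // this node is the answer
            } else {
                k -= leftSize + 1;     // skip the left subtree and this node
                n = n.right;
            }
        }
    }

    // 1-based rank in descending order.
    int kthLargest(int k) {
        return kthSmallest(size(root) - k + 1);
    }
}
```

Deletion, which Dequeue needs, would have to update the size fields along the path to the removed node in the same way insert does.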
How can I speed up the operations mentioned above? With what data structures and how?
Note: you cannot achieve better than O(log n) for all of them; at best you need to choose which operation you care about most. (Otherwise, you could sort in less than O(n log n) time by feeding the array to the data structure and querying the 1st, 2nd, 3rd, ..., nth elements.)
Using a skip list instead of a balanced BST as the sorted structure can reduce dequeue complexity to O(1) in the average case. It does not affect the complexity of any other operation.
To remove from a skip list, all you need to do is get to the element using the pointer from the head of the queue, then follow the links up and remove each one. The expected number of nodes that need to be deleted is 1 + 1/2 + 1/4 + ... = 2.
Finding the K-th element can be achieved in O(log K) by starting from the leftmost node (and not the root) and making your way up until you find that you have "more descendants than needed", and then treating the node just found as the root, just like the algorithm in the question. Though this is better in asymptotic complexity, the constant factor is doubled.
I found an interesting paper:
Sliding-Window Top-k Queries on Uncertain Streams published in VLDB 2008 and cited by 71.
https://www.cse.ust.hk/~yike/wtopk.pdf
VLDB is the best conference in database research area, and the number of citations proves the data structure actually works.
The paper looks pretty difficult, but if you really need to improve your data structure, I suggest reading it or the papers in its reference list.
You can also use a finger tree.
For example, a priority queue can be implemented by labeling the internal nodes by the minimum priority of its children in the tree, or an indexed list/array can be implemented with a labeling of nodes by the count of the leaves in their children. Finger trees can provide amortized O(1) cons, reversing, cdr, O(log n) append and split; and can be adapted to be indexed or ordered sequences.
Also note that being a purely functional structure makes this a good choice for concurrent usage.

Which is the appropriate data structure?

I need a Java data structure that has:
fast (O(1)) insertion
fast removal
fast (O(1)) max() function
What's the best data structure to use?
HashMap would almost work, but using java.util.Collections.max() is at least O(n) in the size of the map. TreeMap's insertion and removal are too slow.
Any thoughts?
O(1) insertion and O(1) max() are mutually exclusive once you also require fast removal.
An O(1) insertion collection won't have O(1) max, as the collection is unsorted. An O(1) max collection has to be sorted, thus the insert is O(n). You'll have to bite the bullet and choose between the two. In both cases, however, the removal should be equally fast.
If you can live with slow removal, you could have a variable saving the current highest element, compare on insert with that variable, max and insert should be O(1) then. Removal will be O(n) then though, as you have to find a new highest element in the cases where the removed element was the highest.
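If you can accept O(log n) insertion and removal, a TreeSet gives you all three operations in O(log n), with last() returning the max. A PriorityQueue with a reversed comparator gives O(1) peek() at the max, although its remove(Object) is O(n). O(log n) is pretty good for most applications.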
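A small sketch of the O(log n) option, assuming int values and allowing duplicates by keeping a count per value in a TreeMap. MaxBag is a hypothetical name; all three operations are O(log n) here, including max(), since TreeMap.lastKey walks down the tree.

```java
import java.util.TreeMap;

final class MaxBag {
    // value -> number of copies currently stored
    private final TreeMap<Integer, Integer> counts = new TreeMap<>();

    void insert(int value) {
        counts.merge(value, 1, Integer::sum);
    }

    // Removes one copy of `value`; returns false if it was not present.
    boolean remove(int value) {
        Integer count = counts.get(value);
        if (count == null) {
            return false;
        }
        if (count == 1) {
            counts.remove(value);
        } else {
            counts.put(value, count - 1);
        }
        return true;
    }

    // Throws java.util.NoSuchElementException when the bag is empty.
    int max() {
        return counts.lastKey();
    }

    boolean isEmpty() {
        return counts.isEmpty();
    }
}
```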
If you accept that O(log n) is still "fast" even though it isn't "fast (O(1))", then some kinds of heap-based priority queue will do it. See the comparison table for different heaps you might use.
Note that Java's library PriorityQueue isn't very exciting, it only guarantees O(n) remove(Object).
For heap-based queues "remove" can be implemented as "decreaseKey" followed by "removeMin", provided that you reserve a "negative infinity" value for the purpose. And since it's the max you want, invert all mentions of "min" to "max" and "decrease" to "increase" when reading the article...
you cannot have O(1) removal+insertion+max
Proof:
Assume you could, and call this data structure D.
Given an array A:
1. Insert all elements of A into D.
2. Create an empty linked list L.
3. While D is not empty:
3.1. x <- D.max(); D.delete(x); -- all O(1) by assumption
3.2. L.insert_first(x) -- O(1)
4. Return L.
Here we have created a sorting algorithm that runs in O(n), which is proven to be impossible: comparison-based sorting is known to be Omega(n log(n)). Contradiction! Thus, D cannot exist.
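The reduction in the proof can be run with a real structure standing in for D. The sketch below uses java.util.PriorityQueue ordered by Collections.reverseOrder() as D; its insert and remove-max are O(log n) rather than the hypothetical O(1), so the loop is an ordinary O(n log n) heap sort, which is exactly the bound the proof says cannot be beaten. SortByMaxExtraction is a made-up name.

```java
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.PriorityQueue;

final class SortByMaxExtraction {
    // Steps 1-4 of the proof: insert everything into D, then repeatedly
    // take the max and push it to the front of L. L ends up ascending.
    static List<Integer> sort(List<Integer> input) {
        PriorityQueue<Integer> d = new PriorityQueue<>(Collections.reverseOrder());
        d.addAll(input);
        LinkedList<Integer> l = new LinkedList<>();
        while (!d.isEmpty()) {
            l.addFirst(d.poll()); // poll() returns and removes the current max
        }
        return l;
    }
}
```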
I'm very skeptical that TreeMap's O(log n) insertion and deletion are too slow; log(n) time is practically constant for most real applications. Even with 1,000,000,000 elements in your tree, if it's well balanced you will only perform log2(1,000,000,000) ≈ 30 comparisons per insertion or removal, which is comparable to the cost of computing a hash function.
Such a data structure would be awesome and, as far as I know, doesn't exist, as others have pointed out.
But you can go further, if you don't mind making all of this a bit more complex.
If you can "waste" some memory and some programming effort, you can use different data structures at the same time, combining the pros of each one.
For example, I needed a sorted data structure but wanted to have O(1) lookups ("is the element X in the collection?"), not O(log n). I combined a TreeMap with a HashMap (which is not really O(1), but it almost is when it's not too full and the hashing function is good) and I got really good results.
For your specific case, I would go for a dynamic combination of a HashMap and a custom helper data structure. I have in mind something very complex (hash map + variable-length priority queue), but I'll go for a simple example. Just keep all the stuff in the HashMap, and then use a special field (currentMax) that only contains the max element in the map. When you insert() into your combined data structure, if the element you're going to insert is greater than the current max, then you do currentMax <- elementGoingToInsert (and you insert it in the HashMap).
When you remove an element from your combined data structure, you check if it is equal to the currentMax and if it is, you remove it from the map (that's normal) and you have to find the new max (in O(n)). So you do currentMax <- findMaxInCollection().
If the max doesn't change very frequently, that's damn good, believe me.
However, don't take anything for granted. You have to struggle a bit to find the best combination between different data structures. Do your tests, learn how frequently max changes. Data structures aren't easy, and you can make a difference if you really work combining them instead of finding a magic one, that doesn't exist. :)
Cheers
Here's a degenerate answer. I noted that you hadn't specified what you consider "fast" for deletion; if O(n) is fast then the following will work. Make a class that wraps a HashSet; maintain a reference to the maximum element upon insertion. This gives the two constant time operations. For deletion, if the element you deleted is the maximum, you have to iterate through the set to find the maximum of the remaining elements.
This may sound like it's a silly answer, but in some practical situations (a generalization of) this idea could actually be useful. For example, you can still maintain the five highest values in constant time upon insertion, and whenever you delete an element that happens to occur in that set you remove it from your list-of-five, turning it into a list-of-four etcetera; when you add an element that falls in that range, you can extend it back to five. If you typically add elements much more frequently than you delete them, then it may be very rare that you need to provide a maximum when your list-of-maxima is empty, and you can restore the list of five highest elements in linear time in that case.
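The basic wrapper (without the list-of-five refinement) might look like this sketch. MaxTrackingSet is a hypothetical name and int values are an assumption. Insert and max are O(1) expected; remove is O(1) expected unless the removed element was the maximum, in which case the set is rescanned in O(n).

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

final class MaxTrackingSet {
    private final Set<Integer> items = new HashSet<>();
    private Integer max; // null while the set is empty

    boolean insert(int value) {
        boolean added = items.add(value);
        if (max == null || value > max) {
            max = value; // keep the cached maximum up to date
        }
        return added;
    }

    boolean remove(int value) {
        if (!items.remove(value)) {
            return false;
        }
        if (value == max) {
            // The cached maximum is gone: rescan, O(n) in the size of the set.
            max = items.isEmpty() ? null : Collections.max(items);
        }
        return true;
    }

    // Returns null when the set is empty.
    Integer max() {
        return max;
    }
}
```

How well this performs depends entirely on how often the current maximum is the element being removed, which is worth measuring on real data before committing to it.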
As already explained: for the general case, no. However, if your range of values are limited, you can use a counting sort-like algorithm to get O(1) insertion, and on top of that a linked list for moving the max pointer, thus achieving O(1) max and removal.
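A simplified sketch of the bounded-range idea, assuming values are ints in [0, range). It keeps one counter per possible value and a max pointer. Instead of the linked list mentioned above, it moves the pointer down by scanning the counters when the maximum is removed; that scan is bounded by the fixed range and does not depend on the number of stored elements, but it is not the strict O(1) the linked-list variant aims for. BoundedMaxBag is a hypothetical name.

```java
final class BoundedMaxBag {
    private final int[] counts; // counts[v] = number of copies of v stored
    private int max = -1;       // -1 while the bag is empty

    BoundedMaxBag(int range) {
        counts = new int[range];
    }

    // O(1). Values outside [0, range) cause ArrayIndexOutOfBoundsException.
    void insert(int value) {
        counts[value]++;
        if (value > max) {
            max = value;
        }
    }

    // Removes one copy of `value`; returns false if none is stored.
    boolean remove(int value) {
        if (counts[value] == 0) {
            return false;
        }
        counts[value]--;
        // Slide the max pointer down past empty slots (at most `range` steps).
        while (max >= 0 && counts[max] == 0) {
            max--;
        }
        return true;
    }

    // Returns -1 when the bag is empty.
    int max() {
        return max;
    }
}
```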
