ArrayList: insertion vs insertion at specified element - java

Consider an Arraylist. Internally it is not full, and the number of elements inserted so far is known. The elements are not sorted.
Choose the operations listed below that are fast regardless of the number of elements contained in the ArrayList. (In other words, takes only several instructions to implement).
Insertion
Insertion at a given index
Getting the data from a specified index
Finding the maximum value in an array of integers (not necessarily sorted)
Deletion at the given index
Replacing an element at a specified index
Searching for a specific element
I chose Insertion at specified index, Getting the data from a specified index, and replacing an element but answer key says Insertion. As I usually understand it, in an ArrayList, the insert operation requires all of the elements to shift left. If we did this at the beginning of the list, we would have $O(n)$ time complexity. However, if we did it at the end, it would be $O(1)$.
My question comes down to: (1) what, if any, difference is there between insertion and insertion at specified index and (2) given this particular time complexity for insertion why is it considered "fast"

First take a look at these two methods defined in java.util.ArrayList
public boolean add(E e) {
ensureCapacityInternal(size + 1); // Increments modCount!!
elementData[size++] = e;
return true;
}
public void add(int index, E element) {
rangeCheckForAdd(index);
ensureCapacityInternal(size + 1); // Increments modCount!!
System.arraycopy(elementData, index, elementData, index + 1,
size - index);
elementData[index] = element;
size++;
}
Now if you see the first method (just adding element), it just ensures whether there's sufficient capacity and appends element to the last of the list.
So if there's sufficient capacity, then this operation would require O(1) time complexity, otherwise it would require shifting of all elements and ultimately time complexity increases to O(n).
In the second method, when you specify index, all the elements after that index would be shifted and this would definitely take more time then former.

For the first question the answer is this:
Insertion at a specified index i takes O(n), since all the elements following i will have to be shifted to right with one position.
On the other hand, simple insertion as implemented in Java (ArrayList.add()) will only take O(1) because the element is appended to the end of the array, so no shift is required.
For the second question, it is obvious why simple insertion is fast: no extra operation is needed, so time is constant.

ArrayList internally is nothing but an Array itself which uses Array.copyOf to create a new Array with increased size,upon add,but with original content intact.
So about insertion, whether you do a simple add (which will add the data at the end of the array) or on ,say, first(0th) index , it will still be faster then most data structures , keeping in mind the simplicity of the Data Structures.
The only difference is that simple add require no traversal but adding at index require shifting of elements to the left, similarly for delete. That uses System.arrayCopy to copy one array to another with alteration in index and the data.
So ,yeah simple insertion is faster then indexed insertion.

(1) what, if any, difference is there between insertion and insertion at specified index and
An ArrayList stores it's elements consecutively. Adding to the end of the ArrayList does not require the ArrayList to be altered in any way except for adding the new element to the end of itself. Thus, this operation is O(1), taking constant time which is favorable when wanting to perform an action repetitively in a data structure.
Adding an element to an index, however, requires the ArrayList to make room for the element in some way. How is that done? Every element following the inserted element will have to be moved one step to make room for the new insertion. Your index is anything in between the first element and and the nth element (inclusively). This operation thus is O(1) at best and O(n) at worst where n is the size of the array. For large lists, O(n) takes significantly longer time than O(1).
(2) given this particular time complexity for insertion why is it considered "fast"
It is considered fast because it is O(1), or constant time. If the time complexity is truly only one operation, it is as fast as it can possibly be, other small constants are also regarded fast and are often equally notated by O(1), where the "1" does not mean one single operation strictly, but that the amount of operations does not depend on the size of something else, in your example it would be the size of the ArrayList. However, constant time complexity can involve large constants as well, but in general is regarded as the fastest as possible time complexity. To put this into context, an O(1) operations takes roughly 1 * k operations in an ArrayList with 1000 elements, while a O(n) operation takes roughly 1000 * k operations, where k is some constant.
Big-O notation is used as a metric to measure how many operations an action or a whole programs will execute when they are run.
For more information about big O-notation:
What is a plain English explanation of "Big O" notation?

Related

Time complexity of LinkedList.subList(int, int)

If I have a linked list of objects and I want the sublist from index 2 to 5. Is this operation o(1)? All you would need to do is null the reference to prev on the node at index 2, and return the node at index 2, correct? Does this require copying the contents of the linked list into another one and returning that or just setting the head to be the node at index 2?
Is this operation o(1)?
In general, getting a sublist of a linked list is O(k), not O(1)*.
However, for any specific index value, such as 2, 5, or 5000, any operation is O(1), because a specific index becomes a constant that gets factored out from Big-O notation.
* Construction of sublist may be optimized such that you pay construction costs on the first navigation of sublist, rather on construction. In other words, constructing a sublist without iterating it is O(1).
It appears that the sublist method runs in O(1) time. See the source code for the method.
All that this code does is return a new instance of SubList that is initialized with the list that sublist is invoked upon. No iteration is happening here, so the operation runs in constant time.
It's O(n) if you consider the general case of the algorithm. Even if you do what you said finding the n-th and m-th element would take a complete traversal of the list.
It's wrong to consider that the finding the sublist from 2 to 5 would be O(1). It is O(1) ofcourse but it needs constant amount of operations to do that but are you creating an algorithm for sublist(2,5)? If you do that then ofcoursse it is always O(1).
A better example is sorting 100 numbers is of O(1) complexity so is sorting 10,000 numbers. But it is not what we are concerned about. We want to know the nature of the algorithm based on the inputs fed to that algorithm.

Why is a Hash Table with linked lists considered to have constant time complexity?

In my COMP class last night we learned about hashing and how it generally works when trying to find an element x in a hash table.
Our scenario was that we have a dataset of 1000 elements inside our table and we want to know if x is contained within that table.
Our professor drew up a Java array of 100 and said that to store these 1000 elements that each position of the array would contain a pointer to a Linked List where we would keep our elements.
Assuming the hashing function perfectly mapped each of the 1000 elements to a value between 0 and 99 and stored the element at the position in the array, there would be 1000/100 = 10 elements contained within each linked list.
Now to know whether x is in the table, we simply hash x, find it's hash value, lookup into the array at that slot and iterate over our linked list to check whether x is in the table.
My professor concluded by saying that the expected complexity of finding whether x is in the table is O(10) which is really just O(1). I cannot understand how this is the case. In my mind, if the dataset is N and the array size is n then it takes on average N/n steps to find x in the table. Isn't this by definition not constant time, because if we scale up the data set the time will still increase?
I've looked through Stack Overflow and online and everyone says hashing is expected time complexity of O(1) with some caveats. I have read people discuss chaining to reduce these caveats. Maybe I am missing something fundamental about determining time complexity.
TLDR: Why does it take O(1) time to lookup a value in a hash table when it seems to still be determined by how large your dataset is (therefore a function of N, therefore not constant).
In my mind, if the dataset is N and the array size is n then it takes on average N/n steps to find x in the table.
This is a misconception, as hashing simply requires you to calculate the correct bucket (in this case, array index) that the object should be stored in. This calculation will not become any more complex if the size of the data set changes.
These caveats that you speak of are most likely hash collisions: where multiple objects share the same hashCode; these can be prevented with a better hash function.
The complexity of a hashed collection for lookups is O(1) because the size of lists (or in Java's case, red-black trees) for each bucket is not dependent on N. Worst-case performance for HashMap if you have a very bad hash function is O(log N), but as the Javadocs point out, you get O(1) performance "assuming the hash function disperses the elements properly among the buckets". With proper dispersion the size of each bucket's collection is more-or-less fixed, and also small enough that constant factors generally overwhelm the polynomial factors.
There is multiple issues here so I will address them 1 by 1:
Worst case analysis vs amortized analysis:
Worst case analysis refers to the absolute worst case scenario that your algorithm can be given relative to running time. As an example, if I am giving an array of unordered elements, and I am told to find an element in it, my best case scenario is when the element is at index [0] the worst possible thing that I can be given is when the element is at the end of the array, in which case if my data set is n, I run n times before finding the element. On the average case however the element is anywhere in the array so I will run n-k steps (where k is the number of elements after the element I am looking for in the array).
Worst case analysis of Hashtables:
There exists only 1 kind of Hashtable that has guaranteed constant time access O(1) to it's elements, Arrays. (And even then it's not actually true do to paging and the way OS's handle memory). The worst possible case that I could give you for a hash table is a data set where every element hashes to the same index. So for example if every single element hashes to index 1, due to collisions, the worst case running time for accessing a value is O(n). This is unavoidable, hashtables always have this behaviour.
Average and best case scenario of hashtables:
You will rarely be given a set that gives you the worst possible case scenario. In general you can expect objects to be hashed to different indexes in your hashtable. Ideally the hash function hashes things in a very spread out manner so that objects get hashed to different indexes in the hash table.
In the specific example your teacher gave you, if 2 things get hashed to the same index, they get put in a linked list. So this is more or less how the table got constructed:
get element E
use the hashing function hash(E) to find the index i in the hash table
add e to the linjed list in hashTable[i].
repeat for all the elements in the data set
So now, let's say I want to find whether element E is on the table. Then:
do hash(E) to find the index i where E is potentially hashed
go to hashTable[i] and iterate through the linked list (up to 10 iterations)
If E is found, then E is in the Hash table, if E is not found, then E is not in the table
The reason why we can guarantee that E is not in the table if we can't find it, is because if it was, it would have been hashed to hashTable[i] so it HAS to be there, if it's on the table.

Time complexity in Java

For the method add of the ArrayList Java API states:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time.
I wonder if it is the same time complexity, linear, when using the add method of a LinkedList.
This depends on where you're adding. E.g. if in an ArrayList you add to the front of the list, the implementation will have to shift all items every time, so adding n elements will run in quadratic time.
Similar for the linked list, the implementation in the JDK keeps a pointer to the head and the tail. If you keep appending to the tail, or prepending in front of the head, the operation will run in linear time for n elements. If you append at a different place, the implementation will have to search the linked list for the right place, which might give you worse runtime. Again, this depends on the insertion position; you'll get the worst time complexity if you're inserting in the middle of the list, as the maximum number of elements have to be traversed to find the insertion point.
The actual complexity depends on whether your insertion position is constant (e.g. always at the 10th position), or a function of the number of items in the list (or some arbitrary search on it). The first one will give you O(n) with a slightly worse constant factor, the latter O(n^2).
In most cases, ArrayList outperforms LinkedList on the add() method, as it's simply saving a pointer to an array and incrementing the counter.
If the woking array is not large enough, though, ArrayList grows the working array, allocating a new one and copying the content. That's slower than adding a new element to LinkedList—but if you constantly add elements, that only happens O(log(N)) times.
When we talk about "amortized" complexity, we take an average time calculated for some reference task.
So, answering your question, it's not the same complexity: it's much faster (though still O(1)) in most cases, and much slower (O(N)) sometimes. What's better for you is better checked with a profiler.
If you mean the add(E) method (not the add(int, E) method), the answer is yes, the time complexity of adding a single element to a LinkedList is constant (adding n elements requires O(n) time)
As Martin Probst indicates, with different positions you get different complexities, but the add(E) operation will always append the element to the tail, resulting in a constant (amortized) time operation

picking without replacement in java

I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1), after this operation the picked
element is removed from the structure.
(without replacement)
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in
O(n).
the complexity (or even possibility) of other actions (eg insert) does not matter. Besides the complexity it should also be efficient for small numbers of n.
Can anyone give me guidelines on implementing such a structure? I currently implemented a structure having all above properties, except the picking of the element takes O(d) with d the number of past picks (since I explicitly check whether it is 'not yet picked'). I can figure out structures allowing picking in O(1), but these have higher complexities on at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent from #earlier picked elements and independent from total #elements.
*in monte carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You specify a lot of operation, but all of them are nothing else then insertion, removal and traversing:
can be initialized with an array of n objects in O(n).
n * O(1) insertion. HashMap is fine
one can obtain a random element in
O(1), after this operation the picked
element is removed from the structure.
(without replacement)
This is the only op that require O(n).
one can undo p 'picking without
replacement' operations in O(p)
it's an insertion operation: O(1).
one can remove a specific object (eg
by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects
currently in the structure in O(n).
you can traverse an HashMap in O(n)
EDIT:
example of picking up a random element in O(n):
HashMap map ....
int randomIntFromZeroToYouHashMapSize = ...
Collection collection = map.values();
Object[] values = collection.toArray();
values[randomIntFromZeroToYouHashMapSize];
Ok, same answer as 0verbose with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs . For example, say your Objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create an
List<String> array = ArrayList<String>("abc","def","ghi")
Create a HashMap map with the following values:
for (int i = 0; i < array.size(); i++)
{
map.put(array[i],i);
}
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find object in map. Get its array index. Lets call this index i (map.get(i)) - O(1)
Swap array[i] with array[size of array - 1] (the last element in the array). Reduce the size of the array by 1 (since there is one less number now) - O(1)
Update the index of the new object in position i of the array in map (map.put(array[i], i)) - O(1)
I apologize for the mix of java and cpp notation, hope this helps
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order), swap the item at that index with the last unpicked item, and decrement the boundary. To unpick, you just increment the boundary.
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around on line you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?

Which is the appropriate data structure?

I need a Java data structure that has:
fast (O(1)) insertion
fast removal
fast (O(1)) max() function
What's the best data structure to use?
HashMap would almost work, but using java.util.Collections.max() is at least O(n) in the size of the map. TreeMap's insertion and removal are too slow.
Any thoughts?
O(1) insertion and O(1) max() are mutually exclusive together with the fast removal point.
A O(1) insertion collection won't have O(1) max as the collection is unsorted. A O(1) max collection has to be sorted, thus the insert is O(n). You'll have to bite the bullet and choose between the two. In both cases however, the removal should be equally fast.
If you can live with slow removal, you could have a variable saving the current highest element, compare on insert with that variable, max and insert should be O(1) then. Removal will be O(n) then though, as you have to find a new highest element in the cases where the removed element was the highest.
If you can have O(log n) insertion and removal, you can have O(1) max value with a TreeSet or a PriorityQueue. O(log n) is pretty good for most applications.
If you accept that O(log n) is still "fast" even though it isn't "fast (O(1))", then some kinds of heap-based priority queue will do it. See the comparison table for different heaps you might use.
Note that Java's library PriorityQueue isn't very exciting, it only guarantees O(n) remove(Object).
For heap-based queues "remove" can be implemented as "decreaseKey" followed by "removeMin", provided that you reserve a "negative infinity" value for the purpose. And since it's the max you want, invert all mentions of "min" to "max" and "decrease" to "increase" when reading the article...
you cannot have O(1) removal+insertion+max
proof:
assume you could, let's call this data base D
given an array A:
1. insert all elements in A to D.
2. create empty linked list L
3. while D is not empty:
3.1. x<-D.max(); D.delete(x); --all is O(1) - assumption
3.2 L.insert_first(x) -- O(1)
4. return L
in here we created a sorting algorithm which is O(n), but it is proven to be impossible! sorting is known as omega(nlog(n)). contradiction! thus, D cannot exist.
I'm very skeptical that TreeMap's log(n) insertion and deletion are too slow--log(n) time is practically constant with respect to most real applications. Even with a 1,000,000,000 elements in your tree, if it's balanced well you will only perform log(2, 1000000000) = ~30 comparisons per insertion or removal, which is comparable to what any other hash function would take.
Such a data structure would be awesome and, as far as I know, doesn't exist. Others pointed this.
But you can go beyond, if you don't care making all of this a bit more complex.
If you can "waste" some memory and some programming efforts, you can use, at the same time, different data structures, combining the pro's of each one.
For example I needed a sorted data structure but wanted to have O(1) lookups ("is the element X in the collection?"), not O(log n). I combined a TreeMap with an HashMap (which is not really O(1) but it is almost when it's not too full and the hashing function is good) and I got really good results.
For your specific case, I would go for a dynamic combination between an HashMap and a custom helper data structure. I have in my mind something very complex (hash map + variable length priority queue), but I'll go for a simple example. Just keep all the stuff in the HashMap, and then use a special field (currentMax) that only contains the max element in the map. When you insert() in your combined data structure, if the element you're going to insert is > than the current max, then you do currentMax <- elementGoingToInsert (and you insert it in the HashMap).
When you remove an element from your combined data structure, you check if it is equal to the currentMax and if it is, you remove it from the map (that's normal) and you have to find the new max (in O(n)). So you do currentMax <- findMaxInCollection().
If the max doesn't change very frequently, that's damn good, believe me.
However, don't take anything for granted. You have to struggle a bit to find the best combination between different data structures. Do your tests, learn how frequently max changes. Data structures aren't easy, and you can make a difference if you really work combining them instead of finding a magic one, that doesn't exist. :)
Cheers
Here's a degenerate answer. I noted that you hadn't specified what you consider "fast" for deletion; if O(n) is fast then the following will work. Make a class that wraps a HashSet; maintain a reference to the maximum element upon insertion. This gives the two constant time operations. For deletion, if the element you deleted is the maximum, you have to iterate through the set to find the maximum of the remaining elements.
This may sound like it's a silly answer, but in some practical situations (a generalization of) this idea could actually be useful. For example, you can still maintain the five highest values in constant time upon insertion, and whenever you delete an element that happens to occur in that set you remove it from your list-of-five, turning it into a list-of-four etcetera; when you add an element that falls in that range, you can extend it back to five. If you typically add elements much more frequently than you delete them, then it may be very rare that you need to provide a maximum when your list-of-maxima is empty, and you can restore the list of five highest elements in linear time in that case.
As already explained: for the general case, no. However, if your range of values are limited, you can use a counting sort-like algorithm to get O(1) insertion, and on top of that a linked list for moving the max pointer, thus achieving O(1) max and removal.

Categories

Resources