Traversal of Giant LinkedList - java

For a project I am required to write a method that times the traversal of a LinkedList filled with 5 million random Integers using a listIterator, then with LinkedList's get(index) method.
I had no problem traversing it with the listIterator and it completed in around 75ms. HOWEVER, after trying the get method traversal on 5 million Integers, I just stopped the run at around 1.5 hours.
The getTraverse method I used is something like the code below for example (however mine was grouped with other methods in a class and was non-static, but works the same way).
public static long getTraverse(LinkedList<Integer> list) {
    long start = System.currentTimeMillis();
    for (int i = 0; i < list.size(); i++) {
        list.get(i);
    }
    long stop = System.currentTimeMillis();
    return stop - start;
}
This worked perfectly fine for LinkedLists of Integers of sizes 50, 500, 5000, 50000, and took quite a while but completed for 500000.
My professor tends to be extremely vague with instructions and very unhelpful when approached with questions. So, I don't know if my code is broken, or if he got carried away with the Integers in the guidelines. Any input is appreciated.

Think about how a LinkedList is implemented - as a chain of nodes - and you'll see that to get to a particular node you have to start at the head and traverse to that node.
You're calling .get() on a LinkedList n times, which requires traversing the list until it reaches that index. This means your getTraverse() method takes O(n^2) (or quadratic) time, because for each element it has to traverse (part of) the list.
As Elliott Frisch said, I suspect you're discovering exactly what your instructor wanted you to discover - that different algorithms can have drastically different runtimes, even if in principle they do the same thing.
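For reference, here's a minimal sketch (with an illustrative class name) of the iterator-style timing method, which touches each node exactly once and therefore stays O(n):
import java.util.LinkedList;

public class TraversalTiming {
    // The enhanced for loop uses the list's iterator, so each step just
    // follows one node link; the whole traversal is O(n), not O(n^2).
    public static long iteratorTraverse(LinkedList<Integer> list) {
        long start = System.currentTimeMillis();
        long sum = 0;
        for (int value : list) {
            sum += value; // use the element so the loop isn't optimised away
        }
        long stop = System.currentTimeMillis();
        return stop - start;
    }
}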

A LinkedList is optimised for insertion at the head or tail (or at a position you already hold an iterator to), which is a constant-time operation.
Searching a LinkedList requires you to iterate over every element to find the one you want. You provide the index to the get method, but under the covers it is traversing the list to that index.
If you add some print statements, you'll probably see that the first X elements are retrieved pretty fast and it slows down over time as you index elements further down the list.
An ArrayList (backed by an array) is optimised for retrieval and can index the desired element in constant time. Try changing your code to use an ArrayList and see how much faster get runs.
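If you want to see the contrast directly, here is a rough sketch (illustrative class and method names) that times the same indexed loop against both list types; the ArrayList version stays fast while the LinkedList version degrades sharply as the list grows:
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class GetComparison {
    // Times list.get(i) for every index: O(n) overall for ArrayList, O(n^2) for LinkedList.
    static long timeGet(List<Integer> list) {
        long start = System.currentTimeMillis();
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        List<Integer> arrayList = new ArrayList<>();
        List<Integer> linkedList = new LinkedList<>();
        for (int i = 0; i < 500_000; i++) {   // 500k keeps the LinkedList run tolerable
            int value = rnd.nextInt();
            arrayList.add(value);
            linkedList.add(value);
        }
        System.out.println("ArrayList:  " + timeGet(arrayList) + " ms");
        System.out.println("LinkedList: " + timeGet(linkedList) + " ms");
    }
}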

Related

Question(s) about time complexity of array "resizing" in Java

NOTE: As the title already hints, this question is not about the specific java.util.ArrayList implementation of an array-based list, but rather about the raw arrays themselves and how they might behave in a "pure" (meaning completely unoptimized) array-based list implementation. I chose to mention java.util.ArrayList because it is the most prominent example of an array-based list in Java, although it is technically not "pure", as it utilizes preallocation to reduce the operation time of add(). If you want to know why I am asking this specific question without being interested in the java.util.ArrayList() preallocation optimization, I added a little explanation of my use case below.
It is generally known that you can access elements in array-based lists (like Java's ArrayList<E>) with a time complexity of O(1), while adding elements to that list will take O(n). With linked lists, it is the other way round (for a doubly linked list, you could optimize the access to half the execution time).
The reason why adding elements to an array-based list takes O(n) is that an array cannot simply be resized, but has to be reallocated and re-filled. The easiest way to do this would be:
String[] arr = new String[n];
// ...
String newElem = "foo";

String[] newArr = new String[n + 1];
int i = 0;
for (String elem : arr) {
    newArr[i++] = elem;
}
newArr[i] = newElem;
arr = newArr;
The time complexity O(n) is clearly visible thanks to the for loop. But there are other ways to copy arrays in Java, for example System.arraycopy().
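For example, the same grow-by-one copy can be expressed with the library routines (reusing arr and newElem from the snippet above); the work is still proportional to the array length, it is just performed by one optimised call instead of a hand-written loop:
// needs: import java.util.Arrays;

// Alternative 1: Arrays.copyOf allocates the larger array and copies in one call.
String[] grown = Arrays.copyOf(arr, arr.length + 1);
grown[arr.length] = newElem;

// Alternative 2: the same thing spelled out with System.arraycopy.
String[] grown2 = new String[arr.length + 1];
System.arraycopy(arr, 0, grown2, 0, arr.length);
grown2[arr.length] = newElem;

// Either way the copy is O(n) in the length of arr.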
Sticking to the vanilla for loop solution, even shrinking an array will take O(n), because an array has a fixed size and in order to "shrink" it, you'd have to copy all elements to be retained to a new, smaller array.
So, here are my questions concerning such array operations and their time complexity:
While the vanilla for loop will always take O(n), is it possible that System.arraycopy() optimizes the "add" operation if there is enough space in the memory to expand the array in place, meaning that it would leave the original array at its place and just add the new element at the end of it?
As the shrinking operation could always be executed in O(1) in theory, does System.arraycopy() always optimize this operation to O(1)?
If System.arraycopy() is not capable of using those optimizations, is there any other way in Java to actually utilize those optimizations which are possible in theory OR will array "resizing" always take O(n), no matter under which circumstances?
TL;DR is there any situation in which the "resizing" of an array in Java will take less than O(n)?
Additional information:
I am using openJDK11 (newest release), but if the answer turns out to be JVM-dependent, I'd like to know how other JVMs would behave in comparison.
For the curious ones who want to know what I want to do with this information:
I am working on a new java.util.List implementation, namely a hybrid list that can store data in an array and in a linked buffer. On certain occasions, the buffer will be flushed into the array, which of course requires that the existing array is resized. But apart from this idea, I want to utilize as many other optimizations on the array part as possible.

To avoid array resizing in general, I experimented with the idea of letting the array persist at a constant size, but managing the "valid" range of it with some other fields. Meaning that if you were to pop the last element of the array, it would not shrink the array but rather the range of valid elements. Then, when inserting new elements in the array part, the formerly invalid section can be used to shift values into, basically reusing the space that was formerly used by a now-deleted element. If the inserting operations exceed the actual array size, elements can still be transferred to the linked buffer to avoid resizing.

To further optimize this, I chose to use the middle of the array as a pivot when deleting certain elements. Now the valid range might not start at the beginning of the array anymore. Basically this means that if you delete an element to the left of the pivot, all elements between the start of the valid range and the deleted element get shifted towards the pivot, to the right. Removing an element to the right of the pivot works accordingly. So, after some removals, the array could look like this:
[null null|elem0 elem1 elem2||elem3 elem4 elem5|null null null]
(Where the | at the beginning and at the end mark the valid range and the || marks the pivot)
So, how is this all related to my question?
All of those optimizations build up upon the claim that array resizing is expensive in time, namely O(n). Therefore array resizing is avoided whenever possible. Those optimizations might sound neat, but the code implementing them can get quite messy, especially when implementing the batch operations (addAll(), removeAll(), retainAll()...). So, if it turns out that the array resizing operation itself can be less expensive in some cases (especially shrinking), I would cut out a lot of those optimizations which are then rendered useless, making the code a lot easier in the process.
So, before sticking to my optimization ideas and experiments, I'd like to know whether they are even needed.

Algorithm Complexity: Is it the same to iterate an array from the start as from the end?

In an interview I was asked about the following:
public class Main {
    public static void main(String[] args) {
        // TODO Auto-generated method stub
        int[] array = new int[10000];
        for (int i = 0; i < array.length; i++) {
            // do calculations
        }
        for (int x = array.length - 1; x >= 0; x--) {
            // do calculations
        }
    }
}
Is it the same to iterate an array either from the end or from the start? My understanding is that it would be the same, since the complexity is constant, i.e. O(1). Am I correct?
I was also asked about the complexity of ArrayList compared to other collections in Java, for example LinkedList.
Thank you.
There can be a difference due to CPU prefetch characteristics.
There is no difference between looping in either direction as per computational theory. However, depending on the kind of prefetcher that is used by the CPU on which the code runs, there will be some differences in practice.
For example, the Sandy Bridge Intel processor has a prefetcher that goes forward only for data (while instructions could be prefetched in both directions). This will help iteration from the start (as future memory locations are prefetched into the L1 cache), while iterating from the end will cause very little to no prefetching, and hence more accesses to RAM which is much slower than accessing any of the CPU caches.
There is a more detailed discussion about forward and backward prefetching at this link.
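If you want to observe the effect on your own machine, a crude sketch like the following can be used (a proper measurement would use a benchmark harness such as JMH with warm-up runs; the array size here is arbitrary):
public class DirectionBenchmark {
    public static void main(String[] args) {
        int[] data = new int[20_000_000];

        long sum = 0;
        long start = System.nanoTime();
        for (int i = 0; i < data.length; i++) {      // forward: friendly to hardware prefetching
            sum += data[i];
        }
        long forward = System.nanoTime() - start;

        start = System.nanoTime();
        for (int i = data.length - 1; i >= 0; i--) { // backward: may defeat a forward-only prefetcher
            sum += data[i];
        }
        long backward = System.nanoTime() - start;

        System.out.println("forward:  " + forward / 1_000_000 + " ms");
        System.out.println("backward: " + backward / 1_000_000 + " ms");
        System.out.println(sum); // use the result so the loops aren't eliminated
    }
}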
It's O(n) in both cases for an array, as there are n iterations and each step takes O(1) (assuming the calculations in the loop take O(1)). In particular, obtaining the length or size is typically an O(1) operation for arrays or ArrayList.
A typical use case for iterating from the end is removing elements in the loop (which may otherwise require more complex accounting to avoid skipping elements or iterating beyond the end).
For a linked list, the first loop would typically be O(n²), as determining the length of a linked list is typically an O(n) operation without additional caching, and it's used every time the exit condition is checked. However, java.util.LinkedList explicitly keeps track of its length, so the total is O(n) for linked lists in Java.
If an element in a linked list is accessed using the index in the calculations, this will be an O(n) operation, yielding a total of O(n²).
Is it the same to iterate an array either from the end or from the start? My understanding is that it would be the same, since the complexity is constant, i.e. O(1). Am I correct?
In theory, yes it's the same to iterate an array from the end and from the start.
The time complexity is O(10,000), which is constant, so O(1), assuming that the loop body has constant time complexity. It's worth noting that the constant 10,000 can be promoted to a variable, call it N, and then you can say that the time complexity is O(N).
I was also asked about the complexity of ArrayList compared to other collections in Java, for example LinkedList.
Here you can find a comparison of ArrayList and LinkedList time complexity. The interesting methods are add, remove and get.
http://www.programcreek.com/2013/03/arraylist-vs-linkedlist-vs-vector/
Also, data in a LinkedList is not stored consecutively, whereas data in an ArrayList is stored consecutively, and an ArrayList also uses less space than a LinkedList.
Good luck!

Getting max and min from BST (C++ vs Java)

I know that given a balanced binary search tree, getting the min and max element takes O(logN), I want to ask about their implementation in C++ and Java respectively.
C++
Take std::set for example: getting the min/max can be done by calling *set.begin() / *set.rbegin(), and it's constant time.
Java
Take TreeSet for example: getting the min/max can be done by calling TreeSet.first() and TreeSet.last(), but it's logarithmic time.
I wonder if this is because std::set does some extra trick to always keep the begin() and rbegin() pointers up to date; if so, can anyone show me that code? Btw, why didn't Java add this trick too? It seems pretty convenient to me...
EDIT:
My question might not be very clear. I want to see how std::set implements insert/erase; I'm curious to see how the begin() and rbegin() iterators are updated during those operations.
EDIT2:
I'm very confused now. Say I have following code:
set<int> s;
s.insert(5);
s.insert(3);
s.insert(7);
... // say I inserted a total of n elements.
s.insert(0);
s.insert(9999);
cout<<*s.begin()<<endl; //0
cout<<*s.rbegin()<<endl; //9999
Aren't both *s.begin() and *s.rbegin() O(1) operations? Are you saying they aren't? Does s.rbegin() actually iterate to the last element?
My answer isn't language specific.
To fetch the MIN or MAX in a BST, you always need to iterate to the leftmost or the rightmost node respectively. This operation is O(height), which for a balanced tree is roughly O(log n).
Now, to optimize this retrieval, a class that implements a tree can always store two extra pointers to the leftmost and the rightmost node, and then retrieving them becomes O(1). Of course, these pointers bring in the overhead of updating them with each insert operation on the tree.
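As a rough illustration of that idea (a hypothetical, non-balancing BST, not the actual TreeSet or std::set source), the tree below caches references to its leftmost and rightmost nodes during insert, making min/max retrieval O(1); deletion would need to update them as well:
// Illustrative only: a (non-balancing) BST that caches its leftmost and rightmost
// nodes so min() and max() are O(1) instead of O(height).
public class MinMaxBst {
    private static final class Node {
        final int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    private Node root;
    private Node leftmost;   // cached minimum
    private Node rightmost;  // cached maximum

    public void insert(int key) {
        Node node = new Node(key);
        if (root == null) {
            root = leftmost = rightmost = node;
            return;
        }
        Node cur = root;
        while (true) {                       // ordinary iterative BST insert
            if (key < cur.key) {
                if (cur.left == null) { cur.left = node; break; }
                cur = cur.left;
            } else if (key > cur.key) {
                if (cur.right == null) { cur.right = node; break; }
                cur = cur.right;
            } else {
                return;                      // ignore duplicates
            }
        }
        // O(1) maintenance of the cached extremes.
        if (key < leftmost.key)  leftmost = node;
        if (key > rightmost.key) rightmost = node;
    }

    public int min() { return leftmost.key; }   // O(1)
    public int max() { return rightmost.key; }  // O(1)
}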
begin() and rbegin() only return iterators, in constant time. Iterating them isn't constant-time.
There is no 'begin()/rbegin() pointer'. The min and max are reached by iterating to the leftmost or rightmost elements, just as in Java, only Java doesn't have the explicit iterator.

How can we reduce search time in linked list?

I read this question on CareerCup but didn't find any good answer other than 'SkipList'. The description of SkipList that I found on Wikipedia was interesting; however, I didn't understand some terms like 'geometric/binomial distribution'... I read what it is, and it goes deep into probability theory. I simply wanted to implement a way to make searching quicker. So here's what I did:
1. Created indexes.
- I wrote a function to create say 1000 nodes. Then, I created an array of linked-list nodes and looped through the 1000 nodes, picking every 23rd element (a random number that came to mind) and adding it to the array, which I call the 'index'.
SLL[] index = new SLL[50];
Now the function to create the index:
private static void createIndex(SLL[] index, SLL head) {
    int i = 0;       // next free slot in the index array
    int count = 0;
    SLL temp = head;
    while (temp != null) {
        count++;
        temp = temp.next;
        if (count == 23) {
            index[i] = temp;
            i++;
            count = 0;
        }
    }
}
Now finally the 'find' function. In that function, I first take the input element, say 769 for example. I go through the 'index' array and find index[i] > 769. Then I pass head = index[i-1] and tail = index[i] to the 'find' function, which then searches within a short range of 23 elements for 769. I calculated that it takes a total of 43 jumps (including the array jumps and the node = node.next jumps) to find the element I wanted, which otherwise would have taken 769 jumps.
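If I've understood the approach correctly, the find step might look something like the sketch below. It assumes the SLL node has int data and SLL next fields, that the list is sorted in ascending order (otherwise comparing index[i] against the target wouldn't narrow the search), and that unused index slots are null:
// Sketch of the described two-level search over a sorted singly linked list.
private static SLL find(SLL[] index, SLL head, int target) {
    // Step 1: scan the index array to find the block that could contain the target.
    SLL start = head;
    SLL end = null;
    for (int i = 0; i < index.length && index[i] != null; i++) {
        if (index[i].data > target) {
            end = index[i];
            break;
        }
        start = index[i];
    }
    // Step 2: linear search inside the (at most ~23-element) block.
    for (SLL node = start; node != end; node = node.next) {
        if (node.data == target) {
            return node;
        }
    }
    return null; // not found
}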
Please note: I consider the code that creates the index array NOT a part of searching, so I do not add its time complexity (which is terrible) to the 'find' function's time complexity. I assume that this creation of the index should be done as a separate function after a list has been created, or done periodically, just like it takes time for a webpage to show up in Google searches.
Also, this question was asked in a Microsoft interview and I wonder if the solution I provided would be any good or would I look like a fool for providing such kind of a solution. The solution has been written in Java.
Waiting for your feedback.
It is difficult to make out what problem it is that you are trying to solve here, or how your solution is supposed to work. (Hint: complete working code would help with both!)
However there are a couple of general things we can say:
You can't search a list data structure (e.g. find i in the list) in better than O(N) unless some kind of ordering has been placed on it, for example by sorting the elements.
If the elements of the list are sorted and your list is indexable (i.e. getting the element at position i is O(1)), then you can use binary search and find an element in O(logN); see the sketch after this list.
You can't get the element at position i of a linked list in better than O(N).
If you add secondary data (indexes, whatever), you can potentially get better performance for certain operations ... at the expense of more storage space, and making certain other operations more expensive. However, you no longer have a list / linked list. The entire data structure is "something else".
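As a sketch of the second point above, Collections.binarySearch finds an element of a sorted, random-access list in O(log N):
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class BinarySearchDemo {
    public static void main(String[] args) {
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < 1_000; i++) {
            sorted.add(i * 23);               // already in ascending order
        }
        // O(log N) on a RandomAccess list such as ArrayList.
        int pos = Collections.binarySearch(sorted, 769);
        // A negative result encodes the insertion point as (-(insertionPoint) - 1).
        System.out.println(pos >= 0 ? "found at index " + pos
                                    : "not present, would insert at " + (-pos - 1));
    }
}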

Time complexity in Java

For the add method of ArrayList, the Java API states:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time.
I wonder if it is the same time complexity, linear, when using the add method of a LinkedList.
This depends on where you're adding. E.g. if in an ArrayList you add to the front of the list, the implementation will have to shift all items every time, so adding n elements will run in quadratic time.
Similarly for the linked list: the implementation in the JDK keeps a pointer to the head and the tail. If you keep appending to the tail, or prepending in front of the head, the operation will run in linear time for n elements. If you insert at a different place, the implementation will have to search the linked list for the right place, which might give you worse runtime. Again, this depends on the insertion position; you'll get the worst time complexity if you're inserting in the middle of the list, as the maximum number of elements have to be traversed to find the insertion point.
The actual complexity depends on whether your insertion position is constant (e.g. always at the 10th position), or a function of the number of items in the list (or some arbitrary search on it). The first one will give you O(n) with a slightly worse constant factor, the latter O(n^2).
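A quick sketch (illustrative class name, rough millisecond timing) that makes the difference visible: prepending to an ArrayList shifts every existing element each time, while LinkedList.addFirst only relinks the head:
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

public class AddComparison {
    public static void main(String[] args) {
        int n = 100_000;

        // Prepending to an ArrayList is O(n) per call: O(n^2) overall.
        List<Integer> arrayList = new ArrayList<>();
        long start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            arrayList.add(0, i);              // insert at the front
        }
        System.out.println("ArrayList add(0, e):    " + (System.currentTimeMillis() - start) + " ms");

        // Prepending to a LinkedList is O(1) per call: O(n) overall.
        LinkedList<Integer> linkedList = new LinkedList<>();
        start = System.currentTimeMillis();
        for (int i = 0; i < n; i++) {
            linkedList.addFirst(i);           // relink the head pointer
        }
        System.out.println("LinkedList addFirst(e): " + (System.currentTimeMillis() - start) + " ms");
    }
}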
In most cases, ArrayList outperforms LinkedList on the add() method, as it simply stores the element in the next free slot of its backing array and increments the size counter.
If the backing array is not large enough, though, ArrayList grows it by allocating a new array and copying the contents. That's slower than adding a new element to a LinkedList, but if you keep adding elements, it only happens O(log(N)) times.
When we talk about "amortized" complexity, we take the average time over a whole sequence of operations rather than the cost of a single one.
So, answering your question: it's not the same complexity. ArrayList's add is much faster (though still O(1)) in most cases, and much slower (O(N)) occasionally. Which is better for you is best checked with a profiler.
If you mean the add(E) method (not the add(int, E) method), the answer is yes: the time complexity of adding a single element to a LinkedList is constant (adding n elements requires O(n) time).
As Martin Probst indicates, with different positions you get different complexities, but the add(E) operation will always append the element to the tail, resulting in a constant-time operation.
