Stream of numbers and best space complexity to find n/2th element - java

I was trying to solve this problem: a stream of numbers of length at most M will be given. You don't know the exact length of the stream, but you are sure it won't exceed M. At the end of the stream, you have to report the N/2th element, where N is the number of elements that actually arrived. What is the best space complexity with which this problem can be solved?
My solution:
I think we can take a queue of size M/2, push two elements, then pop one element, and keep going until the stream is over. The N/2th element will then be at the head of the queue. Time complexity will be at least O(N) whichever way you do it, but for this approach the space complexity is M/2. Is there a better solution?

I hope it is obvious that you will need at least N/2 memory (unless you can re-iterate through your stream, reading the same data again). Your algorithm uses M/2; given that N is upper-bounded by M, it may look like it doesn't matter which one you choose, since N can go up to M.
But it doesn't have to. If N is much smaller than M (for example, N = 5 and M = 1,000,000), you would waste a lot of resources.
I would recommend a dynamically growing array structure, something like ArrayList, although it is not good at removing the first element.
Conclusion: you can achieve O(N) in both time and memory, and you can't do any better.
Friendly edit regarding ArrayList: adding an element to an ArrayList takes "amortized constant time", so adding N items is O(N) in time. Removing the first element, however, is linear (per the Javadoc), so you can definitely get O(N) in time and space but ONLY IF you don't remove anything. If you do remove, you still get O(N) in space (O(N/2) = O(N)), but your time complexity goes up.

Do you know the "tortoise and hare" algorithm? Start with two pointers at the beginning of the input. At each step, advance the hare two elements and the tortoise one element. When the hare reaches the end of the input, the tortoise is at the midpoint. This is O(n) time, since it visits each element of the input once, and O(1) space, since it keeps exactly two pointers regardless of the size of the input.
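A minimal sketch in Java, assuming the input is available as a singly linked list (the Node type here is hypothetical, not something from the question):

class Node {
    int value;
    Node next;
    Node(int value) { this.value = value; }
}

class Midpoint {
    // Returns the middle node: with this stopping convention, that is
    // element (n+1)/2 for odd n and element n/2 + 1 for even n.
    static Node middle(Node head) {
        Node tortoise = head; // advances one step per iteration
        Node hare = head;     // advances two steps per iteration
        while (hare != null && hare.next != null) {
            tortoise = tortoise.next;
            hare = hare.next.next;
        }
        return tortoise;
    }
}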

Related

Worst time complexity of Linear search O(n)

I am trying to write a report in which I evaluate the time complexity of an algorithm I have designed. I know for sure that its complexity is O(n). From what I gathered from Wikipedia, the best case would be O(1); if I have understood correctly, that means the best case is when the ArrayList I am using contains only one element. But I don't completely get the worst case: what does "O(1) iterative" mean, and how can it occur?
In a comment you write:
In my case I am not looking for an element of the list in particular, but I need to check if every single element's attribute is true or false.
This is not a linear search. Searching (linear or otherwise) is answering the question "is there at least one matching element". Your question is "do all elements match".
I would always need to go through the whole list from the first to the last element, so what would be the worst and best case?
The best case is still O(1). If you find that one of the elements' attributes is false, you can terminate the scan immediately. The best case is when that happens for the first element.
Consider this: checking that "all elements are true" is equivalent to checking that "NOT (some element is false)".
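A sketch of that early-exit check, with a plain boolean standing in for whatever attribute the poster's elements actually store:

class AllMatch {
    // Returns true only if every flag is true; stops at the first false.
    static boolean allTrue(java.util.List<Boolean> flags) {
        for (boolean flag : flags) {
            if (!flag) {
                return false; // best case: the very first element is false, O(1)
            }
        }
        return true;          // worst case: all n elements checked, O(n)
    }
}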
The reason it's O(1) best case is not JUST for a list with 1 element (although this would be the case in that scenario too). Imagine you have a list of 10 numbers.
[44,6,1,2,6,10,93,187,33,55]
Let's say we run Linear Search and are searching for the integer 44. Since it's the first element in the list, our time complexity is O(1), the best case scenario, because we only have to search 1 element out of the entire list before we find what we're looking for.
Let's look at a variant of that list.
[55,6,1,2,6,10,93,187,33,44]
In this case we swapped the first and last numbers. So when we run Linear Search for the integer 44 it will be a time complexity of O(n), the worst case, since we have to traverse the entire list of n elements before we find our desired element (if it even exists in the list; in our case it does).
Regarding the "O(1) iterative" on Wikipedia, I wouldn't let it confuse you. Also notice that it's referring to space complexity on the Wikipedia page, and not time complexity performance. We don't need any extra space to store anything during Linear Search, we just compare our desired value (such as 44 in the example) with the elements in the array one by one, so we have a space complexity of O(1).
EDIT: Based upon your comment:
In my case I am not looking for an element of the list in particular
Keep in mind "Linear Search" is a particular algorithm with a specific purpose of finding a particular element in a list, which you mention is NOT what you're trying to do. It doesn't seem Linear Search is what you're looking for. Linear Search is given an array/list and a desired element. It will return the index of where the desired element occurs in the list, assuming it does exist in the list.
I would always need to go through the whole list from the first to the
last element
From your comment description, I believe you're just trying to traverse a list from beginning to end, always. This would be O(N) always, since you are always traversing the entire list. Consider this simple Python example:
L1 = [1,2,3,4,5,6,7,8,9,10]  # size n, where n = 10
for item in L1:
    print(item)
This will just print every item in the list. Our list is of size n. So the time complexity of the list traversal is O(n). This only applies if you want to traverse the entire list every time.

Getting max and min from BST (C++ vs Java)

I know that given a balanced binary search tree, getting the min and max element takes O(log N). I want to ask about their implementations in C++ and Java respectively.
C++
Take std::set for example: getting the min/max can be done by calling *set.begin() / *set.rbegin(), and it's constant time.
Java
Take TreeSet for example: getting the min/max can be done by calling TreeSet.first() and TreeSet.last(), but it's logarithmic time.
I wonder if this is because std::set does some extra trick to always keep the begin() and rbegin() pointers up to date; if so, can anyone show me that code? By the way, why didn't Java add this trick too? It seems pretty convenient to me...
EDIT:
My question might not be very clear. I want to see how std::set implements insert/erase; I'm curious how the begin() and rbegin() iterators are updated during those operations.
EDIT2:
I'm very confused now. Say I have the following code:
set<int> s;
s.insert(5);
s.insert(3);
s.insert(7);
... // say I inserted a total of n elements.
s.insert(0);
s.insert(9999);
cout<<*s.begin()<<endl; //0
cout<<*s.rbegin()<<endl; //9999
Aren't both *s.begin() and *s.rbegin() O(1) operations? Are you saying they aren't? Does s.rbegin() actually iterate to the last element?
My answer isn't language specific.
To fetch the MIN or MAX in a BST, you need to iterate to the leftmost or the rightmost node respectively. This operation is O(height), which for a balanced tree is roughly O(log n).
Now, to optimize this retrieval, a class that implements a tree can always store two extra pointers to the leftmost and the rightmost nodes, and then retrieving them becomes O(1). Of course, these pointers bring in the overhead of updating them with each insert or delete operation on the tree.
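As a rough illustration of that idea, here is a hypothetical wrapper around Java's TreeSet that caches the min and max on insert (deletions would need extra work to refresh the cache, which is one reason a library might skip this):

import java.util.TreeSet;

class MinMaxSet {
    private final TreeSet<Integer> tree = new TreeSet<>();
    private Integer min, max; // cached smallest/largest values

    void insert(int x) {
        tree.add(x);                         // O(log n), as before
        if (min == null || x < min) min = x; // O(1) cache maintenance
        if (max == null || x > max) max = x;
    }

    Integer min() { return min; } // O(1) instead of O(log n)
    Integer max() { return max; }
}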
begin() and rbegin() return iterators in constant time, and since *begin() is the smallest element and *rbegin() the largest, no iteration over the set is needed to read the min and max.
The 'trick' you suspect is real: typical implementations (libstdc++'s red-black tree, for example) store pointers to the leftmost and rightmost nodes in the tree's header and update them during insert/erase, which is how begin() meets the standard's constant-time requirement. Java's TreeSet does not cache these, so first()/last() descend from the root to the leftmost/rightmost node in O(log n).

Find K max values from a N List

I have these requirements:
1. Given random values in a List/Array, I need to find the 3 max values.
2. I have a pool of values that gets updated, maybe every 5 seconds; every time after the update, I need to find the 3 max values in the pool.
I thought of using Math.max three times on the list, but I don't think that is a very optimized approach.
Won't any sorting mechanism be costly when I only care about the top 3 max values? Why sort everything?
Please suggest the best way to do this in Java.
Sort the list, get the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values.
Maintain the pool as a sorted collection.
Update: FYI Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation.
Ordering.greatestOf
Traverse the list once, keeping an ordered array of three largest elements seen so far. This is trivial to update whenever you see a new element, and instantly gives you the answer you're looking for.
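A sketch of that single pass, assuming at least three values and using Integer.MIN_VALUE as a sentinel:

class Top3 {
    // Returns the three largest values, in descending order.
    static int[] top3(int[] values) {
        int first = Integer.MIN_VALUE, second = Integer.MIN_VALUE, third = Integer.MIN_VALUE;
        for (int v : values) {
            if (v > first) {         // new maximum: shift the others down
                third = second; second = first; first = v;
            } else if (v > second) {
                third = second; second = v;
            } else if (v > third) {
                third = v;
            }
        }
        return new int[] { first, second, third };
    }
}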
A priority queue should be the data structure you need in this case.
First, it would be wise never to say again, "I don't think that is a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
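For what it's worth, the clear version is only a few lines (a sketch, assuming the list holds at least three elements):

import java.util.Collections;
import java.util.List;

class ThreeLargest {
    static List<Integer> threeLargest(List<Integer> values) {
        Collections.sort(values); // ascending order, in place: O(n log n)
        return values.subList(values.size() - 3, values.size());
    }
}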
Third, if you really get data so large that O(n log n) operations are too slow, you should reconsider the data structure that holds the data in the first place; java.util.NavigableSet, for example, offers a descendingIterator() method whose first three elements are the three maximum numbers. If you really want, a heap data structure can be used, and you can pull off the top 3 elements cheaply, at the cost of making each insertion an O(log n) operation. A sketch of the NavigableSet route follows.
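Note that a set collapses duplicate values, which may or may not be acceptable for your data:

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

class Top3ViaTreeSet {
    static List<Integer> top3(TreeSet<Integer> pool) {
        List<Integer> top = new ArrayList<>(3);
        Iterator<Integer> it = pool.descendingIterator(); // largest first
        for (int i = 0; i < 3 && it.hasNext(); i++) {
            top.add(it.next());
        }
        return top;
    }
}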

Time complexity in Java

For the method add of the ArrayList Java API states:
The add operation runs in amortized constant time, that is, adding n elements requires O(n) time.
I wonder if it is the same time complexity, linear, when using the add method of a LinkedList.
This depends on where you're adding. E.g. if in an ArrayList you add to the front of the list, the implementation will have to shift all items every time, so adding n elements will run in quadratic time.
It is similar for the linked list: the implementation in the JDK keeps a pointer to the head and the tail. If you keep appending at the tail, or prepending at the head, the operation runs in linear time for n elements. If you insert at a different place, the implementation has to search the linked list for the right position, which can give you a worse runtime. Again, this depends on the insertion position; you get the worst time complexity when inserting in the middle of the list, as the maximum number of elements must be traversed to find the insertion point.
The actual complexity depends on whether your insertion position is constant (e.g. always at the 10th position), or a function of the number of items in the list (or some arbitrary search on it). The first one will give you O(n) with a slightly worse constant factor, the latter O(n^2).
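To make the position dependence concrete for ArrayList, a small sketch:

import java.util.ArrayList;
import java.util.List;

class AddPositions {
    static void demo(int n) {
        List<Integer> list = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            list.add(i);      // append: amortized O(1), O(n) for the whole loop
        }
        for (int i = 0; i < n; i++) {
            list.add(0, i);   // front insert: shifts everything, O(n^2) overall
        }
    }
}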
In most cases, ArrayList outperforms LinkedList on the add() method, since it simply stores the element in the next free slot of its backing array and increments the size counter.
If the backing array is not large enough, though, ArrayList grows it by allocating a new array and copying the contents. That is slower than adding a new element to a LinkedList, but if you keep adding elements it only happens O(log N) times.
When we talk about "amortized" complexity, we average the cost of an operation over a long sequence of operations.
So, answering your question: ArrayList's add is usually faster (both are amortized O(1)), but occasionally much slower (O(N)) when the array has to grow, while LinkedList's add takes the same small constant time on every call. Which is better for you is best checked with a profiler.
If you mean the add(E) method (not the add(int, E) method), the answer is yes: the time complexity of adding a single element to a LinkedList is constant (adding n elements requires O(n) time).
As Martin Probst indicates, with different positions you get different complexities, but the add(E) operation will always append the element to the tail, resulting in a constant-time operation.

Java - Looking for something faster than PriorityQueue

I'm using Java on a big amount of data.
[I'll try to simplify the problem as much as possible.]
I have a small class (Element) containing an int KEY and a double WEIGHT (with getters and setters).
I read a lot of these objects from a file, and I have to get the M objects with the greatest weight.
Currently I'm using a PriorityQueue with a Comparator written to compare two Elements, and it works, but it's too slow.
Do you know (I know you do) any faster way to do that?
Thank you
A heap-based priority queue is a good data structure for this problem. Just as a sanity check, verify that you are using the queue correctly.
If you want the highest weight items, use a min-queue—where the top of the heap is the smallest item. Adding every item to a max-queue and examining the top M items when done is not efficient.
For each item, if there are less than M items in the queue, add the current item. Otherwise, peek at the top of the heap. If it's less than the current item, discard it, and add the current item instead. Otherwise, discard the current item. When all items have been processed, the queue will contain the M highest-weight items.
Some heaps have shortcut APIs for replacing the top of the heap, but Java's Queue does not. Even so, the big-O complexity is the same.
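A sketch of that loop against the JDK's PriorityQueue; the Element class follows the shape described in the question, with a hypothetical getWeight() accessor:

import java.util.Comparator;
import java.util.PriorityQueue;

class Element {
    final int key;
    final double weight;
    Element(int key, double weight) { this.key = key; this.weight = weight; }
    double getWeight() { return weight; }
}

class TopM {
    // Keeps a min-queue of at most m elements; its head is always the
    // smallest of the current top m, i.e. the candidate for eviction.
    static PriorityQueue<Element> topM(Iterable<Element> items, int m) {
        PriorityQueue<Element> queue =
                new PriorityQueue<>(m, Comparator.comparingDouble(Element::getWeight));
        for (Element e : items) {
            if (queue.size() < m) {
                queue.add(e);                                // still filling up
            } else if (queue.peek().getWeight() < e.getWeight()) {
                queue.poll();                                // drop the smallest
                queue.add(e);                                // keep the larger item
            }
        }
        return queue; // the m largest-weight elements, in heap order
    }
}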
In addition to the suggested "peek at the top of the heap" algorithm, which gives you O(n log m) complexity for getting the top-m of n items, here are two more solutions.
Solution 1: Use a Fibonacci heap.
The JDK's PriorityQueue implementation is a balanced binary heap. You should be able to squeeze more performance out of a Fibonacci heap implementation. It will have amortized constant time insert, while inserting into a binary heap has complexity Ω(log n) in the size of the heap. If you're doing that for every element, then you're at Ω(n log n). Finding the top-m of n items using a Fib heap has complexity O(n + m log n). Combine this with the suggestion to only ever insert m elements into the heap, and you have O(n + m log m), which is as close to linear time as you're going to get.
Solution 2: Traverse the list M times.
You should be able to get the kth largest element of a set in expected O(n) time. Simply read everything into a list and do the following:
kthLargest(k, xs):
    Pick a random pivot element p from the list
    (the first one will do if your list is already random).
    Go over the list once and partition it into two lists:
        Left:  elements smaller than p.
        Right: elements larger than or equal to p.
    If the Right list is shorter than k, return kthLargest(k - right.size, Left).
    If the Right list is longer than k, return kthLargest(k, Right).
    Otherwise, return p.
That gives you expected O(n) time. Running it m times, you should be able to get the top-m objects in your set in O(nm) time, which will be strictly less than n log n for sufficiently large n and sufficiently small m. For example, getting the top-10 over a million items will take half as long as using a binary heap priority queue, all other things being equal.
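A hedged Java version of that pseudocode, adjusted to group elements equal to the pivot separately, which avoids infinite recursion when the input contains many duplicates:

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

class KthLargest {
    private static final Random RND = new Random();

    // Returns the kth largest value (k = 1 is the maximum); expected O(n).
    static double kthLargest(List<Double> xs, int k) {
        double p = xs.get(RND.nextInt(xs.size()));   // random pivot
        List<Double> left = new ArrayList<>();       // strictly smaller than p
        List<Double> right = new ArrayList<>();      // strictly larger than p
        int equal = 0;                               // elements equal to p
        for (double x : xs) {
            if (x < p) left.add(x);
            else if (x > p) right.add(x);
            else equal++;
        }
        if (k <= right.size()) return kthLargest(right, k);
        if (k <= right.size() + equal) return p;     // the pivot's group covers k
        return kthLargest(left, k - right.size() - equal);
    }
}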
If M is suitably small, then sorting all elements may waste a lot of computing time. You could put only the first M objects into a priority queue (e.g. a heap with the minimal element on top), and then iterate over the rest of the elements: every time an element is larger than the top of the heap, remove the top and push the new element into the heap.
Alternatively, you could iterate over the whole array once to find a statistical threshold value for which you can be very sure there are more than M objects with a larger value (will require some assumptions regarding the values, e.g. if they are normally distributed). You can then limit sorting to all elements with a larger value.
@Tnay: You have a point about not performing a comparison. Unfortunately, your example code still performs one. This solves the problem:
public int compare(ListElement i, ListElement j) {
    // Note: this subtraction idiom can overflow if the two values may
    // differ by more than Integer.MAX_VALUE; Integer.compare avoids that.
    return i.getValue() - j.getValue();
}
In addition, neither your compare method nor BigG's is strictly correct, since they never return 0. This can be a problem with some sorting algorithms, and it is a very tricky bug, since it will only appear if you switch to another implementation.
From the Java docs:
The implementor must ensure that sgn(compare(x, y)) == -sgn(compare(y, x)) for all x and y.
This may or may not perform a significant constant factor speed-up.
If you combine this with erickson's solution, it will probably be hard to do it faster (depending on the size of M). If M is very large, the most efficient solution is probably to sort all the elements on an array using Java's built-in sort (Arrays.sort) and cut off one end of the array at the end.
