I have the following requirements:
1. I have random values in a List/Array and I need to find the 3 max values.
2. I have a pool of values that gets updated periodically, maybe every 5 seconds. Each time after the update, I need to find the 3 max values from the pool.
I thought of using Math.max three times on the list, but I don't think that is a very optimized approach.
> Won't any sorting mechanism be costly when I only care about the top 3 max values? Why sort everything?
Please suggest the best way to do this in Java.
Sort the list, get the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values.
Maintain the pool as a sorted collection.
Update: FYI Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation.
Ordering.greatestOf
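A quick usage sketch, assuming Guava is on the classpath and the pool is a List<Integer>:

    import com.google.common.collect.Ordering;
    import java.util.Arrays;
    import java.util.List;

    public class GuavaTop3 {
        public static void main(String[] args) {
            List<Integer> pool = Arrays.asList(7, 42, 3, 19, 42, 8);
            // greatestOf returns the k greatest elements in descending order
            // without fully sorting the input.
            List<Integer> top3 = Ordering.<Integer>natural().greatestOf(pool, 3);
            System.out.println(top3); // [42, 42, 19]
        }
    }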
Traverse the list once, keeping an ordered array of three largest elements seen so far. This is trivial to update whenever you see a new element, and instantly gives you the answer you're looking for.
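A minimal single-pass sketch of that idea, assuming a plain int[] with at least three elements:

    public class TopThree {
        /** Returns the three largest values in descending order; assumes at least 3 elements. */
        static int[] topThree(int[] values) {
            int first = Integer.MIN_VALUE, second = Integer.MIN_VALUE, third = Integer.MIN_VALUE;
            for (int v : values) {
                if (v > first) {            // new overall maximum: shift the others down
                    third = second;
                    second = first;
                    first = v;
                } else if (v > second) {    // new second place
                    third = second;
                    second = v;
                } else if (v > third) {     // new third place
                    third = v;
                }
            }
            return new int[] {first, second, third};
        }

        public static void main(String[] args) {
            int[] top = topThree(new int[] {7, 42, 3, 19, 8});
            System.out.println(top[0] + " " + top[1] + " " + top[2]); // 42 19 8
        }
    }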
A priority queue should be the data structure you need in this case.
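A sketch of that approach: keep a min-heap capped at size 3, so the smallest of the current top three is always at the head and can be evicted cheaply.

    import java.util.Arrays;
    import java.util.List;
    import java.util.PriorityQueue;

    public class HeapTop3 {
        public static void main(String[] args) {
            List<Integer> pool = Arrays.asList(7, 42, 3, 19, 8);
            // Min-heap: the head is always the smallest of the values kept so far.
            PriorityQueue<Integer> heap = new PriorityQueue<>();
            for (int v : pool) {
                heap.offer(v);
                if (heap.size() > 3) {
                    heap.poll(); // evict the smallest, keeping only the 3 largest
                }
            }
            System.out.println(heap); // the 3 largest (8, 19, 42), in heap order
        }
    }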
First, it would be wise never to say again, "I don't think that is a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
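For instance, a minimal sketch of that, assuming the list holds Integers and has at least three elements:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class SortedTop3 {
        public static void main(String[] args) {
            List<Integer> pool = new ArrayList<>(Arrays.asList(7, 42, 3, 19, 8));
            Collections.sort(pool); // ascending order
            // The last three elements of the sorted list are the three largest.
            List<Integer> top3 = pool.subList(pool.size() - 3, pool.size());
            System.out.println(top3); // [8, 19, 42]
        }
    }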
Third, if you really get data which is so large that O(n log n) operations are too slow, you should rewrite the data structure which holds the data in the first place -- java.util.NavigableSet, for example, offers a .descendingIterator() method whose first three elements would be the three maximum numbers. If you really want, a Heap data structure can be used, and you can pull off the top 3 elements with something like one comparison each, at the cost of making insertion an O(log n) operation.
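A small sketch of that NavigableSet approach (note that a TreeSet collapses duplicates, so a multiset-style structure would be needed if duplicate values must be kept):

    import java.util.Iterator;
    import java.util.NavigableSet;
    import java.util.TreeSet;

    public class NavigableTop3 {
        public static void main(String[] args) {
            NavigableSet<Integer> pool = new TreeSet<>();
            // Adding is O(log n) per element; the set stays sorted as it grows.
            for (int v : new int[] {7, 42, 3, 19, 8}) {
                pool.add(v);
            }
            // The descending iterator yields elements largest-first.
            Iterator<Integer> it = pool.descendingIterator();
            for (int i = 0; i < 3 && it.hasNext(); i++) {
                System.out.println(it.next()); // 42, 19, 8
            }
        }
    }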
Related
Hi, I had an interview task. The idea is to store elements with fields: id, name, updateTime.
There should be methods add(Element), getElement(id), getLastUpdatedElements()
Requirements:
Code should be in Java
Should be thread-safe
The upper bound of computational complexity for all these methods should be O(log(n))
Notes
The update time of any element can be changed at runtime
getLastUpdatedElements - returns the elements updated in the last minute
My thoughts
I cannot use CopyOnWriteArrayList because it would take O(N) to find the last updated elements if the key is id, which breaks the requirement.
To fit the O(log(N)) complexity of getLastUpdatedElements() I can use a ConcurrentSkipListSet with a comparator by updateTime, but in that case it would take O(N) to get an element by id. (Please note that in this case add(Element) is O(log(N)), since we know the updateTime for newly created elements.)
I can use two trees, the first with a comparator by id, the second with a comparator by updateTime, but then I would have to make all access methods synchronized, which makes my program effectively single-threaded.
I think I'm close; I just need a way to get an element in O(log(N)), but I'm running out of ideas.
I hope I understood you correctly.
If you need to store the elements with "add" and "get" times as low as O(log(N)), that sounds like a classic hash map (which, since Java 8 I believe, replaces a bucket's linked list with a binary tree once the bucket grows past a certain size),
so in the worst case it's O(log(N)).
for the "get last updated" function: you can store each updated element in a stack (not really a stack, just a list you keep adding into) and when the function is performed. just perform a binary search on the list. when you reach the first item that has been updated in the last minute - just return the index to that item.
that way you only perform binary search (log(N)).
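A rough sketch of that binary search, assuming the list is appended to in update-time order and that Element has a hypothetical updateTime field in epoch milliseconds (locking is left out here and discussed below):

    import java.util.ArrayList;
    import java.util.List;

    class Element {
        final long id;
        final String name;
        volatile long updateTime; // epoch millis; hypothetical field name
        Element(long id, String name, long updateTime) {
            this.id = id; this.name = name; this.updateTime = updateTime;
        }
    }

    class LastUpdatedIndex {
        // Appended to whenever an element is added or its updateTime changes,
        // so it stays ordered by updateTime.
        private final List<Element> byUpdateOrder = new ArrayList<>();

        void recordUpdate(Element e) {
            byUpdateOrder.add(e);
        }

        /** Elements updated in the last minute; the cutoff is found by binary search in O(log n). */
        List<Element> getLastUpdatedElements(long nowMillis) {
            long cutoff = nowMillis - 60_000;
            int lo = 0, hi = byUpdateOrder.size();
            while (lo < hi) {                   // first index with updateTime >= cutoff
                int mid = (lo + hi) >>> 1;
                if (byUpdateOrder.get(mid).updateTime < cutoff) lo = mid + 1;
                else hi = mid;
            }
            return byUpdateOrder.subList(lo, byUpdateOrder.size()); // view, no copy
        }
    }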
Oh, and of course just have a lock for those two data structures.
If you really want to dig into it performance-wise, you can implement two locks: one for inserting/updating entries and one just for reading them,
similar to the "readers-writers problem": https://www.tutorialspoint.com/readers-writers-problem
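A minimal sketch of that split using ReentrantReadWriteLock; the stored structures and the Element class are the hypothetical ones from the sketch above, and the method bodies are elided to show only the locking pattern:

    import java.util.concurrent.locks.ReadWriteLock;
    import java.util.concurrent.locks.ReentrantReadWriteLock;

    class ThreadSafeStore {
        private final ReadWriteLock lock = new ReentrantReadWriteLock();

        void add(Element e) {
            lock.writeLock().lock();    // writers get exclusive access
            try {
                // ... insert into the map and the update-order list ...
            } finally {
                lock.writeLock().unlock();
            }
        }

        Element getElement(long id) {
            lock.readLock().lock();     // many readers can hold this concurrently
            try {
                // ... look up by id ...
                return null;            // placeholder
            } finally {
                lock.readLock().unlock();
            }
        }
    }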
I am trying to compare 2 lists to each other. Both lists have tens of thousands of entries.
My idea so far has been to use 2 ArrayLists and compare them element by element. However, I have been told that comparing too much can corrupt Eclipse. No idea if this is true, though better safe than sorry.
If you know any tips on comparing tens of thousands of Strings, please let me know. Thanks.
You have to compare each individual element in both arrays. Try sorting the arrays and then using a for loop to run through them.
What does "corrupt Eclipse" mean?
However, no: if you have two ArrayLists and you want to know whether they are equal, you have to compare the elements until you find a difference or reach the end of a list. The complexity is O(n) - linear - which is the best you can get without some preprocessing (which itself would be O(n) in the best case).
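If "compare" just means "are the two lists equal, element by element", a sketch like this is enough; List.equals already performs exactly that linear scan:

    import java.util.Arrays;
    import java.util.List;

    public class CompareLists {
        public static void main(String[] args) {
            List<String> a = Arrays.asList("apple", "banana", "cherry");
            List<String> b = Arrays.asList("apple", "banana", "cherry");

            // List.equals compares sizes, then elements in order: O(n).
            System.out.println(a.equals(b)); // true

            // The equivalent explicit loop:
            boolean equal = a.size() == b.size();
            for (int i = 0; equal && i < a.size(); i++) {
                equal = a.get(i).equals(b.get(i));
            }
            System.out.println(equal); // true
        }
    }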
If you are comparing 2 numbers, the expected output can be -
-> first greater than second
-> first less than second
-> first equal to second
But what do you mean by comparing 2 lists? Are you planning to compare the lengths of both lists? If so, you get the size of each list and compare them.
If you want to compare every element in a list with every other element, that would mean you are trying to sort the list. If you are planning to do the same thing with 2 lists, it might mean you are trying to merge-sort both lists, i.e. you are creating one list containing all the elements of both lists in sorted order.
The following video can help you understand merge sort.
https://www.youtube.com/watch?v=EeQ8pwjQxTM
Such code would have an average time complexity of O(n log n). For lists with tens of thousands of elements the algorithm would take a noticeable amount of time, but Eclipse wouldn't get corrupted. Worst case, if your code is not written properly, memory leaks can cause problems in the JVM.
I hope my answer helps you. I might help you in a better way if your question is clearer and more specific.
More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want a data structure that supports searching for and removing the first occurrence of some value faster than O(n); say the value is 4, then the array becomes:
{3,2,3,2,2,1,4}
I also need to iterate the list from the head in the same order. Other operations like get(index) or insert are not needed.
You can use O(n) time to record the original data (say it's an int[]) in your data structure; I just need the later search-and-remove to be faster than O(n).
"Search and remove" is considered ONE operation, as shown above.
If I have to build it myself, I would use a LinkedList to store the data, and a HashMap to map every value to a list of all its occurrences as nodes, together with their previous and next nodes.
Is that the right approach? Are there better choices already available in Java?
The data structure you describe, essentially a hybrid linked list and map, is, I think, the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with that until you do.
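A minimal sketch of that hybrid structure for int values, using a plain HashMap from value to its occurrence nodes (in list order) in place of the hand-rolled hash table described above; removing the first occurrence then avoids scanning the list in the average case:

    import java.util.ArrayDeque;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.function.IntConsumer;

    class FastRemoveList {
        private static final class Node {
            final int value;
            Node prev, next;
            Node(int value) { this.value = value; }
        }

        private Node head, tail;
        // Maps each value to its occurrence nodes, oldest first.
        private final Map<Integer, ArrayDeque<Node>> occurrences = new HashMap<>();

        /** Appends a value at the tail and records its node under that value. */
        void add(int value) {
            Node node = new Node(value);
            if (tail == null) {
                head = tail = node;
            } else {
                tail.next = node;
                node.prev = tail;
                tail = node;
            }
            occurrences.computeIfAbsent(value, k -> new ArrayDeque<>()).addLast(node);
        }

        /** Unlinks the first occurrence of value; average O(1), no scan of the list. */
        boolean removeFirstOccurrence(int value) {
            ArrayDeque<Node> nodes = occurrences.get(value);
            if (nodes == null || nodes.isEmpty()) {
                return false;
            }
            Node node = nodes.pollFirst(); // earliest remaining occurrence
            if (node.prev != null) node.prev.next = node.next; else head = node.next;
            if (node.next != null) node.next.prev = node.prev; else tail = node.prev;
            return true;
        }

        /** Iterates from the head in the original order. */
        void forEach(IntConsumer action) {
            for (Node n = head; n != null; n = n.next) {
                action.accept(n.value);
            }
        }
    }

For the example array, adding 3, 2, 3, 4, 2, 2, 1, 4 and then calling removeFirstOccurrence(4) leaves 3, 2, 3, 2, 2, 1, 4, matching the question.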
Since your requirement is that the first occurrence of the element should be removed and the remaining occurrences retained, there would be no way to do it faster than O(n), as you would definitely have to move through to the end of the list to find out whether there is another occurrence. There is no standard API in the Java packages that does this.
Basically, I'm looking for the best data structure in Java in which I can store pairs and retrieve the top N elements by value. I'd like to do this in O(n) time, where n is the number of entries in the data structure.
Example input would be:
<"john", 32>
<"dave", 3>
<"brian", 15>
<"jenna", 23>
<"rachael", 41>
and if N=3, I should be able to return rachael, john, jenna if I wanted descending order.
If I use some kind of HashMap, insertion is fast, but retrieving the entries in order gets expensive.
If I use some data structure that keeps things ordered, then insertion becomes expensive while retrieval is cheaper. I was not able to find a data structure that does both very well and very fast.
Any input is appreciated. Thanks.
[updated]
Let me ask the question another way, if that makes it clearer.
I know I can insert into a HashMap in constant O(1) time.
Now, how can I retrieve the elements sorted by value in O(n) time, where n is the number of entries in the data structure? Hope that makes sense.
If you want to sort, you have to give up constant O(1) time.
That is because, unlike inserting an unsorted key/value pair, sorting will at minimum require you to compare the new entry to something, and odds are to a number of somethings. Once you have an algorithm that requires more time with more entries (due to more comparisons), you have overshot "constant" time.
If you can do better, then by all means do so! There is a Dijkstra Prize awaiting you, if not a Fields Medal to boot.
Don't despair: you can still do the key part with a HashMap, and the sorting part with a tree-like implementation that will give you O(log n). TreeMap is probably what you want.
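A sketch along those lines: a HashMap for O(1) average lookups by key, plus a TreeMap keyed by value in descending order for pulling off the top N. Insertion costs O(log n) because of the tree update, as described above; the class and field names here are just for illustration.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    class TopNByValue {
        private final Map<String, Integer> byKey = new HashMap<>();
        // Descending by value; each value maps to the keys carrying it, so duplicates are fine.
        private final TreeMap<Integer, List<String>> byValue = new TreeMap<>(Comparator.reverseOrder());

        /** O(log n) because of the tree update. */
        void put(String key, int value) {
            Integer old = byKey.put(key, value);
            if (old != null) {                  // the key had a previous value: unindex it
                List<String> keys = byValue.get(old);
                keys.remove(key);
                if (keys.isEmpty()) {
                    byValue.remove(old);
                }
            }
            byValue.computeIfAbsent(value, v -> new ArrayList<>()).add(key);
        }

        /** Top n keys by value, largest first; only walks as far into the tree as needed. */
        List<String> topN(int n) {
            List<String> result = new ArrayList<>();
            for (List<String> keys : byValue.values()) {
                for (String k : keys) {
                    if (result.size() == n) {
                        return result;
                    }
                    result.add(k);
                }
            }
            return result;
        }
    }

With the example pairs inserted via put(), topN(3) returns [rachael, john, jenna].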
--- Update to match your update ---
No, you cannot iterate over a HashMap in sorted order in O(n) time. To do so would assume that you already had a sorted list. With a raw HashMap, you would have to search the entire map for the next "lower" value; searching only part of the map would not do, because the one element you didn't check might be the correct value.
Now, there are some data structures that make a lot of trade-offs which might get you closer. If you want to roll your own, perhaps a custom Fibonacci heap can give you amortized performance close to what you wish, but it cannot guarantee worst-case performance. In any case, some operations (like extract-min) will still require O(log n) time.
I have about 200 lists of the form (index, float) and I want to calculate the mean across them. I know the approach with time complexity O(first array size + ... + last array size); is there any solution that calculates the mean with better time complexity?
There is no possible way to calculate the mean of n independent items with time complexity less than O(n), since you have to visit every item at least once to calculate the total.
If you want to beat O(n) complexity, then you will need to do something special, e.g.:
Use pre-computed sums for sub-lists
Exploit known dependencies in the data (e.g. certain elements being equal)
Of course, complexity does not always equate directly to speed. If you want to do it fast then there are plenty of other techniques (using concurrency or parallelism for example). But they will still be O(n) in complexity terms.
You could approach it with divide and conquer; for that you can use an ExecutorService.
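A rough sketch of that idea, assuming each list is simply a float[] of the values (the index part is ignored for the sum); this can reduce wall-clock time on a multi-core machine, but the total work is still O(n):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Callable;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class ParallelMean {
        /** Mean of all floats across all lists; each list is summed by its own task. */
        static double mean(List<float[]> lists) throws Exception {
            ExecutorService pool =
                    Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
            try {
                List<Future<Double>> partialSums = new ArrayList<>();
                long count = 0;
                for (float[] list : lists) {
                    count += list.length;
                    Callable<Double> sumOneList = () -> {
                        double sum = 0;
                        for (float v : list) {
                            sum += v;        // each task sums a single list
                        }
                        return sum;
                    };
                    partialSums.add(pool.submit(sumOneList));
                }
                double total = 0;
                for (Future<Double> f : partialSums) {
                    total += f.get();        // combine the partial sums
                }
                return total / count;
            } finally {
                pool.shutdown();
            }
        }
    }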
Depending on how you read in your lists/arrays, you could add up the floats while reading them into memory. Dividing the resulting sum by the total number of values is cheap, so you don't need to process the values twice.
If you use a Collection class for storing the values, you could extend e.g. ArrayList and override the add() method to update a sum field and provide a getMean() method.
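A sketch of that approach; note that only the plain add() path is covered here, so other mutators (remove, set, addAll, ...) would also need overriding in a real implementation:

    import java.util.ArrayList;

    /** Keeps a running sum so the mean never needs a second pass over the values. */
    class MeanList extends ArrayList<Float> {
        private double sum;

        @Override
        public boolean add(Float value) {
            sum += value;          // update the running total as values arrive
            return super.add(value);
        }

        public double getMean() {
            return isEmpty() ? 0 : sum / size();
        }
    }

Reading the values in through add() then makes getMean() a constant-time call.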
No, there is no better way than including every element in the calculation (which is O(first Array size + ... + last Array size)), at least not for the problem as stated. If the lists were to have some special properties or you want to recalculate the mean repeatedly after adding, removing or changing elements or lists, it would be a different story.
An informal proof:
Assume you managed to calculate the mean by skipping one element. Logically you'd get to the same mean by changing the skipped element to any other value (since we skipped it, its value doesn't matter). But, in this case, the mean should be different. Thus you must use every element in the calculation.