I am trying to compare 2 lists to each other. Both lists have tens of thousands of entries.
My idea so far has been to use 2 ArrayLists and compare them element by element. However, I have been told that comparing too much can corrupt Eclipse. No idea if this is true, but better safe than sorry.
If you know any tips on comparing tens of thousands of Strings, please let me know. Thanks.
You have to compare each individual element in both arrays. Try sorting the arrays and then using a for loop to run through them.
What does "corrupt Eclipse" mean?
However no, if you have two ArrayLists and you want to know whether they are equal, you have to compare the elements until you find a difference or reach the end of a list. The complexity is O(n) - linear - which is the best you can get without some pre-processing (which itself would be O(n) in the best case).
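A minimal sketch of that element-by-element check (assuming both lists hold Strings and order matters; the method name is mine):

import java.util.List;

static boolean sameLists(List<String> a, List<String> b) {
    if (a.size() != b.size()) {
        return false;               // different lengths can never be equal
    }
    for (int i = 0; i < a.size(); i++) {
        if (!a.get(i).equals(b.get(i))) {
            return false;           // stop at the first difference
        }
    }
    return true;                    // every element matched
}

Note that ArrayList.equals() already performs exactly this comparison, so for a plain equality check you can simply call list1.equals(list2).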
If you are comparing 2 numbers, the expected output can be -
-> first greater than second
-> first less than second
-> first equal to second
But, what do you mean by comparing 2 lists? Are you planning to compare the length of both the lists? If so, you get the size of both the lists and compare them.
If you want to compare every element in the list with every other element, it would mean you are trying to sort the list. If you are planning to do the same thing with 2 lists, it might mean you are trying to merge-sort both lists, i.e. you are creating 1 list containing all the elements of both lists in sorted order.
The following video can help you understand merge-sort.
https://www.youtube.com/watch?v=EeQ8pwjQxTM
This approach would have an average time complexity of O(n log n). For lists with tens of thousands of elements it would take a noticeable amount of time, but Eclipse wouldn't get corrupted. Worst case, if your code is not written properly, memory leaks can cause a problem in the JVM.
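If merging the 2 lists into a single sorted list is really what you are after, here is a rough sketch (assuming lists of Strings sorted in natural order; the method name is mine):

import java.util.ArrayList;
import java.util.List;

static List<String> merge(List<String> a, List<String> b) {
    List<String> out = new ArrayList<>(a.size() + b.size());
    int i = 0, j = 0;
    // repeatedly take the smaller head element of the two sorted lists
    while (i < a.size() && j < b.size()) {
        if (a.get(i).compareTo(b.get(j)) <= 0) {
            out.add(a.get(i++));
        } else {
            out.add(b.get(j++));
        }
    }
    // copy whatever is left over from either list
    while (i < a.size()) out.add(a.get(i++));
    while (j < b.size()) out.add(b.get(j++));
    return out;
}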
I hope my answer helps you. I might be able to help you better if your question were clearer and more specific.
I've been trying to figure out the answer to this problem without success; maybe you could point me in the right direction:
We change merge sort so that when the array is already sorted it stops and returns the array without making the two recursive calls.
For example, let's run the algorithm on an array in which each number appears exactly n/log(n) times (so the array contains exactly log(n) different numbers). What will the running time complexity be now?
"We change the merge sort so that when you already sorted the array it stops and returning the array without calling to another 2 recursion calls."
That's how normal merge sort works. After it sorts an array (or a section of the array), it does not make any more recursive calls; it just returns the sorted array. The recursion is what sorts that section of the array in the first place.
Perhaps you wanted to say "Before we recursively sort the 2 halves and merge them, we check if the array is already sorted". That would be useless for arrays of distinct numbers, as there would be an extremely low chance (1/n!) that the array is already sorted.
Your example is more interesting. However, if the array has only log(n) different numbers, I would recommend ordering the unique values and creating a hashmap from value to index, which is fast with only log(n) values; then you can sort in linear time, with bucket sort for example.
Indeed you can try to improve mergesort's efficiency for sorted arrays by checking whether the sorted subarrays are already in the proper order and skipping the merge phase. This can be done efficiently by comparing the last element A of the left subarray with the first element B of the right subarray. If A <= B, merging is not needed.
This trick does not increase the complexity, as it adds a single test to every merge phase, but it does not remove any of the recursive calls, since it requires both subarrays to be sorted already. On the other hand, it does reduce the complexity to linear if the array is sorted.
Another approach is to check whether the array is already sorted before splitting and recursing. This adds many more tests in the general case but does not increase the complexity either, as the number of tests is bounded by N log(N) as well. It is on average more expensive for unsorted arrays (more extra comparisons), but more efficient on sorted arrays (same number of tests, but no recursion).
You can try benchmarking both approaches on a variety of test cases and array sizes to measure the impact.
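To make the first variant concrete, here is a rough sketch of a top-down merge sort on an int[] with the skip-merge check (the names and structure are my own, not from the question):

import java.util.Arrays;

static void mergeSort(int[] a, int lo, int hi) {   // sorts a[lo..hi] inclusive
    if (lo >= hi) return;
    int mid = (lo + hi) / 2;
    mergeSort(a, lo, mid);
    mergeSort(a, mid + 1, hi);
    // skip the merge phase if the two halves are already in the proper order
    if (a[mid] <= a[mid + 1]) return;
    merge(a, lo, mid, hi);
}

static void merge(int[] a, int lo, int mid, int hi) {
    int[] left = Arrays.copyOfRange(a, lo, mid + 1);
    int[] right = Arrays.copyOfRange(a, mid + 1, hi + 1);
    int i = 0, j = 0, k = lo;
    while (i < left.length && j < right.length) {
        a[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
    }
    while (i < left.length) a[k++] = left[i++];
    while (j < right.length) a[k++] = right[j++];
}

On an already sorted array every merge is skipped, so only the recursive calls (O(n) of them) with one comparison each remain, which is linear.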
Recently, the questions below were asked in an interview.
You are given an array of integers with all elements repeated twice except one element which occurs only once, and you need to find the unique element with O(n log n) time complexity. Suppose the array is {2,47,2,36,3,47,36}; the output should be 3 here. I said we could perform merge sort (as it takes O(n log n)) and then compare each element to the next, but he said that would take O(n log n) + O(n). I also said we could use a HashMap to keep the count of elements, but again he said no, as we would have to iterate over the HashMap again to get the result. After some research, I found that using the XOR operation gives the output in O(n). Is there any better solution other than sorting which can give the answer in O(n log n) time?
On our smartphones we can open many apps at a time. When we look at which apps are currently open, we see a list where the most recently opened app is at the front, and we can remove or close an app from anywhere in the list. There is some Collection available in Java which can perform all these tasks in a very efficient way. I said we could use a LinkedList or LinkedHashMap, but he was not convinced. What would be the best Collection to use?
Firstly, if the interviewer used Big-O notation and expected an O(n log n) solution, there's nothing wrong with your answer. We know that O(x + y) = O(max(x, y)), so although your algorithm is O(n log n + n), it's fine to just call it O(n log n). However, finding the element that appears once in a sorted array can be done in O(log n) using binary search; as a hint, exploit odd and even indices while searching. Also, if the interviewer expected an O(n log n) solution, the objection to an extra traversal is absurd. The hash map solution is already O(n), and if there's a problem with it, it's the requirement of extra space. For this reason, the best one is to use XOR, as you mentioned. There are still some more O(n) solutions, but they're not better than the XOR solution.
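For reference, the XOR idea from the question is a one-pass, constant-space solution, because x ^ x == 0 and x ^ 0 == x, so the paired elements cancel out:

static int findUnique(int[] a) {
    int result = 0;
    for (int x : a) {
        result ^= x;   // pairs cancel, only the unique element survives
    }
    return result;
}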
To me, a LinkedList is suitable for this task as well. We want to remove from any location and also use some stack operations (push, pop, peek). A custom stack can be built from a LinkedList.
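A small sketch of how a LinkedList (through the Deque interface) covers those operations; the app names are just placeholders:

import java.util.Deque;
import java.util.LinkedList;

Deque<String> recentApps = new LinkedList<>();
recentApps.push("Maps");              // opening an app puts it at the front
recentApps.push("Mail");
recentApps.push("Browser");
String current = recentApps.peek();   // most recently opened app ("Browser")
recentApps.remove("Maps");            // close an app from anywhere in the list
recentApps.pop();                     // close the most recently opened app

Keep in mind that remove(Object) is O(n) on a LinkedList, which is usually fine for something as short as the open-apps list.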
I have about 200 lists of the form (index, float) and I want to calculate the mean across them. I know the approach whose time complexity is O(first array size + ... + last array size). Is there any solution to calculate the mean with a better time complexity?
There is no possible way to calculate the mean of n independent items with time complexity less than O(n), since you have to visit every item at least once to compute the total.
If you want to beat O(n) complexity, then you will need to do something special, e.g.:
Use pre-computed sums for sub-lists
Exploit known dependencies in the data (e.g. certain elements being equal)
Of course, complexity does not always equate directly to speed. If you want to do it fast then there are plenty of other techniques (using concurrency or parallelism for example). But they will still be O(n) in complexity terms.
You can approach it with divide and conquer. For that you can use an ExecutorService.
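A rough sketch of that idea, assuming each of the ~200 lists is represented as a float[] of its values (the thread count and names are illustrative):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

static double parallelMean(List<float[]> arrays) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    List<Future<double[]>> partials = new ArrayList<>();
    for (float[] arr : arrays) {
        // each task returns {sum, count} for its own array
        partials.add(pool.submit(() -> {
            double sum = 0;
            for (float f : arr) sum += f;
            return new double[] { sum, arr.length };
        }));
    }
    double sum = 0, count = 0;
    for (Future<double[]> f : partials) {
        double[] p = f.get();
        sum += p[0];
        count += p[1];
    }
    pool.shutdown();
    return sum / count;
}

As noted in the previous answer, this can be faster in wall-clock time, but the total work is still O(n).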
Depending on how you read in your lists/arrays, you could add together the floats while reading them into memory. Dividing the value by the size of the arrays is cheap. Therefore you don't need to process the values twice.
If you use a Collection class for storing the values, you could extend e.g. ArrayList and override the add() method to update a sum field and provide a getMean() method.
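A minimal sketch of that idea (the class and method names are mine):

import java.util.ArrayList;

class SummingList extends ArrayList<Float> {
    private double sum = 0;

    @Override
    public boolean add(Float value) {
        sum += value;                       // keep the running total up to date
        return super.add(value);
    }

    public double getMean() {
        return isEmpty() ? 0 : sum / size();
    }
}

If you also remove or replace elements, you would need to override remove(), set(), addAll() and friends so the sum stays consistent.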
No, there is no better way than including every element in the calculation (which is O(first Array size + ... + last Array size)), at least not for the problem as stated. If the lists were to have some special properties or you want to recalculate the mean repeatedly after adding, removing or changing elements or lists, it would be a different story.
An informal proof:
Assume you managed to calculate the mean by skipping one element. Logically you'd get to the same mean by changing the skipped element to any other value (since we skipped it, its value doesn't matter). But, in this case, the mean should be different. Thus you must use every element in the calculation.
I have these requirements:
1. I have random values in a List/Array and I need to find the 3 max values.
2. I have a pool of values which gets updated, maybe every 5 seconds. Every time after the update, I need to find the 3 max values from the pool.
I thought of using Math.max three times on the list, but I don't think that is a very optimized approach.
> Won't any sorting mechanism be costly, since I only care about the top 3 max values? Why sort all of them?
Please suggest the best way to do this in Java.
Sort the list, get the 3 max values. If you don't want the expense of the sort, iterate and maintain the n largest values.
Maintain the pool as a sorted collection.
Update: FYI Guava has an Ordering class with a greatestOf method to get the n max elements in a collection. You might want to check out the implementation.
Ordering.greatestOf
Traverse the list once, keeping an ordered array of three largest elements seen so far. This is trivial to update whenever you see a new element, and instantly gives you the answer you're looking for.
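A minimal sketch of that single pass (assuming an int[]; the method name is mine):

static int[] topThree(int[] values) {
    int[] top = { Integer.MIN_VALUE, Integer.MIN_VALUE, Integer.MIN_VALUE };
    for (int v : values) {
        if (v > top[0]) {          // new largest: shift the others down
            top[2] = top[1];
            top[1] = top[0];
            top[0] = v;
        } else if (v > top[1]) {   // new second largest
            top[2] = top[1];
            top[1] = v;
        } else if (v > top[2]) {   // new third largest
            top[2] = v;
        }
    }
    return top;                    // top[0] >= top[1] >= top[2]
}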
A priority queue should be the data structure you need in this case.
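For example, a min-heap capped at three elements keeps only the three largest values seen so far, which also works well when the pool is updated and you just offer the new values (a sketch, not a tuned implementation):

import java.util.PriorityQueue;

static PriorityQueue<Integer> largestThree(int[] values) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();   // min-heap
    for (int v : values) {
        heap.offer(v);
        if (heap.size() > 3) {
            heap.poll();   // evict the smallest so only the 3 largest remain
        }
    }
    return heap;           // the smallest of the three is at the head
}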
First, it would be wise to never say again, "I don't think it is a very optimized approach." You will not know which part of your code is slowing you down until you put a profiler on it.
Second, the easiest way to do what you're trying to do -- and what will be most clear to someone later if they are trying to see what your code does -- is to use Collections.sort() and pick off the last three elements. Then anyone who sees the code will know, "oh, this code takes the three largest elements." There is so much value in clear code that it will likely outweigh any optimization that you might have done. It will also keep you from writing bugs, like giving a natural meaning to what happens when someone puts the same number into the list twice, or giving a useful error message when there are only two elements in the list.
Third, if you really get data so large that O(n log n) operations are too slow, you should rethink the data structure which holds the data in the first place -- java.util.NavigableSet, for example, offers a descendingIterator() method whose first three elements are the three maximum numbers. If you really want, a Heap data structure can be used, and you can pull off the top 3 elements with something like one comparison each, at the cost of making each add an O(log n) operation.
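For illustration, the NavigableSet route might look like this (values stands for your existing collection; note that a TreeSet collapses duplicate numbers into one entry):

import java.util.Iterator;
import java.util.TreeSet;

TreeSet<Integer> pool = new TreeSet<>(values);   // values: your existing collection
Iterator<Integer> top = pool.descendingIterator();
for (int i = 0; i < 3 && top.hasNext(); i++) {
    System.out.println(top.next());              // prints the 3 largest, in descending order
}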
I have a variable number of ArrayLists that I need to find the intersection of. A realistic cap on the number of sets of strings is probably around 35, but it could be more. I don't want any code, just ideas on what could be efficient. I have an implementation that I'm about to start coding but want to hear some other ideas.
Currently, just thinking about my solution, it looks like I should have an asymptotic run-time of Θ(n²).
Thanks for any help!
tshred
Edit: To clarify, I really just want to know whether there is a faster way to do it. Faster than Θ(n²).
Set.retainAll() is how you find the intersection of two sets. If you use HashSet, then converting your ArrayLists to Sets and using retainAll() in a loop over all of them is actually O(n).
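A rough sketch of that loop over a variable number of lists (names are illustrative):

import java.util.HashSet;
import java.util.List;
import java.util.Set;

static Set<String> intersectAll(List<List<String>> lists) {
    if (lists.isEmpty()) return new HashSet<>();
    Set<String> result = new HashSet<>(lists.get(0));
    for (int i = 1; i < lists.size(); i++) {
        result.retainAll(new HashSet<>(lists.get(i)));   // keep only common elements
        if (result.isEmpty()) break;                      // nothing left to intersect
    }
    return result;
}

Converting each list to a HashSet first keeps the contains() checks inside retainAll() at O(1).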
The accepted answer is just fine; as an update: since Java 8 there is a slightly more efficient way to find the intersection of two Sets.
Set<String> intersection = set1.stream()
.filter(set2::contains)
.collect(Collectors.toSet());
The reason it is slightly more efficient is that the original approach had to add elements of set1 that it then had to remove again if they weren't in set2. This approach only adds to the result set what needs to be in there.
Strictly speaking you could do this pre Java 8 as well, but without Streams the code would have been quite a bit more laborious.
If both sets differ considerably in size, you would prefer streaming over the smaller one.
There is also the static method Sets.intersection(set1, set2) in Google Guava that returns an unmodifiable view of the intersection of two sets.
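Assuming Guava is on the classpath, usage looks roughly like this (set1 and set2 as in the snippet above):

import com.google.common.collect.Sets;
import java.util.Set;

Set<String> view = Sets.intersection(set1, set2);                      // unmodifiable live view
Set<String> snapshot = Sets.intersection(set1, set2).immutableCopy();  // independent copy

The Guava Javadoc notes that the view performs slightly better when the first argument is the smaller set, which echoes the advice elsewhere in this thread about starting with the smaller collection.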
One more idea - if your arrays/sets are different sizes, it makes sense to begin with the smallest.
The best option would be to use a HashSet to store the contents of these lists instead of an ArrayList. If you can do that, you can create a temporary HashSet to which you add the elements to be intersected (use the addAll(..) method). Do tempSet.retainAll(storedSet) and tempSet will contain the intersection.
Sort them (n lg n) and then do binary searches (lg n).
You can use a single HashSet. Its add() method returns false when the object is already in the set. Adding the objects from all the lists and counting the false return values gives you the union in the set plus the data for a histogram (and the objects whose count + 1 equals the number of lists are your intersection). If you keep the counts in a TreeSet, you can detect an empty intersection early.
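A sketch of that counting idea using a HashMap for the per-element counts (assuming no single list contains the same string twice):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

static List<String> intersectByCounting(List<List<String>> lists) {
    Map<String, Integer> counts = new HashMap<>();
    for (List<String> list : lists) {
        for (String s : list) {
            counts.merge(s, 1, Integer::sum);   // how many lists contain s
        }
    }
    List<String> intersection = new ArrayList<>();
    for (Map.Entry<String, Integer> e : counts.entrySet()) {
        if (e.getValue() == lists.size()) {     // present in every list
            intersection.add(e.getKey());
        }
    }
    return intersection;
}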
In case all that is required is whether 2 sets intersect at all, I use the following snippet on Java 8+:
set1.stream().anyMatch(set2::contains)