I'm looking for the most efficient way to determine whether a specific value exists in a small (16-element) array of integers in Java. The array is unsorted.
Options:
1. A boring but reliable for loop
2. Sort the array, then Arrays.binarySearch(arr, targetVal)
3. The List.contains method, e.g. Arrays.asList(arr).contains(targetVal)
4. Something else.
Option 3 must have some overhead in "converting" to a List, but I could use a List throughout rather than an array if that would be better overall. I have no feel for how a List performs speed-wise.
Given that the array is unsorted, any search on it will have complexity O(n).
You could try your second option, but then you pay O(n*log(n)) for the sort plus O(log(n)) for the search.
With such a small array, and if you only want to search once, it is better to use a simple loop, because it is hard to predict how much time the conversion to a List or the chosen sorting algorithm will actually cost.
A plain loop is a good choice here.
FYI: a Stream will not be efficient in your case.
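For reference, a minimal sketch of the plain-loop approach (the helper name is purely illustrative):

// Linear scan; O(n), but trivially fast for 16 elements.
static boolean containsValue(int[] arr, int target) {
    for (int v : arr) {
        if (v == target) {
            return true;
        }
    }
    return false;
}

One caveat about option 3 as written: Arrays.asList(arr) on a primitive int[] produces a List<int[]> with a single element, so contains(targetVal) would always return false; it only behaves as intended with an Integer[].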
I've been trying to figure out the answer to this problem without success; maybe you could lead me a little bit:
We change merge sort so that when the array is already sorted it stops and returns the array without making the two recursive calls.
For example, let's run the algorithm on an array in which each number appears exactly n/log(n) times (so the array contains exactly log(n) different numbers). What will the running time complexity be now?
"We change the merge sort so that when you already sorted the array it stops and returning the array without calling to another 2 recursion calls."
That's how normal merge sort already works. After it sorts an array (or a section of it), it does not make any further recursive calls; it simply returns the sorted section. The recursion exists to sort that section in the first place.
Perhaps you meant to say "Before we recursively sort the two halves and merge them, we check whether the array is already sorted". That would be useless for arrays of distinct numbers, as there is an extremely low chance (1/n!) that the array is already sorted.
Your example is more interesting. However, if the array has only log(n) different numbers, I would recommend ordering the unique values and creating a hashmap from value to index, which is fast on only log(n) values; then you can sort in linear time, with bucket sort for example.
Indeed, you can try to improve mergesort's efficiency for sorted arrays by checking whether the sorted subarrays are already in the proper order, and skipping the merge phase if so. This can be done efficiently by comparing the last element A of the left subarray with the first element B of the right subarray: if A <= B, merging is not needed.
This trick does not increase the complexity, as it adds a single test to every merge phase, but it does not remove any of the recursive calls either, since it requires both subarrays to be sorted already. It does, however, reduce the running time to linear if the array is already sorted.
Another approach is to check whether the array is already sorted before splitting and recursing. This adds many more tests in the general case, but it does not increase the complexity either, as the number of extra tests is also bounded by N log(N). It is on average more expensive for unsorted arrays (more extra comparisons), but more efficient for sorted arrays (same number of tests, but no recursion).
You can try benchmarking both approaches on a variety of test cases and array sizes to measure the impact.
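As a rough illustration of the first idea (not code from the question), a top-down merge sort that skips the merge when the two sorted halves are already in order:

// Sorts a[lo..hi). The extra test before merging makes an already sorted
// input run in linear time while adding only one comparison per merge.
static void mergeSort(int[] a, int lo, int hi) {
    if (hi - lo <= 1) return;
    int mid = (lo + hi) >>> 1;
    mergeSort(a, lo, mid);
    mergeSort(a, mid, hi);
    if (a[mid - 1] <= a[mid]) return;   // halves already in order: skip the merge
    int[] tmp = new int[hi - lo];
    int i = lo, j = mid, k = 0;
    while (i < mid && j < hi) tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi) tmp[k++] = a[j++];
    System.arraycopy(tmp, 0, a, lo, tmp.length);
}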
Suppose that I have a collection of 50 million different strings in a Java ArrayList. Let foo be a set of 40 million arbitrarily chosen (but fixed) strings from the previous collection. I want to know the index of every string in foo in the ArrayList.
An obvious way to do this would be to iterate through the whole ArrayList until we find a match for the first string in foo, then for the second one, and so on. However, this solution would take an extremely long time (consider also that 50 million is an arbitrarily large number I picked for the example; the collection could be in the order of hundreds of millions or even billions, but it is given from the beginning and remains constant).
I thought then of using a Hashtable of fixed size 50 million in order to determine the index of a given string in foo using someStringInFoo.hashCode(). However, from my understanding of Java's Hashtable, it seems that this will fail if there are collisions as calling hashCode() will produce the same index for two different strings.
Lastly, I thought about first sorting the ArrayList with the sort(List<T> list) in Java's Collections and then using binarySearch(List<? extends T> list,T key,Comparator<? super T> c) to obtain the index of the term. Is there a more efficient solution than this or is this as good as it gets?
You need an additional data structure that is optimized for searching strings; it will map each string to its index. The idea is that you iterate over your original list to populate the data structure, and then iterate over your set, performing searches in that data structure.
What structure should you choose?
There are three options worth considering:
Java's HashMap
TRIE
Java's IdentityHashMap
The first option is simple to implement but does not provide the best possible performance. Still, its population time of O(N * R) is better than sorting the list, which is O(R * N * log N), and its search time is better than in a sorted String list (amortized O(R) compared to O(R * log N)).
Here R is the average length of your strings.
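A minimal sketch of the first option (method and variable names are illustrative):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Build a value -> index map once in O(N * R); each lookup is then amortized O(R).
static Map<String, Integer> buildIndex(List<String> all) {
    Map<String, Integer> index = new HashMap<>(all.size() * 2);
    for (int i = 0; i < all.size(); i++) {
        index.putIfAbsent(all.get(i), i);   // keep the first index if there are duplicates
    }
    return index;
}

// Usage: Integer pos = index.get(someStringInFoo);   // null if the string is absent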
The second option is always good for maps of strings, providing a guaranteed population time of O(R * N) for your case and a guaranteed worst-case search time of O(R). Its only disadvantage is that there is no out-of-the-box implementation in the Java standard libraries.
The third option is a bit tricky and suitable only for your case. To make it work you need to ensure that the strings from the first list are literally reused in the second list (that they are the same objects). Using IdentityHashMap eliminates the cost of String's equals (the R above), because IdentityHashMap compares strings by reference, which takes only O(1). Population cost is amortized O(N) and search cost amortized O(1), so this solution offers the best performance and an out-of-the-box implementation. Note, however, that it only works if there are no duplicates in the original list.
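A corresponding sketch of the third option ("list" below stands for the original ArrayList and is only illustrative):

import java.util.IdentityHashMap;
import java.util.Map;

// Only valid when foo reuses the very same String objects stored in the original list.
Map<String, Integer> index = new IdentityHashMap<>();
for (int i = 0; i < list.size(); i++) {
    index.put(list.get(i), i);              // keys are compared with ==, not equals()
}
Integer pos = index.get(someStringInFoo);   // non-null only if it is the same object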
If you have any questions please let me know.
You can use a Java Hashtable with no problems. According to the Java documentation, "in the case of a 'hash collision', a single bucket stores multiple entries, which must be searched sequentially".
I think you have a misconception about how hash tables work. Hash collisions do NOT break the implementation. A hash table is essentially an array of linked lists. Each key goes through a hash function to determine the index in the array at which the element will be placed. If a hash collision occurs, the element is placed at the end of the linked list stored at that index.
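As a small illustration (the two keys are chosen because they genuinely share a hash code in Java):

import java.util.HashMap;
import java.util.Map;

// Two distinct keys with the same hashCode() coexist: colliding entries are
// chained in the same bucket and distinguished with equals().
Map<String, Integer> m = new HashMap<>();
m.put("Aa", 1);                        // "Aa".hashCode() == "BB".hashCode() == 2112
m.put("BB", 2);
System.out.println(m.get("Aa"));       // prints 1
System.out.println(m.get("BB"));       // prints 2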
More specifically, suppose I have an array with duplicates:
{3,2,3,4,2,2,1,4}
I want a data structure that supports searching for and removing the first occurrence of some value faster than O(n). Say the value is 4; then the array becomes:
{3,2,3,2,2,1,4}
I also need to iterate over the list from the head, in the same order. Other operations like get(index) or insert are not needed.
You may spend O(n) time to load the original data (say it's an int[]) into your data structure; I just need the later searches and removals to be faster than O(n).
"Search and remove" is considered ONE operation, as shown above.
If I had to build it myself, I would use a LinkedList to store the data, plus a HashMap mapping every value to the list of all nodes holding that value, together with their previous and next nodes.
Is this the right approach? Are there better choices already available in Java?
The data structure you describe, essentially a hybrid of a linked list and a map, is, I think, the most efficient way of handling your stated problem. You'll have to keep track of the nodes yourself, since Java's LinkedList doesn't provide access to the actual nodes. AbstractSequentialList may be helpful here.
The index structure you'll need is a map from an element value to the appearances of that element in the list. I recommend a hash table from hashCode % modulus to a linked list of (value, list of main-list nodes).
Note that this approach is still O(n) in the worst case, when you have universal hash collisions; this applies whether you use open or closed hashing. In the average case it should be something closer to O(ln(n)), but I'm not prepared to prove that.
Consider also whether the overhead of keeping track of all of this is really worth the gains. Unless you've actually profiled running code and determined that a LinkedList is causing problems because remove is O(n), stick with the plain LinkedList until you do.
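A minimal sketch of such a hybrid structure (hand-rolled nodes plus a map from value to its occurrences; all names are illustrative):

import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntConsumer;

class FastRemoveList {
    private static class Node {
        final int value;
        Node prev, next;
        Node(int value) { this.value = value; }
    }

    private Node head, tail;
    private final Map<Integer, ArrayDeque<Node>> occurrences = new HashMap<>();

    // One-time O(n) load of the original data.
    FastRemoveList(int[] data) {
        for (int v : data) {
            Node n = new Node(v);
            if (head == null) { head = tail = n; }
            else { tail.next = n; n.prev = tail; tail = n; }
            occurrences.computeIfAbsent(v, k -> new ArrayDeque<>()).addLast(n);
        }
    }

    // Removes the first occurrence of value in amortized O(1); returns false if absent.
    boolean removeFirstOccurrence(int value) {
        ArrayDeque<Node> q = occurrences.get(value);
        if (q == null || q.isEmpty()) return false;
        Node n = q.pollFirst();                          // earliest remaining occurrence
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        return true;
    }

    // Iterates from the head in the original order.
    void forEach(IntConsumer action) {
        for (Node n = head; n != null; n = n.next) action.accept(n.value);
    }
}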
Since your requirement is that the first occurrence of the element be removed and the remaining occurrences retained, there would be no way to do it faster than O(n), as you would definitely have to move through to the end of the list to find out whether there is another occurrence. There is no standard API from Oracle in the java packages that does this.
I have about 200 lists of the form (index, float) and I want to calculate the mean across them. I know the approach with time complexity O(first array size + ... + last array size); is there any solution that calculates the mean with better time complexity?
There is no way to calculate the mean of N independent items with time complexity less than O(n), since you have to visit every item at least once to compute the total.
If you want to beat O(n) complexity, then you will need to do something special, e.g.:
Use pre-computed sums for sub-lists
Exploit known dependencies in the data (e.g. certain elements being equal)
Of course, complexity does not always equate directly to speed. If you want it to be fast, there are plenty of other techniques (using concurrency or parallelism, for example), but they will still be O(n) in complexity terms.
You can approach it with divide and conquer. For that you can use an ExecutorService.
Depending on how you read in your lists/arrays, you could add up the floats while reading them into memory. Dividing the sum by the total number of values is cheap, so you don't need to process the values twice.
If you use a Collection class to store the values, you could extend e.g. ArrayList and override the add() method to update a sum field, then provide a getMean() method, as sketched below.
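A small sketch of that idea (the class and method names are illustrative, and only add() is covered here):

import java.util.ArrayList;

class SummingList extends ArrayList<Float> {
    private double sum = 0.0;

    @Override
    public boolean add(Float value) {
        sum += value;                 // keep a running total as values are read in
        return super.add(value);
    }

    double getMean() {
        return isEmpty() ? 0.0 : sum / size();
    }
    // Note: a complete version would also override addAll, remove, set, etc.
}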
No, there is no better way than including every element in the calculation (which is O(first array size + ... + last array size)), at least not for the problem as stated. If the lists had some special properties, or you wanted to recalculate the mean repeatedly after adding, removing or changing elements or lists, it would be a different story.
An informal proof:
Assume you managed to calculate the mean while skipping one element. Then you would get the same mean after changing the skipped element to any other value (since we skipped it, its value doesn't matter). But changing an element should change the mean, which is a contradiction. Thus you must use every element in the calculation.
How can I optimize the following:
final String[] longStringArray = {"1","2","3",.....,"9999999"};
String searchingFor = "9999998";
for (String s : longStringArray)
{
    if (searchingFor.equals(s))
    {
        // After 9999998 iterations, finally found it
        // Do the rest of the stuff here (not relevant to the string/array)
    }
}
NOTE: longStringArray is only searched once per run, is not sorted, and is different every time I run the program.
I'm sure there is a way to improve the worst-case performance here, but I can't seem to find it...
P.S. I would also appreciate a solution for the case where searchingFor does not exist in longStringArray.
Thank you.
Well, if you have to use an array, and you don't know if it's sorted, and you're only going to do one lookup, it's always going to be an O(N) operation. There's nothing you can do about that, because any optimization step would be at least O(N) to start with - e.g. populating a set or sorting the array.
Other options though:
If the array is sorted, you could perform a binary search. This will turn each lookup into an O(log N) operation.
If you're going to do more than one search, consider using a HashSet<String>. This will turn each lookup into an O(1) operation (assuming few collisions).
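For the second option, a minimal sketch (longStringArray and searchingFor are the variables from the question):

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// O(N) to populate once, then O(1) per lookup; contains() simply returns
// false when searchingFor is absent, which also covers the P.S. case.
Set<String> lookup = new HashSet<>(Arrays.asList(longStringArray));
if (lookup.contains(searchingFor)) {
    // found it: do the rest of the stuff here
}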
import org.apache.commons.lang.ArrayUtils;
int index = ArrayUtils.indexOf(array, string);   // linear scan; returns -1 if the string is absent
ArrayUtils documentation
You can create a second array with the hash codes of the strings and binary search on that.
You will have to sort the hash array and move the elements of the original array (or their indices) accordingly. This way you end up with extremely fast searching, but the array has to be kept ordered, so inserting new elements costs resources.
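A rough sketch of that idea (illustrative only; different strings can share a hash code, so equals() is still needed to confirm a match):

import java.util.Arrays;
import java.util.Comparator;

class HashIndex {
    private final String[] data;
    private final Integer[] order;   // positions of data[], sorted by hash code
    private final int[] hashes;      // data[order[i]].hashCode(), ascending

    HashIndex(String[] data) {
        this.data = data;
        order = new Integer[data.length];
        for (int i = 0; i < data.length; i++) order[i] = i;
        Arrays.sort(order, Comparator.comparingInt(i -> data[i].hashCode()));
        hashes = new int[data.length];
        for (int i = 0; i < data.length; i++) hashes[i] = data[order[i]].hashCode();
    }

    // Returns the index of target in the original array, or -1 if absent.
    int indexOf(String target) {
        int pos = Arrays.binarySearch(hashes, target.hashCode());
        if (pos < 0) return -1;
        // scan the run of equal hash codes in both directions, confirming with equals()
        for (int i = pos; i >= 0 && hashes[i] == hashes[pos]; i--)
            if (data[order[i]].equals(target)) return order[i];
        for (int i = pos + 1; i < hashes.length && hashes[i] == hashes[pos]; i++)
            if (data[order[i]].equals(target)) return order[i];
        return -1;
    }
}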
The optimal solution would be a binary tree or a B-tree; if you really have that much data and you have to handle inserts, it's worth the effort.
Arrays.asList(longStringArray).contains(searchingFor)