Given a string whose words are separated by single spaces, I need to transfer each word in the string to a node in a linked list, and keep the list sorted lexicographically (like in a dictionary).
The first step I took was to walk through the string and put every word in a separate node. Now I'm having a hard time sorting the list - it has to be done as efficiently as possible.
Merge sort is O(n log n). Would merge sort be the best choice here?
Generally, if you already have a list and want to sort it, merge sort is a good solution. But in your case you can do better.
You have a space-separated string, you break it up and put the words into the list's nodes, and then you want to sort the list.
You can do better by combining both steps.
1) Keep a linked list with head and tail pointers, where each node also has a pointer to its previous node.
2) As you extract each word from the sentence, insert it into the list in sorted position: start from the head or the tail (depending on whether the word is closer to the smallest or the largest element) and walk until you reach an element that should come after/before the current word, then insert it there by updating the pointers (see the sketch below).
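A minimal sketch of that combined approach in Java (the WordList and Node names are just illustrative; this version always walks from the head, but you could start from the tail when the word compares closer to it):

    class WordList {
        static class Node {
            String word;
            Node prev, next;
            Node(String w) { word = w; }
        }

        Node head, tail;

        // Insert a word so the list stays in lexicographic order.
        void insertSorted(String word) {
            Node node = new Node(word);
            if (head == null) { head = tail = node; return; }    // empty list
            Node cur = head;
            while (cur != null && cur.word.compareTo(word) <= 0)
                cur = cur.next;                                   // walk past smaller words
            if (cur == null) {                                    // insert at the tail
                node.prev = tail; tail.next = node; tail = node;
            } else if (cur == head) {                             // insert at the head
                node.next = head; head.prev = node; head = node;
            } else {                                              // insert before cur
                node.prev = cur.prev; node.next = cur;
                cur.prev.next = node; cur.prev = node;
            }
        }

        static WordList fromSentence(String sentence) {
            WordList list = new WordList();
            for (String w : sentence.split(" "))                  // single-space separated
                list.insertSorted(w);
            return list;
        }
    }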
Just use the built-in Collections.sort, which is a mergesort implementation. More specifically:
This implementation is a stable, adaptive, iterative mergesort that requires far fewer than n lg(n) comparisons when the input array is partially sorted, while offering the performance of a traditional mergesort when the input array is randomly ordered. If the input array is nearly sorted, the implementation requires approximately n comparisons. Temporary storage requirements vary from a small constant for nearly sorted input arrays to n/2 object references for randomly ordered input arrays.
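For example, a minimal sketch using a LinkedList of the words (the sentence is just sample data):

    import java.util.*;

    public class SortWords {
        public static void main(String[] args) {
            String sentence = "banana apple cherry";
            List<String> words = new LinkedList<>(Arrays.asList(sentence.split(" ")));
            Collections.sort(words);            // stable mergesort-based sort
            System.out.println(words);          // [apple, banana, cherry]
        }
    }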
Consider a modified merge sort algorithm: if the array is already sorted, the algorithm returns the array instead of making the two recursive calls.
Suppose we run the new algorithm on an array in which each value appears exactly n/log(n) times (so the array contains log(n) different values).
What is the time complexity of that algorithm?
If you suspect the array has very few distinct values, scanning the array to extract these values, sorting them and counting them will take substantially less time than performing a full merge sort on the array:
If you use a hash table, selecting the values will take O(N) time, producing a sample array of size log(N).
sorting this sample array should take O(log(N).log(log(N))) time, negligible compared to the scan phase.
enumerating the sample array to generate copies into the original array also has linear time complexity O(N).
Hence the time complexity could be reduced to O(N).
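A minimal sketch of that approach in Java (assuming plain int values, so generating copies is sufficient):

    import java.util.*;

    public class FewValuesSort {
        // Sorts 'a' in O(N) when it contains only a few distinct values.
        static void sortFewDistinct(int[] a) {
            Map<Integer, Integer> counts = new HashMap<>();
            for (int x : a)                              // O(N) scan
                counts.merge(x, 1, Integer::sum);
            Integer[] sample = counts.keySet().toArray(new Integer[0]);
            Arrays.sort(sample);                         // small: only the distinct values
            int pos = 0;
            for (int v : sample) {                       // O(N) regeneration
                int c = counts.get(v);
                Arrays.fill(a, pos, pos + c, v);
                pos += c;
            }
        }

        public static void main(String[] args) {
            int[] a = {3, 1, 3, 2, 1, 2, 3, 1, 2};
            sortFewDistinct(a);
            System.out.println(Arrays.toString(a));      // [1, 1, 1, 2, 2, 2, 3, 3, 3]
        }
    }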
Note however that:
using a hash table to construct the sample array might not be feasible. If instead you construct a sorted list, the complexity jumps back to O(N.log(N)) because of the linear lookup into the sample array for each element.
generating copies of the elements might not be adequate if the original array's elements have identical keys but different contents. In this case, you would scan the original array and lookup the element's key in the sample array to determine where to store the element in the resulting array, again O(N.log(N)) time complexity if the sample array is a list, and O(N.log(log(N))) if it is an array and you use binary search.
As a conclusion, the complexity can be reduced efficiently in special cases, but it is tricky in the general case.
I've been trying to figure out the answer to this problem without success; maybe you could point me in the right direction:
We change merge sort so that when the array is already sorted it stops and returns the array without making the two recursive calls.
For example, let's run the algorithm on an array in which each number appears exactly n/log(n) times (so the array contains exactly log(n) different numbers). What will the running time complexity be now?
"We change the merge sort so that when you already sorted the array it stops and returning the array without calling to another 2 recursion calls."
That's how normal merge sort works. After it sorts an array (or a section of the array), it does not call any more recursion calls, it just returns the sorted array. The recursion is called in order to sort the section of the array in the first place.
Perhaps you wanted to say "Before we recursively sort the 2 halves and merge them, we check if the array is already sorted". That would be useless with arrays with different numbers, as there would be an extremely low chance (1/n!) that the array would be sorted.
With your example it is more interesting, however if the array has only log(n) different numbers I would recommend ordering the unique values and creating a hashmap from value to index, which is fast on only log(n) values and then you can sort in linear time with bucket sort for example.
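A minimal sketch of that suggestion (shown for an int[] for simplicity; the buckets would work the same way for records that carry a payload alongside the key):

    import java.util.*;

    public class BucketSortFewValues {
        static void sortWithBuckets(int[] a) {
            // Order the unique values: O(n) + O(k log k), with k = number of distinct values.
            int[] unique = Arrays.stream(a).distinct().sorted().toArray();

            // Map each value to its bucket index: O(k).
            Map<Integer, Integer> bucketOf = new HashMap<>();
            for (int i = 0; i < unique.length; i++) bucketOf.put(unique[i], i);

            // Distribute the elements into buckets: O(n).
            List<List<Integer>> buckets = new ArrayList<>();
            for (int i = 0; i < unique.length; i++) buckets.add(new ArrayList<>());
            for (int x : a) buckets.get(bucketOf.get(x)).add(x);

            // Concatenate the buckets back into the array: O(n).
            int pos = 0;
            for (List<Integer> b : buckets)
                for (int x : b) a[pos++] = x;
        }
    }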
Indeed, you can improve mergesort's efficiency on sorted arrays by checking whether the two sorted subarrays are already in the proper order and skipping the merge phase. This can be done cheaply by comparing the last element A of the left subarray with the first element B of the right subarray: if A <= B, merging is not needed.
This trick does not increase the complexity, as it adds a single test to every merge phase, and it does not remove any of the recursive calls, since it requires both subarrays to be sorted already. It does, however, reduce the complexity to linear if the array is already sorted (see the sketch below).
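A minimal sketch of merge sort with that skip test (hypothetical helper names, sorting an int[] in place):

    import java.util.Arrays;

    public class NaturalMergeSort {
        static void mergeSort(int[] a, int lo, int hi) {     // sorts a[lo..hi)
            if (hi - lo <= 1) return;
            int mid = (lo + hi) >>> 1;
            mergeSort(a, lo, mid);
            mergeSort(a, mid, hi);
            // Skip the merge when the halves are already in order:
            // last element of the left half <= first element of the right half.
            if (a[mid - 1] <= a[mid]) return;
            merge(a, lo, mid, hi);
        }

        static void merge(int[] a, int lo, int mid, int hi) {
            int[] left = Arrays.copyOfRange(a, lo, mid);     // temporary copy of the left half
            int i = 0, j = mid, k = lo;
            while (i < left.length && j < hi)
                a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
            while (i < left.length) a[k++] = left[i++];      // remaining left elements
        }
    }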
Another approach is to check whether the array is already sorted before splitting and recursing. This adds many more tests in the general case, but does not increase the complexity either, as the number of tests is bounded by N log(N) as well. It is on average more expensive for unsorted arrays (more extra comparisons), but more efficient on sorted arrays (same number of tests, but no recursion).
You can try benchmarking both approaches on a variety of test cases and array sizes to measure the impact.
I have a text file containing a sorted list of words, which is my dictionary.
I would like to use a TreeMap in order to have O(log n) average cost when I have to check whether a word belongs to the dictionary or not (that is, containsKey).
I have read that a Red-Black tree is behind the scenes of TreeMap, so it is self-balancing.
My question is: which is the best way to feed the TreeMap with the list of words?
I mean: feeding it a sorted list should be the worst-case scenario for a binary tree, because it has to rebalance on almost every other word, doesn't it?
The list of words can vary from 7K to 150K in number.
TreeMap hides its implementation details, as good OO design prescribes, so really optimizing for your use case will probably be hard.
However, if it is an option to read all items into an array/list before adding them to your TreeMap, you can add them "inside out": the middle element of the list will become the root, so add it first, and then recursively add the first half and second half in the same manner. In fact, this is the strategy that the TreeMap(SortedMap) constructor follows.
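A minimal sketch of that "inside out" insertion, assuming the words are already available in a sorted List<String> (the value type here is just a placeholder, since only the keys matter for containsKey):

    import java.util.*;

    public class BalancedFeed {
        // Insert the middle element first, then recurse on both halves, so the
        // insertion order never degenerates into a long sorted run.
        static void addInsideOut(TreeMap<String, Boolean> map, List<String> sorted, int lo, int hi) {
            if (lo >= hi) return;
            int mid = (lo + hi) >>> 1;
            map.put(sorted.get(mid), Boolean.TRUE);
            addInsideOut(map, sorted, lo, mid);
            addInsideOut(map, sorted, mid + 1, hi);
        }

        public static void main(String[] args) {
            List<String> words = Arrays.asList("ant", "bee", "cat", "dog", "eel");
            TreeMap<String, Boolean> dict = new TreeMap<>();
            addInsideOut(dict, words, 0, words.size());
            System.out.println(dict.containsKey("cat"));   // true
        }
    }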
If it is not an option to read all items first, I think you have no other option than to simply put your entries into the map one after another, or to write your own tree implementation so that you have more control over how it is built. If you at least know the number of items beforehand, you should be able to generate a balanced tree without ever having to rebalance.
If you do not need the extra features of a TreeMap, you might also consider using a HashMap, which (given a good hash function for your keys) even has O(1) access.
You have n sorted linked lists, each of size n. References to the linked lists are stored in an array. What is an efficient algorithm to merge the n linked lists into a single sorted linked list?
Since they are all sorted:
Incorporate a loop
Check the first nodes of all the sorted linked lists and order them by comparing them to each other.
Proceed to the next node and repeat until null is hit.
Is this the most efficient way of doing this?
Just link them all together (or dump them into a single list) and use a general sort. That will give you O(N log N) performance, where N is the total number of elements. Your way is O(N^2).
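A minimal sketch of the dump-and-sort approach (Node is a hypothetical singly linked node type holding an int):

    import java.util.*;

    public class MergeNLists {
        static class Node {
            int value;
            Node next;
            Node(int v) { value = v; }
        }

        // Dump every value into one list, sort it, and rebuild a single linked list.
        static Node mergeAll(Node[] lists) {
            List<Integer> all = new ArrayList<>();
            for (Node head : lists)
                for (Node cur = head; cur != null; cur = cur.next)
                    all.add(cur.value);
            Collections.sort(all);                   // O(N log N), N = total element count

            Node dummy = new Node(0), tail = dummy;
            for (int v : all) {
                tail.next = new Node(v);
                tail = tail.next;
            }
            return dummy.next;
        }
    }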
I often* find myself in need of a data structure which has the following properties:
can be initialized with an array of n objects in O(n).
one can obtain a random element in O(1); after this operation the picked element is removed from the structure (without replacement).
one can undo p 'picking without replacement' operations in O(p)
one can remove a specific object (eg by id) from the structure in O(log(n))
one can obtain an array of the objects currently in the structure in O(n).
the complexity (or even possibility) of other actions (e.g. insert) does not matter. Besides the asymptotic complexity, it should also be efficient for small values of n.
Can anyone give me guidelines on implementing such a structure? I have currently implemented a structure that has all of the above properties, except that picking an element takes O(d), with d the number of past picks (since I explicitly check whether it has 'not yet been picked'). I can come up with structures that allow picking in O(1), but these have higher complexity for at least one of the other operations.
BTW:
note that O(1) above implies that the complexity is independent of the number of previously picked elements and of the total number of elements.
*in Monte Carlo algorithms (iterative picks of p random elements from a 'set' of n elements).
HashMap has complexity O(1) both for insertion and removal.
You specify a lot of operations, but all of them are nothing other than insertion, removal and traversal:
can be initialized with an array of n objects in O(n).
n * O(1) insertion. HashMap is fine
one can obtain a random element in O(1), after this operation the picked element is removed from the structure (without replacement)
This is the only op that requires O(n).
one can undo p 'picking without replacement' operations in O(p)
it's an insertion operation: O(1).
one can remove a specific object (eg by id) from the structure in O(log(n))
O(1).
one can obtain an array of the objects currently in the structure in O(n).
you can traverse a HashMap in O(n)
EDIT:
example of picking a random element in O(n):
    HashMap map = ...;                         // your existing map
    int randomIndex = ...;                     // a random int in the range [0, map.size())
    Collection collection = map.values();
    Object[] values = collection.toArray();    // O(n) copy of the values
    Object picked = values[randomIndex];       // the randomly chosen element
OK, same answer as 0verbose, with a simple fix to get the O(1) random lookup. Create an array which stores the same n objects. Now, in the HashMap, store the pairs (object, array index). For example, say your objects (strings for simplicity) are:
{"abc" , "def", "ghi"}
Create a list:
    List<String> array = new ArrayList<>(Arrays.asList("abc", "def", "ghi"));
Create a HashMap<String, Integer> map with the following values:
    Map<String, Integer> map = new HashMap<>();
    for (int i = 0; i < array.size(); i++) {
        map.put(array.get(i), i);
    }
O(1) random lookup is easily achieved by picking any index in the array. The only complication that arises is when you delete an object. For that, do:
Find the object in the map and get its array index. Let's call this index i (i = map.get(object)) - O(1)
Swap the element at index i with the last element of the list, then remove the last slot (since there is one element fewer now) - O(1)
Update the index of the object that now sits at position i in the map (map.put(array.get(i), i)) - O(1)
Hope this helps.
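Putting the pieces together, a minimal self-contained sketch of such a structure (RandomPicker is a hypothetical name; remove here is O(1) expected, which more than satisfies the O(log(n)) requirement):

    import java.util.*;

    class RandomPicker<T> {
        private final List<T> items = new ArrayList<>();        // dense array of live objects
        private final Map<T, Integer> index = new HashMap<>();  // object -> its position in 'items'
        private final Random rng = new Random();

        RandomPicker(Collection<T> initial) {                   // O(n) initialization
            for (T t : initial) add(t);
        }

        void add(T obj) {                                       // O(1); also undoes a pick
            index.put(obj, items.size());
            items.add(obj);
        }

        T pickWithoutReplacement() {                            // O(1)
            int i = rng.nextInt(items.size());
            T picked = items.get(i);
            removeAt(i);
            return picked;
        }

        void remove(T obj) {                                    // O(1) expected
            Integer i = index.get(obj);
            if (i != null) removeAt(i);
        }

        List<T> current() {                                     // O(n) snapshot
            return new ArrayList<>(items);
        }

        private void removeAt(int i) {                          // swap with the last element, then drop it
            T removed = items.get(i);
            T last = items.get(items.size() - 1);
            items.set(i, last);
            index.put(last, i);
            items.remove(items.size() - 1);
            index.remove(removed);
        }
    }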
Here's my analysis of using Collections.shuffle() on an ArrayList:
✔ can be initialized with an array of n objects in O(n).
Yes, although the cost is amortized unless n is known in advance.
✔ one can obtain a random element in O(1), after this operation the picked element is removed from the structure, without replacement.
Yes, choose the last element in the shuffled array; replace the array with a subList() of the remaining elements.
✔ one can undo p 'picking without replacement' operations in O(p).
Yes, append the element to the end of this list via add().
❍ one can remove a specific object (eg by id) from the structure in O(log(n)).
No, it looks like O(n).
✔ one can obtain an array of the objects currently in the structure in O(n).
Yes, using toArray() looks reasonable.
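A minimal sketch of this shuffle-based approach (hypothetical class name; it removes the last element directly rather than taking a subList(), which amounts to the same thing):

    import java.util.*;

    class ShuffledPicker<T> {
        private final List<T> remaining;

        ShuffledPicker(Collection<T> initial) {
            remaining = new ArrayList<>(initial);
            Collections.shuffle(remaining);                   // O(n), done once up front
        }

        T pick() {                                            // O(1): take from the end
            return remaining.remove(remaining.size() - 1);
        }

        void unpick(T obj) {                                  // O(1): append it back
            remaining.add(obj);
        }

        List<T> current() {                                   // O(n) snapshot
            return new ArrayList<>(remaining);
        }
    }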
How about an array (or ArrayList) that's divided into "picked" and "unpicked"? You keep track of where the boundary is, and to pick, you generate a random index below the boundary, then (since you don't care about order) swap the item at that index with the last unpicked item and decrement the boundary. To unpick, you just increment the boundary. (A sketch follows at the end of this answer.)
Update: Forgot about O(log(n)) removal. Not that hard, though, just a little memory-expensive, if you keep a HashMap of IDs to indices.
If you poke around online you'll find various IndexedHashSet implementations that all work on more or less this principle -- an array or ArrayList plus a HashMap.
(I'd love to see a more elegant solution, though, if one exists.)
Update 2: Hmm... or does the actual removal become O(n) again, if you have to either recopy the arrays or shift them around?
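A minimal sketch of the picked/unpicked boundary idea (without the extra HashMap for removal by id):

    import java.util.*;

    class BoundaryPicker<T> {
        private final List<T> items;      // [0, boundary) is unpicked, [boundary, size) is picked
        private int boundary;
        private final Random rng = new Random();

        BoundaryPicker(Collection<T> initial) {
            items = new ArrayList<>(initial);
            boundary = items.size();
        }

        T pick() {                                          // O(1)
            int i = rng.nextInt(boundary);                  // random index among the unpicked
            Collections.swap(items, i, boundary - 1);       // move the pick just past the boundary
            boundary--;
            return items.get(boundary);
        }

        void unpick() {                                     // O(1): restore the most recent pick
            boundary++;
        }
    }

Removal by id would then use the HashMap of IDs to indices mentioned in the first update, applying the same swap trick inside the unpicked region, so nothing ever needs to be recopied or shifted.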