I just want to double-check the total space that a trie data structure could use in the worst case. I thought it would be O(N*K), where N is the total number of nodes and K is the size of the alphabet (each node holds K pointers to other tries), but people keep telling me it's O(K^L), where K is the size of the alphabet and L is the average word length. Do those null pointers use up memory in Java? For example, if one of the nodes only has, let's say, 3 non-null branches out of a total of K, does it use K space, or just 3? The following is the trie implementation in Java:
class Trie {
    private Trie[] tries;

    public Trie() {
        // A size-256 array of Trie references, all initially null
        this.tries = new Trie[256]; // K = 256
    }
}
If the memory footprint of a single node is K references, and the trie has N nodes, then obviously its space complexity is O(N*K). This accounts for the fact that null pointers do occupy their space. Actually whether an array entry is null or any other value doesn't change anything in terms of memory consumption.
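To make that concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the question; the 8-bytes-per-reference and 16-byte array header figures are assumptions about a typical 64-bit JVM, and the exact numbers are JVM-dependent):

class TrieMemoryEstimate {
    static final int K = 256; // alphabet size = child slots per node

    // Rough estimate: each node holds a Trie[K] array, i.e. K reference slots,
    // whether an entry points to a child or is null.
    static long estimatedBytes(long nodeCount) {
        long bytesPerReference = 8;  // assumption: uncompressed 64-bit references
        long arrayHeaderBytes = 16;  // assumption: typical array object header
        long perNode = arrayHeaderBytes + K * bytesPerReference;
        return nodeCount * perNode;  // grows as O(N * K)
    }

    public static void main(String[] args) {
        // roughly 2 MB for 1,000 nodes, no matter how many entries are non-null
        System.out.println(estimatedBytes(1_000));
    }
}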
O(K^L) is a completely different measure because it uses different parameters. Basically, K^L is an estimate of the number of nodes in a densely populated trie, whereas in O(N*K) the number of nodes is given explicitly.
I'd like to give a few more details on Marko's answer.
The memory consumed by each node of the trie is the same whether its entries are null or not. An array stores only references, and it occupies its full space from the moment it is initialized. Each node will also have its own memory on top of the array, but that is an implementation detail, and since we are talking about asymptotic analysis we don't count the memory occupied by a node's implementation.
O(N*K) counts the references held by a full trie (each of the N nodes has K child slots). That is correct, but it is expressed in terms of the number of nodes N, which you usually don't know upfront. If you did know it, you could add up the memory used by each node (an implementation detail) and compute the exact amount of memory used by your trie; Big-O notation might not even be very meaningful in that case.
What you can know is L (the average length of keys) and K (the size of the alphabet), so you use these to analyze the complexity. If you do the math, you will find that K^L actually accounts only for the last level of the trie (take K=2 and L=3: that gives a binary tree with 4 levels, 2^3 = 8 nodes at the last level, and 15 nodes in total). The last level doesn't give the total number of nodes in the trie, but we are talking about asymptotic analysis and only the dominant term matters. So you have O(K^L).
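A tiny sketch (my own, just to reproduce the K=2, L=3 numbers above) that counts the nodes per level of a full trie with branching factor K and depth L:

class FullTrieNodeCount {
    public static void main(String[] args) {
        int K = 2, L = 3;
        long total = 0, lastLevel = 0;
        for (int level = 0; level <= L; level++) {
            lastLevel = (long) Math.pow(K, level); // nodes at this level
            total += lastLevel;
        }
        System.out.println("last level: " + lastLevel); // 8  (= K^L)
        System.out.println("total:      " + total);     // 15 (= (K^(L+1) - 1) / (K - 1))
    }
}

The last level alone already accounts for more than half of the total, which is why O(K^L) is the asymptotic bound.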
Related
Would it be O(26·n), where 26 is the number of letters of the alphabet and n is the number of levels of the trie? For example, this is the code to print a trie:
public void print()
{
    for (int i = 0; i < 26; i++)
    {
        if (this.next[i] != null)
        {
            this.next[i].print();
        }
    }
    if (this.word != null)
    {
        System.out.println(this.word.getWord());
    }
}
Looking at this code makes me think my approximation of the time complexity is correct, the worst case being all 26 child slots full at each of the n levels.
Would it be O(26·n), where 26 is the number of letters of the alphabet and n is the number of levels of the trie?
No. Each node in the trie must be visited, and O(1) work is performed for each one (ignoring the work attributable to processing the children, which is accounted for separately). The number of children does not matter on a per-node basis, as long as it is bounded by a constant (e.g. 26).
How many nodes are there altogether? Generally more than the number of words stored in the trie, and possibly a lot more. For a naively implemented, perfectly balanced, complete trie with n levels below the root, each level has 26 times as many nodes as the previous one, so the total number of nodes is 1 + 26 + 26^2 + ... + 26^n. That is O(26^(n+1)) == O(26^n), i.e. exponential in the number of levels, which also corresponds to the length of the longest word stored within.
But one is more likely to be interested in measuring the complexity in terms of the number of words stored in the trie. With a careful implementation, it is possible to have nodes only for those words and for each maximal initial substring that is common to two or more of those words. In that event, every node has either zero children or at least two, so for w total words, the total number of nodes is bounded by w + w/2 + w/4 + ..., which converges to 2w. Therefore, a traversal of a trie with those structural properties costs O(2w) == O(w).
Moreover, with a little more thought, it is possible to conclude that the particular structural property I described is not really necessary to have O(w) traversal.
I am not familiar with a trie, but big-O notation mainly depicts approximately how quickly the running time or resource consumption grows relative to the input size. The way I think of it is as referring to the general shape of the curve on a graph rather than exact points on it. An O(1) algorithm looks like a flat line, while O(n) looks like a line at a 45-degree angle, and so on.
source: https://medium.com/dataseries/how-to-calculate-time-complexity-with-big-o-notation-9afe33aa4c46
Now for the algorithm in the question. At first glance I would say it is O(1) (constant time), because the number of iterations of the loop is constant (always 26). However, inside the loop there is this.next[i].print(), which could completely change the answer depending on its complexity, and it uncovers an important question we need to answer: what is n?
I am going to assume that this.next[i] is of the same type as this, making this.next[i].print() a kind of recursive call. In that scenario, the time it takes to finish executing depends entirely on the number of instances that have to be visited. The algorithm resembles depth-first search but does not safeguard against infinite recursion; this may rely on additional information known about the next[i] instances (nodes), such as each instance only ever being referenced by at most one other instance. In this case the runtime complexity would be on the order of O(n), where n is the number of instances or nodes.
... assuming that this.word.getWord() runs in constant time as well. If it depends on the word itself, the runtime may well be O(n * w), where n is the number of nodes and w is the size of the words.
I've been reading various definitions on minHeap and maxHeap. I stumbled upon statements which say:
minHeap is used to sort in descending order.
maxHeap is used to sort in ascending order.
Statements taken from the "Note" in https://www.geeksforgeeks.org/heap-sort-for-decreasing-order-using-min-heap/ .
But when I implement minHeap using PriorityQueue<Integer> in Java with default comparator, and poll() it, I get the minimum element. Why is that?
Thanks to anybody who's trying to help :).
The explanation in the blog is correct
Having a close look at the heapSort() function, it smartly makes use of a min-heap. The smallest element of the array is swapped with the last element, the size of the heap is reduced by 1, and heapify() is called again.
arr[0] -> represents the smallest element.
In every iteration, for i from n-1 to 0, arr[0] is swapped with arr[i] and the heap is heapified again with a size one smaller than in the previous iteration.
min-heap and max-heap don't sort. You can use a min-heap or max-heap to sort, but in standard use, heaps aren't sorted. Instead, they are arranged in what's called heap order. This arrangement makes it efficient to add items, and to remove the smallest (or largest) while keeping the data in the proper order.
For example, here's an illustration of a min-heap:
        1
    3       2
  4   7   6   5
That follows the rule that no child is smaller than its parent. The resulting array representation of that heap is [1,3,2,4,7,6,5]. Note that there are other valid heaps with those same numbers. For example, both of the below are valid, as well:
        1                      1
    2       5              2       3
  4   3   6   7          4   5   6   7
The corresponding array representations are [1,2,5,4,3,6,7] and [1,2,3,4,5,6,7].
Max-heap is similar, except the rule is that no child can be larger than its parent.
The Wikipedia article on binary heap explains this all very well.
Now, when you're talking about using heap sort, you build a heap and then repeatedly swap the root element with the last element in the array, reduce the count and then re-heapify. So you build the sorted array from back to front. If you use a min-heap, then the root (smallest value) will be at the end of the array.
So if you want to sort in descending order with heap sort, you use a min-heap.
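Here is a minimal sketch of that idea in Java, sorting an int array in descending order with a min-heap; the helper names (minHeapify, sortDescending) are my own and not taken from any of the linked articles:

import java.util.Arrays;

class MinHeapSortDescending {

    // Restore the min-heap property for the subtree rooted at i,
    // considering only the first `size` elements of arr.
    static void minHeapify(int[] arr, int size, int i) {
        int smallest = i;
        int left = 2 * i + 1, right = 2 * i + 2;
        if (left < size && arr[left] < arr[smallest]) smallest = left;
        if (right < size && arr[right] < arr[smallest]) smallest = right;
        if (smallest != i) {
            int tmp = arr[i]; arr[i] = arr[smallest]; arr[smallest] = tmp;
            minHeapify(arr, size, smallest);
        }
    }

    static void sortDescending(int[] arr) {
        int n = arr.length;
        // Build the min-heap bottom-up.
        for (int i = n / 2 - 1; i >= 0; i--) minHeapify(arr, n, i);
        // Repeatedly move the current minimum (the root) to the end of the
        // shrinking heap; the array fills with ascending values from the back,
        // so the whole array ends up in descending order.
        for (int i = n - 1; i > 0; i--) {
            int tmp = arr[0]; arr[0] = arr[i]; arr[i] = tmp;
            minHeapify(arr, i, 0);
        }
    }

    public static void main(String[] args) {
        int[] a = {4, 1, 3, 9, 7};
        sortDescending(a);
        System.out.println(Arrays.toString(a)); // [9, 7, 4, 3, 1]
    }
}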
The basic idea is to sort in place. Every time the algorithm extracts the root of the heap, the heap's size shrinks by one, so one slot at the end of the array is no longer part of the heap. The smallest remaining number is placed into that slot.
// One by one extract an element from the (min-)heap
for (int i = n - 1; i >= 0; i--) {
    // Move the current root (the minimum of the remaining heap) to the end
    int tmp = arr[0];
    arr[0] = arr[i];
    arr[i] = tmp;
    // Re-heapify the reduced heap (size i)
    heapify(arr, i, 0);
}
So the explanation for why your PriorityQueue-based approach behaves differently is that you use a separate array for the output and fill it from the front. You could either switch to a max-heap and stick with your approach, or keep the min-heap and fill the output array in reverse. Both produce the same result.
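A small sketch of both options using java.util.PriorityQueue (PriorityQueue, poll() and Comparator.reverseOrder() are standard library API; the surrounding code is just an illustration):

import java.util.Arrays;
import java.util.Comparator;
import java.util.PriorityQueue;

class DescendingWithPriorityQueue {
    public static void main(String[] args) {
        int[] input = {4, 1, 3, 9, 7};

        // Option 1: keep the default min-heap and fill the output array from the back.
        PriorityQueue<Integer> minHeap = new PriorityQueue<>();
        for (int x : input) minHeap.add(x);
        int[] descending = new int[input.length];
        for (int i = input.length - 1; i >= 0; i--) {
            descending[i] = minHeap.poll(); // the smallest value lands in the last free slot
        }
        System.out.println(Arrays.toString(descending)); // [9, 7, 4, 3, 1]

        // Option 2: switch to a max-heap via a reversed comparator and fill front to back.
        PriorityQueue<Integer> maxHeap = new PriorityQueue<>(Comparator.reverseOrder());
        for (int x : input) maxHeap.add(x);
        int[] alsoDescending = new int[input.length];
        for (int i = 0; i < alsoDescending.length; i++) {
            alsoDescending[i] = maxHeap.poll();
        }
        System.out.println(Arrays.toString(alsoDescending)); // [9, 7, 4, 3, 1]
    }
}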
I want to store many String keys, each around a million characters long, with an object associated with each. I have to insert into the data structure (red-black tree or radix tree) very frequently, and search only rarely compared to inserts. Any recommendations will be appreciated. Thank you.
Since insertion is your primary concern, you should use a red-black tree, because its insertion time complexity is logarithmic in the number of inputs, that is O(k*log n) (base-2 logarithm), where k is the size or length of each input and n is the number of inputs. The radix tree's insert is linear in the size k of each input and in the number n of inputs, that is O(k*n), which is worse than for red-black trees, unless many of the string keys share sufficiently long prefixes to bring the effective cost below that of the red-black tree.
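For what it's worth, java.util.TreeMap is backed by a red-black tree, so a minimal sketch of this suggestion could look like the following (the Object value type and the class name are placeholders of mine):

import java.util.TreeMap;

class RbTreeIndex {
    // TreeMap is a red-black tree keyed by the String keys.
    private final TreeMap<String, Object> index = new TreeMap<>();

    // Each insert does O(log n) comparisons, and each comparison of String
    // keys of length k costs up to O(k), giving O(k * log n) per insert.
    void insert(String key, Object value) {
        index.put(key, value);
    }

    // The (rarer) lookups have the same bound.
    Object find(String key) {
        return index.get(key);
    }

    public static void main(String[] args) {
        RbTreeIndex idx = new RbTreeIndex();
        idx.insert("a-very-long-key", "payload");
        System.out.println(idx.find("a-very-long-key")); // payload
    }
}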
Everywhere, the best algorithm given is this:
1) Create a temporary array of size (k-1) to store elements and their counts (the output elements are going to be among these k-1 elements).
2) Traverse through the input array and update temp[] (add/remove an element or increase/decrease its count) for every traversed element. The array temp[] stores the potential (k-1) candidates at every step. This step takes O(nk) time.
3) Iterate through the final (k-1) potential candidates (stored in temp[]). For every element, check if it actually has a count of more than n/k. This step takes O(nk) time.
Resources I referred to are:
1) http://www.geeksforgeeks.org/given-an-array-of-of-size-n-finds-all-the-elements-that-appear-more-than-nk-times/
2) http://algorithms.tutorialhorizon.com/find-all-elements-in-an-array-which-appears-more-than-nk-times-n-is-array-size-and-k-is-a-number/
**To me, a better algorithm looks like the one below, which is O(n):**
1) Start iterating through the array.
2) For each element, check if it exists in a hash map. If yes, increase its counter; otherwise add it with a counter of 1.
3) After updating each element's count, check whether the count is greater than n/k. If yes, print it (see the sketch below).
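A minimal Java sketch of this hash-map approach (my own illustration; it prints each qualifying element only the first time its count exceeds n/k, so nothing is printed twice):

import java.util.HashMap;
import java.util.Map;

class MoreThanNOverK {
    // Prints every element that appears more than n/k times, in O(n) expected time.
    static void printFrequent(int[] arr, int k) {
        int n = arr.length;
        Map<Integer, Integer> counts = new HashMap<>();
        for (int value : arr) {
            int c = counts.merge(value, 1, Integer::sum); // O(1) expected per element
            if (c == n / k + 1) {    // report only when the count first exceeds n/k
                System.out.println(value);
            }
        }
    }

    public static void main(String[] args) {
        // n = 8, k = 4, so the threshold is "more than 2 occurrences": prints 2 and 3
        printFrequent(new int[]{3, 1, 2, 2, 1, 2, 3, 3}, 4);
    }
}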
Here the memory complexity will be O(n), which is worse than O(k-1), but the time complexity is much better: O(n) compared to O(nk) in the previous algorithm.
So isn't the second algorithm better, considering time complexity and simplicity/readability? Or does the answer depend on where we want to compromise, i.e. on time or on memory?
Suppose that we are given a sequence of n values x_1, x_2, ..., x_n, and seek to quickly answer repeated queries of the form: given i and j, find the smallest value in x_i, ..., x_j.
Design a data structure that uses O(n log n) space and answers queries in O(log n) time.
I know the solution with a data structure that uses O(n) space and O(log n) query time, but I need an answer that uses O(n log n) space, not less.
Any suggestions?
Solution for O(n) space:
For input a_1, a_2, a_3, ..., a_n, construct a node that contains the minimum of (a_1, ..., a_k) and the minimum of (a_{k+1}, ..., a_n), where k = n/2. Recursively construct the rest of the tree. Now, if you want to find the minimum between a_i and a_j: identify the lowest common ancestor of i and j, and call it node c. Start at i and keep moving up until you hit c; at every step, check whether the current node is a left child, and if so, compare the right sibling subtree's minimum with the current minimum and update it accordingly. Similarly for j, check whether it is a right child (and use the left sibling subtree's minimum). At node c, compare the values returned from the two sides and return the minimum.
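For reference, here is a minimal Java sketch of that O(n)-space structure (a segment/tournament tree storing range minima; the class and method names are my own):

class RangeMinTree {
    private final int n;
    private final int[] min; // implicit binary tree, node 1 is the root

    RangeMinTree(int[] values) {
        n = values.length;
        min = new int[4 * n];
        build(values, 1, 0, n - 1);
    }

    // Each node stores the minimum of its range; its children cover the two halves.
    private void build(int[] v, int node, int lo, int hi) {
        if (lo == hi) { min[node] = v[lo]; return; }
        int mid = (lo + hi) / 2;
        build(v, 2 * node, lo, mid);
        build(v, 2 * node + 1, mid + 1, hi);
        min[node] = Math.min(min[2 * node], min[2 * node + 1]);
    }

    // Minimum of values[i..j] (inclusive), answered in O(log n).
    int query(int i, int j) {
        return query(1, 0, n - 1, i, j);
    }

    private int query(int node, int lo, int hi, int i, int j) {
        if (j < lo || hi < i) return Integer.MAX_VALUE; // disjoint range
        if (i <= lo && hi <= j) return min[node];       // fully covered range
        int mid = (lo + hi) / 2;
        return Math.min(query(2 * node, lo, mid, i, j),
                        query(2 * node + 1, mid + 1, hi, i, j));
    }

    public static void main(String[] args) {
        RangeMinTree t = new RangeMinTree(new int[]{5, 2, 7, 1, 9, 3});
        System.out.println(t.query(1, 3)); // 1
        System.out.println(t.query(4, 5)); // 3
    }
}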
First of all, O(n) is also O(n log n), so technically any solution that's O(n) is automatically also O(n log n).
What you seem to be asking is a solution that uses Θ(n log n) memory (note the theta). I have to say I think this is a slightly odd requirement given that you claim you already have a superior Θ(n) solution.
In any case, you can trivially transform your Θ(n) solution into a Θ(n log n) one by making log(n) copies of your data structure.