The problem I have to solve is the 4D version of the 1D stabbing-query problem: find which intervals a given number belongs to. I am looking for a multi-dimensional implementation of segment trees, ideally in Java and using fractional cascading.
Multi-dimensional implementations exist for kd-trees (k-NN searches) and range trees (given a bounding box, find all points in it), but for segment trees I've only found 1D implementations.
I'd be happy to consider other data structures with similar space/time complexity to address the same problem.
To expand on my comment, the binary-space-partitioning algorithm I have in mind is this:
Choose a coordinate x and a threshold t (random coordinate, median coordinate, etc.).
Allocate a new node and assign to it all of the intervals that intersect the hyperplane x=t.
Recursively construct child nodes for (a) the intervals contained entirely within the lower half-space x<t and (b) the intervals contained entirely within the upper half-space x>t.
The stabbing query starts at the root, checks all of the intervals assigned to the current node, descends to the appropriate child, and repeats. It may be worthwhile to switch to brute force for small subtrees.
If too many intervals are getting stabbed by the hyperplane x=t, you could try recursing on (a) the intervals that intersect the lower half-space and (b) the intervals that intersect the upper half-space. This duplicates intervals, so the space requirement is no longer linear, and you probably have to switch over to brute force on collections of intervals for which subdivision proves unproductive.
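Here is a minimal Java sketch of that idea, assuming the 4D "intervals" are axis-aligned boxes. The names (Box, BspNode) and the median-of-lower-bounds threshold are just illustrative choices, not a tuned implementation:

import java.util.ArrayList;
import java.util.List;

class Box
{
    final double[] lo, hi;                 // per-dimension lower/upper bounds

    Box(double[] lo, double[] hi)
    {
        this.lo = lo;
        this.hi = hi;
    }

    boolean contains(double[] p)
    {
        for (int d = 0; d < p.length; d++)
        {
            if (p[d] < lo[d] || p[d] > hi[d]) return false;
        }
        return true;
    }
}

class BspNode
{
    final int dim;                         // splitting coordinate
    final double t;                        // threshold
    final List<Box> crossing = new ArrayList<>();   // boxes intersecting the hyperplane x_dim = t
    BspNode left, right;

    BspNode(List<Box> boxes, int dim, int k)
    {
        this.dim = dim;
        // Crude threshold: median of the lower bounds along this coordinate.
        double[] los = boxes.stream().mapToDouble(b -> b.lo[dim]).sorted().toArray();
        this.t = los[los.length / 2];

        List<Box> below = new ArrayList<>();
        List<Box> above = new ArrayList<>();
        for (Box b : boxes)
        {
            if (b.hi[dim] < t) below.add(b);            // entirely in the lower half-space
            else if (b.lo[dim] > t) above.add(b);       // entirely in the upper half-space
            else crossing.add(b);                       // intersects the hyperplane: stays here
        }
        int next = (dim + 1) % k;                       // cycle through the k coordinates
        if (!below.isEmpty()) left = new BspNode(below, next, k);
        if (!above.isEmpty()) right = new BspNode(above, next, k);
    }

    // Stabbing query: collect every box that contains the point p.
    void stab(double[] p, List<Box> out)
    {
        for (Box b : crossing)
        {
            if (b.contains(p)) out.add(b);
        }
        if (p[dim] < t && left != null) left.stab(p, out);
        if (p[dim] > t && right != null) right.stab(p, out);
    }
}

As suggested above, in practice you would also add a cutoff that switches to brute force once a node holds only a few intervals.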
Would the time complexity of printing all the words in a trie be O(26n), where 26 is the number of letters of the alphabet and n is the number of levels of the trie? For example, this is the code to print a trie:
public void print()
{
    // Recursively print the words stored in each non-null child subtree.
    for (int i = 0; i < 26; i++)
    {
        if (this.next[i] != null)
        {
            this.next[i].print();
        }
    }
    // A complete word ends at this node; print it.
    if (this.word != null)
    {
        System.out.println(this.word.getWord());
    }
}
So looking at this code makes me think that my approximation of the time complexity is correct, with the worst case being all 26 children populated at every one of the n levels.
Would it be O(26n) where 26 is the number of letters of the alphabet and n is the number of levels of the trie?
No. Each node in the trie must be visited, and O(1) work performed for each one (ignoring the work attributable to processing the children, which is accounted separately). The number of children does not matter on a per-node basis, as long as it is bounded by a constant (e.g. 26).
How many nodes are there altogether? Generally more than the number of words stored in the trie, and possibly a lot more. For a naively-implemented, perfectly balanced, complete trie with n levels below the root, each level has 26 times as many nodes as the previous, so the total number of nodes is 1 + 26 + 26^2 + ... + 26^n. That is O(26^(n+1)) == O(26^n), i.e. "exponential" in the number of levels, and the number of levels corresponds to the length of the longest word stored within.
But one is more likely to be interested in measuring the complexity in terms of the number of words stored in the trie. With a careful implementation, it is possible to have nodes only for those words and for each maximal initial substring that is common to two or more of those words. In that event, every node has either zero children or at least two, so for w total words, the total number of nodes is bounded by w + w/2 + w/4 + ..., which converges to 2w. Therefore, a traversal of a trie with those structural properties costs O(2w) == O(w).
Moreover, with a little more thought, it is possible to conclude that the particular structural property I described is not really necessary to have O(w) traversal.
I am not familiar with a trie, but big O notation is mainly meant to depict approximately how quickly the running time or resource consumption grows relative to the input size. The way I think of it is as referring to the general shape of the curve on a graph rather than to exact points on the graph: O(1) looks like a flat line, while O(n) looks like a line at a 45-degree angle, and so on.
source: https://medium.com/dataseries/how-to-calculate-time-complexity-with-big-o-notation-9afe33aa4c46
Now for the algorithm in the question. I am not familiar with a trie, but at first glance I would say it is O(1) (constant time), because the number of iterations of the loop is constant (always 26). However, inside the loop it calls this.next[i].print(), which could completely change the answer depending on its complexity, and which uncovers an important question we need to answer: what is n?
I am going to assume that this.next[i] is of the same type as this, making this.next[i].print() a kind of recursive call. In that scenario the time it takes to finish executing depends entirely on the number of instances that have to be visited. This algorithm resembles Depth First Search but does not safeguard against infinite recursion; that may be justified by additional information known about the next[i] instances (nodes), such as that an instance is only ever referenced by at most one other instance. In that case the runtime complexity would be on the order of O(n), where n is the number of instances or nodes.
... assuming that this.word.getWord() runs in constant time as well. If its cost depends on the length of the word, the runtime could instead be O(n * w), where n is the number of nodes and w is the size of the words.
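As an aside, here is a minimal sketch of the safeguard mentioned above: a visited set keeps the traversal linear in the number of nodes even if nodes could ever be shared. The Node, next, and word names mirror the question's code, with word simplified to a String:

import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

class Node
{
    Node[] next = new Node[26];
    String word;                           // simplified stand-in for this.word

    public void print()
    {
        print(Collections.newSetFromMap(new IdentityHashMap<>()));
    }

    private void print(Set<Node> visited)
    {
        if (!visited.add(this))
        {
            return;                        // already visited: no infinite recursion
        }
        for (int i = 0; i < 26; i++)
        {
            if (this.next[i] != null)
            {
                this.next[i].print(visited);
            }
        }
        if (this.word != null)
        {
            System.out.println(this.word);
        }
    }
}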
I have a sorted list of 2000 or fewer objects, each with a numerical value. I'm wondering how I can write (in Java) a way to split this list into sublists, each of roughly 200 objects (with fair leeway), such that the sums of the values of the sublists are roughly equal.
Even if the full list has fewer than 2000 objects, I still want the sublists to be roughly 200 objects each. Thank you!
Here is a quick and dirty greedy approach that should work well.
First, decide how many lists you will wind up with. Call that m.
Break your objects into groups of m, with the one possibly smaller group being the values closest to 0.
Order your groups by descending difference between their biggest and smallest values.
Going group by group, assign each group's objects to your lists: the largest object goes to the list with the lowest running total, the next largest to the list with the next lowest total, and so on.
When you are done, you will have lists of the right size, with relatively small differences between their totals; a rough sketch of these steps is below.
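In Java, the steps might look like this (the double value type and the class name are placeholder assumptions; adapt them to your objects):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class GreedyBalancer
{
    static List<List<Double>> split(List<Double> values, int m)
    {
        // Step 1: sort descending and break into groups of m; the possibly
        // smaller trailing group holds the values closest to 0.
        List<Double> desc = new ArrayList<>(values);
        desc.sort(Comparator.reverseOrder());
        List<List<Double>> groups = new ArrayList<>();
        for (int start = 0; start < desc.size(); start += m)
        {
            groups.add(desc.subList(start, Math.min(start + m, desc.size())));
        }

        // Step 2: order the groups by descending spread (biggest minus smallest).
        groups.sort(Comparator.comparingDouble(
                (List<Double> g) -> g.get(0) - g.get(g.size() - 1)).reversed());

        // Step 3: within each group, send the largest value to the list with
        // the lowest running total, the next largest to the next lowest, etc.
        List<List<Double>> lists = new ArrayList<>();
        double[] totals = new double[m];
        for (int i = 0; i < m; i++)
        {
            lists.add(new ArrayList<>());
        }
        for (List<Double> group : groups)
        {
            Integer[] order = new Integer[m];
            for (int i = 0; i < m; i++)
            {
                order[i] = i;
            }
            Arrays.sort(order, Comparator.comparingDouble(i -> totals[i]));
            for (int j = 0; j < group.size(); j++)
            {
                lists.get(order[j]).add(group.get(j));
                totals[order[j]] += group.get(j);
            }
        }
        return lists;
    }
}

For the numbers in the question, m would be roughly the list size divided by 200, rounded however you like.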
(You can do better than this with dynamic programming, but it will also be harder to write. The question "How to constrain items with multiple randomly selected positions so that the average position for each is within a certain range" may give you some ideas about how to do that.)
I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
But how is that calculated to be O(N^2)?
I have read a couple of articles and am still not able to understand it fully.
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(NlogN) calculated?
I understand the worst case happens when the pivot is the smallest or the largest element. Then one of the partitions is empty and we repeat the recursion for N-1 elements.
So, imagine that you repeatedly pick the worst pivot; i.e. in the N-1 case one partition is empty and you recurse with N-2 elements, then N-3, and so on until you get to 1.
The sum of N-1 + N-2 + ... + 1 is (N * (N - 1)) / 2. (Students typically learn this in high-school maths these days ...)
O(N(N-1)/2) is the same as O(N^2). You can deduce this from first principles from the mathematical definition of Big-O notation.
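For completeness, here is one way to spell out that deduction (a standard bounding argument):

\[
\frac{N(N-1)}{2} \;\le\; \frac{N^2}{2} \quad (N \ge 1)
\qquad\text{and}\qquad
\frac{N(N-1)}{2} \;\ge\; \frac{N^2}{4} \quad (N \ge 2),
\]

so N(N-1)/2 is sandwiched between constant multiples of N^2, which is exactly what O(N^2) (indeed Theta(N^2)) requires.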
Similarly, the best case is when the pivot is the median of the array and the left and right parts are of the same size. But then how is the value O(NlogN) calculated?
That is a bit more complicated.
Think of the problem as a tree:
At the top level, you split the problem into two equal-sized sub-problems, and move N objects into their correct partitions.
At the 2nd level, you split the two sub-problems into four sub-sub-problems, and in two problems you move N/2 objects into their correct partitions, for a total of N objects moved.
At the bottom level you have N/2 sub-problems of size 2, which you (notionally) split into N problems of size 1, again moving N objects.
Clearly, at each level you move N objects. The height of the tree for a problem of size N is log2 N. So ... there are N * log2 N object moves; i.e. O(N * log2 N).
But log2 N is just loge N / loge 2, i.e. loge N times a constant. (High-school maths, again.)
So O(N log2 N) is O(NlogN).
A little correction to your statement:
I understand worst case happens when the pivot is the smallest or the largest element.
Actually, the worst case happens when each successive pivot is the smallest or the largest element of the remaining partitioned array.
To better understand the worst case, think about an already sorted array that you are trying to sort.
You select the first element as the first pivot. After comparing it with the rest of the array, you find that the other n-1 elements are still on the other side (the right side) and the first element remains in the same position, which defeats the purpose of partitioning. You keep repeating these steps down to the last element with the same effect, which accounts for (n-1) + (n-2) + (n-3) + ... + 1 comparisons, and that sums up to (n*(n-1))/2 comparisons. So,
O(n*(n-1)/2) = O(n^2) for worst case.
To overcome this problem, it is recommended to pick each successive pivot randomly.
I would try to add an explanation for the average case as well.
The best case can be derived from the Master theorem. See https://en.wikipedia.org/wiki/Master_theorem for instance, or Cormen, Leiserson, Rivest, and Stein: Introduction to Algorithms, for a proof of the theorem.
One way to think of it is the following.
If quicksort makes a poor choice for a pivot, then the pivot induces an unbalanced partition, with most of the elements being on one side of the pivot (either below or above). In an extreme case, you could have, as you suggest, that all elements are below or above the pivot. In that case we can model the time complexity of quicksort with the recurrence T(n)=T(1)+T(n-1)+O(n). However, T(1)=O(1), and we can write out the recurrence T(n)=O(n)+O(n-1)+...+O(1) = O(n^2). (One has to take care to understand what it means to sum Big-Oh terms.)
On the other hand, if quicksort repeatedly makes good pivot choices, then those pivots induce balanced partitions. In the best case, about half the elements will be below the pivot, and about half the elements above the pivot. This means we recurse on two subarrays of roughly equal sizes. The recurrence then becomes T(n)=T(n/2)+T(n/2)+O(n) = 2T(n/2)+O(n). (Here, too, one must take care about rounding the n/2 if one wants to be formal.) This solves to T(n)=O(n log n).
The intuition for the last paragraph is that we compute the relative position of the pivot in the sorted order without actually sorting the whole array. We can then compute the relative position of the pivot in the below subarray without having to worry about the elements from the above subarray; similarly for the above subarray.
Firstly, keep the general structure of quicksort in mind: partition around a pivot in O(n), then recurse on the two sides (see the sketch below).
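For reference, here is a minimal sketch of one common variant (Lomuto partition with the last element as pivot); the details are illustrative, and any textbook version has the same shape:

class QuickSort
{
    static void sort(int[] a, int lo, int hi)
    {
        if (lo >= hi)
        {
            return;                        // 0 or 1 elements: nothing to do
        }
        int p = partition(a, lo, hi);      // O(hi - lo) work at this level
        sort(a, lo, p - 1);                // recurse on the left part
        sort(a, p + 1, hi);                // recurse on the right part
    }

    static int partition(int[] a, int lo, int hi)
    {
        int pivot = a[hi];                 // last element as pivot
        int i = lo;
        for (int j = lo; j < hi; j++)
        {
            if (a[j] < pivot)
            {
                int t = a[i]; a[i] = a[j]; a[j] = t;
                i++;
            }
        }
        int t = a[i]; a[i] = a[hi]; a[hi] = t;   // put the pivot in its place
        return i;
    }
}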
In my opinion, you need to understand two cases: the worst case and the best case.
The worst case:
The most unbalanced partition occurs when the pivot divides the list into two sublists of sizes 0 and n-1. The recurrence is then T(n) = O(n) + T(0) + T(n-1) = O(n) + T(n-1). Expanding it gives T(n) = O(n) + O(n-1) + ... + O(1), which sums to
T(n) = O(n^2).
The best case:
In the most balanced case, each time we perform a partition we divide the list into two nearly equal pieces. As before, we can write a recurrence relation: T(n) = O(n) + 2T(n/2). This solves to T(n) = O(n log n).
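To see that last step, one option is to unroll the recurrence, writing the O(n) term as c·n and assuming n is a power of two (a sketch, not a formal proof):

\[
\begin{aligned}
T(n) &= cn + 2T(n/2) \\
     &= cn + 2\left(c\tfrac{n}{2} + 2T(n/4)\right) = 2cn + 4T(n/4) \\
     &\;\;\vdots \\
     &= kcn + 2^{k}T(n/2^{k}) \\
     &= cn\log_2 n + nT(1) \qquad (\text{stopping when } 2^{k} = n) \\
     &= O(n\log n).
\end{aligned}
\]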
I have implemented a splay tree (insert, search, and delete operations) in Java. Now I want to check whether the complexity of the algorithm is O(log n) or not. Is there any way to check this by varying the input size (number of nodes) and measuring the run time in seconds? Say, by using input sizes like 1000 and 100000 and checking the run time, or is there some other way?
Strictly speaking, you cannot find the time complexity of the algorithm by running it for some values of n. Let's assume that you've run it for values n_1, n_2, ..., n_k. If the algorithm makes n^2 operations for any n <= max(n_1, ..., n_k) and exactly 10^100 operations for any larger value of n, it has a constant time complexity, even though it would look like a quadratic one from the points you have.
However, you can assess the number of operations it takes to complete on an input of size n (I wouldn't even call it time complexity here, as the latter has a strict formal definition) by running it on some values of n and looking at the ratios T(n1) / T(n2) and n1 / n2. But even in the case of a "real" algorithm (in the sense that it is not the pathological case described in the first paragraph), you should be careful with the "structure" of the input. For example, a quicksort that takes the first element as the pivot runs in O(n log n) on average for a random input, so it would look like O(n log n) if you generate random arrays of different sizes; however, it runs in O(n^2) time for a reverse-sorted array.
To sum it up, if you need to figure out whether it is fast enough from a practical point of view and you have an idea of what a typical input to your algorithm looks like, you can try generating inputs of different sizes and see how the execution time grows.
However, if you need a bound on the runtime in a mathematical sense, you need to prove some properties and bounds of your algorithm mathematically.
In your case, I would say that testing on random inputs is a reasonable idea (because there is a mathematical proof that the amortized time complexity of one operation is O(log n) for a splay tree), so you just need to check that the tree you have implemented is indeed a correct splay tree. One note: I'd recommend trying different patterns of queries (like inserting elements in sorted/reverse order and so on), as even unbalanced trees can work pretty fast as long as the input is "random".
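For instance, a rough timing harness might look like this (TreeSet is only a runnable stand-in; swap in your own splay tree's insert, and be aware that JIT warm-up and garbage collection add noise to small measurements):

import java.util.Random;
import java.util.TreeSet;

class GrowthCheck
{
    public static void main(String[] args)
    {
        Random rng = new Random(42);
        for (int n = 10_000; n <= 1_000_000; n *= 10)
        {
            TreeSet<Long> tree = new TreeSet<>();      // stand-in: replace with your splay tree
            long start = System.nanoTime();
            for (int i = 0; i < n; i++)
            {
                tree.add(rng.nextLong());              // random insertion pattern
            }
            long ms = (System.nanoTime() - start) / 1_000_000;
            // If one operation is O(log n), the total grows roughly like n log n,
            // so going from n to 10n should take a bit more than 10x longer.
            System.out.println(n + " inserts: " + ms + " ms");
        }
    }
}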
When a big matrix needs to be used in an algorithm, we were told to use linked lists to speed things up if the matrix is sparse, meaning that if the data is mostly one value we can store only the entries that are not that value.
But how do we identify the point where using a sparse representation is no longer useful?
For a square matrix of side length n, how do we calculate the point at which the matrix has too much non-zero data to be worth storing as a linked list?
I imagine we need to use the memory sizes of an object and of a link between two objects, and then use our density factor. But what are the calculations needed to safely say "this matrix has x% non-zero data, so it is better to use a linked list"?
The answer to your question depends on what you optimize for. Do you optimize for space or time?
Let's say you optimize for space. To keep the data of a square matrix of side length n, you need n*n numbers (to simplify, let's say it's an integer for each value). In the case of a linked list, you need to store the actual value, the coordinates of the value in the matrix (row and column), and the pointer to the next non-zero value. To simplify, let's say each of those fields is an integer in size. So for a linked list, you need 4 integers per stored value (plus additional data such as the head of the linked list).
IMHO, once fewer than 1/4 of the values in the matrix are non-zero, it is better to use a linked list than an array of arrays.
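As a back-of-the-envelope check, here is a small sketch of that comparison using the 4-integers-per-entry model above (real per-node overheads differ by JVM and structure):

class SparseBreakEven
{
    public static void main(String[] args)
    {
        int n = 10_000;
        long denseInts = (long) n * n;                 // one int per cell of the full matrix
        for (double density : new double[] { 0.10, 0.25, 0.50 })
        {
            long nonZeros = (long) (density * n * n);
            long linkedInts = 4 * nonZeros;            // value + row + column + next pointer
            System.out.printf("density %.2f: dense = %d ints, linked list = %d ints%n",
                    density, denseInts, linkedInts);
        }
        // The two representations use the same space when 4 * density * n * n == n * n,
        // i.e. when exactly 1/4 of the entries are non-zero.
    }
}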
Obviously, there are other ways to store the matrix values; with those, the ratio can be different.
To optimize for time, again, it depends on which operations you want to run...