Complete AND Full Binary tree max and min indexes? - java

I'm not sure if I'm overthinking this, but I cannot think of the general-case solution :(

As long as every node stores greater values in its right subtree and smaller values in its left subtree, the maximum is the last node on the rightmost path (keep following right children) and the minimum is the last node on the leftmost path, i.e. the node in the bottom-left corner.
So now you need to find the nodes. It is given that the tree is Complete and Full BST and also it is stored in Array so indexes are uniformly distributed over the nodes. So here we need to move Top to Bottom and Left to Right assigning index to nodes starting from 1 to n if there are n nodes.
If we write down the indexes for the given tree:
          1
        /   \
       2     3
      / \   / \
     4   5 6   7  <--- largest value
    / \
   8   9
   ^--- smallest value
So here the node at index 8 holds the smallest value and the node at index 7 holds the largest value.
So now the question is how to find them. If the deepest level of the tree is `level` (counting the root as level 0), the smallest value is at index 2^level and the largest value is at index 2^level - 1. However, this index for the largest value is wrong when the deepest level is completely filled, because the largest value then sits at the end of that level. So for the largest value we calculate the level from n+1 instead of n.
int level = (int) Math.floor(Math.log(n) / Math.log(2)); // level of the deepest node, for the smallest value
int smallest_index = (int) Math.pow(2, level);
level = (int) Math.floor(Math.log(n + 1) / Math.log(2)); // computed from n+1, for the largest value
int largest_index = (int) Math.pow(2, level) - 1;

Sanket's answer is basically right, but the way it is eventually stated makes me uncomfortable. Who's to say that the rounded(!) ratio of two rounded(!) logs won't land just on the wrong side of the intended integer? The final rounding step will then carry it all the way to the next integer. Maybe it works, but it would essentially be by accident, not by design.
It can also be stated "cleanly", without the need to think/worry about such things, in terms of bitwise arithmetic. Everything stays an integer, so it's easier to reason about.
The index of the lowest item is at the highest power of 2 present in n, so the highest set bit. In Java there is even a function for that: Integer.highestOneBit, or we could write it:
int highestOneBit(int x) {
    x |= x >> 16;
    x |= x >> 8;
    x |= x >> 4;
    x |= x >> 2;
    x |= x >> 1;
    return x ^ (x >>> 1);
}
And now we have
indexOfLowest = highestOneBit(n);
indexOfHighest = highestOneBit(n + 1) - 1;
This still assumes 1-based indexes (leaving index 0 unused); you can simply subtract 1 from both results to make it 0-indexed.
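As a quick check of the two formulas above, here is a minimal sketch using the built-in Integer.highestOneBit. The class and method names are made up for illustration, and n is assumed to be at least 1 and smaller than Integer.MAX_VALUE.

```java
class BstArrayIndexes {
    // 1-based index of the smallest value: the first node of the deepest level.
    static int indexOfLowest(int n) {
        return Integer.highestOneBit(n);
    }

    // 1-based index of the largest value: the end of the rightmost path.
    static int indexOfHighest(int n) {
        return Integer.highestOneBit(n + 1) - 1;
    }

    public static void main(String[] args) {
        int n = 9; // the 9-node tree drawn in the first answer
        // 1-based indexes
        System.out.println(indexOfLowest(n) + " " + indexOfHighest(n));
        // 0-based indexes: subtract 1 from each
        System.out.println((indexOfLowest(n) - 1) + " " + (indexOfHighest(n) - 1));
    }
}
```

For the 9-node tree this gives indexes 8 and 7 (1-based), matching the drawing; for a completely filled 7-node tree it gives 4 and 7.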

Related

Find a matrix which satisfies certain constraints

Another description of the problem: Compute a matrix which satisfies certain constraints
Given a function whose only argument is a 4x4 matrix (int[4][4] matrix), determine the maximal possible output (return value) of that function.
The 4x4 matrix must satisfy the following constraints:
All entries are integers between -10 and 10 (inclusively).
It must be symmetric, entry(x,y) = entry(y,x).
Diagonal entries must be positive, entry(x,x) > 0.
The sum of all 16 entries must be 0.
The function must only sum up values of the matrix, nothing fancy.
My question:
Given such a function which sums up certain values of a matrix (matrix satisfies above constraints), how do I find the maximal possible output/return value of that function?
For example:
/* The function sums up certain values of the matrix,
   a value can be summed up multiple or 0 times. */
// for this example I arbitrarily chose values at (0,0), (1,2), (0,3), (1,1).
int exampleFunction(int[][] matrix) {
    int a = matrix[0][0];
    int b = matrix[1][2];
    int c = matrix[0][3];
    int d = matrix[1][1];
    return a+b+c+d;
}
/* The result (max output of the above function) is 40,
   it can be achieved by the following matrix: */
      0.   1.   2.   3.
0.    10  -10  -10   10
1.   -10   10   10  -10
2.   -10   10    1   -1
3.    10  -10   -1    1
// Another example:
// for this example I arbitrarily chose values at (0,3), (0,1), (0,1), (0,3), ...
int exampleFunction2(int[][] matrix) {
    int a = matrix[0][3] + matrix[0][1] + matrix[0][1];
    int b = matrix[0][3] + matrix[0][3] + matrix[0][2];
    int c = matrix[1][2] + matrix[2][1] + matrix[3][1];
    int d = matrix[1][3] + matrix[2][3] + matrix[3][2];
    return a+b+c+d;
}
/* The result (max output of the above function) is -4, it can be achieved by
   the following matrix: */
      0.   1.   2.   3.
0.     1   10   10  -10
1.    10    1   -1  -10
2.    10   -1    1   -1
3.   -10  -10   -1    1
I don't know where to start. Currently I'm trying to estimate the number of 4x4 matrices which satisfy the constraints, if the number is small enough the problem could be solved by brute force.
Is there a more general approach?
Can the solution to this problem be generalized such that it can be easily adapted to arbitrary functions on the given matrix and arbitrary constraints for the matrix?
You can try to solve this using linear programming techniques.
The idea is to express the problem as some inequalities, some equalities, and a linear objective function and then call a library to optimize the result.
Python code:
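import scipy.optimize as opt

# Objective coefficients. linprog minimizes, so every use of a cell subtracts 1.
c = [0]*16
def use(y,x):
    c[y*4+x] -= 1

if 0:
    # first example function
    use(0,0)
    use(1,2)
    use(0,3)
    use(1,1)
else:
    # second example function
    use(0,3)
    use(0,1)
    use(0,1)
    use(0,3)
    use(0,3)
    use(0,2)
    use(1,2)
    use(2,1)
    use(3,1)
    use(1,3)
    use(2,3)
    use(3,2)

# Every entry lies in [-10, 10]; diagonal entries must be positive.
bounds=[ [-10,10] for i in range(4*4) ]
for i in range(4):
    bounds[i*4+i] = [1,10]

# The sum of all 16 entries is 0.
A_eq = [[1] * 16]
b_eq = [0]
# Symmetry: entry(x,y) - entry(y,x) = 0.
for x in range(4):
    for y in range(x+1,4):
        D = [0]*16
        D[x*4+y] = 1
        D[y*4+x] = -1
        A_eq.append(D)
        b_eq.append(0)

r = opt.linprog(c,A_eq=A_eq,b_eq=b_eq,bounds=bounds)
for y in range(4):
    print(r.x[4*y:4*y+4])
print(-r.fun)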
This prints:
[ 1. 10. -10. 10.]
[ 10. 1. 8. -10.]
[-10. 8. 1. -10.]
[ 10. -10. -10. 1.]
16.0
saying that the best value for your second case is 16, with the given matrix.
Strictly speaking you are wanting integer solutions. Linear programming solves this type of problem when the inputs can be any real values, while integer programming solves this type when the inputs must be integers.
In your case you may well find that the linear programming method already provides integer solutions (it does for the two given examples). When this happens, it is certain that this is the optimal answer.
However, if the variables are not integral you may need to find an integer programming library instead.
Sort the elements of the matrix in descending order and store them in an array. Iterate through the array one element at a time, adding each element to a running total. Stop at the point where adding an element would decrease the total. The value stored in the total is the maximum.
maxfunction(matrix[][])
{
    sorted[] = sortDescending(matrix[][]);
    max = sorted[0];
    for i = 1 to length(sorted) - 1 do
        if (max + sorted[i] < max)
            break;
        max = max + sorted[i];
    return max;
}
You need to first consider what matrices will satisfy the rules. The 4 numbers on the diagonal must be positive, with the minimal sum of the diagonal being 4 (four 1 values), and the maximum being 40 (four 10 values).
The total sum of all 16 items is 0, or to put it another way, sum(diagonal) + sum(rest-of-matrix) = 0.
Since you know that sum(diagonal) is positive, sum(rest-of-matrix) must be negative and of equal magnitude: sum(rest-of-matrix) = -sum(diagonal).
We also know that the rest of the matrix is symmetric, so sum(rest-of-matrix) is guaranteed to be an even number. That means sum(diagonal) must also be even, and the sum of the top half of the matrix is exactly -sum(diagonal)/2.
For any given function, you take a handful of cells and sum them. Now you can consider the functions as fitting into categories. For functions that take all 4 cells from the diagonal only, the maximum will be 40. If the function takes all 12 cells which are not the diagonal, the maximum is -4 (negative minimal diagonal).
Other categories of functions that have an easy answer:
1) one from the diagonal and an entire half of the matrix above/below the diagonal - the max is 3. The diagonal cell will be 10, the rest will be 1, 1, 2 (minimal to get to an even number) and the half-matrix will sum at -7.
2) two cells of the diagonal and half a matrix - the max is 9. the two diagonal cells are maximised to two tens, the remaining cells are 1,1 - and so the half matrix sums at -11.
3) three cells from the diagonal and half a matrix - the max is 14.
4) the entire diagonal and half the matrix - the max is 20.
You can continue with the categories of selecting functions (using some from the diagonal and some from the rest), and easily calculating the maximum for each category of selecting function. I believe they can all be mapped.
Then the only step is to put your new selecting function in the correct category and you know the maximum.
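The four "easy" categories above can be double-checked by brute force over the diagonal. This is only a sketch under one assumption taken from the reasoning above: once the diagonal sum is even, the half matrix can always be filled so that it sums to exactly -sum(diagonal)/2 (that value lies between -20 and -2, well within reach of six cells in [-10, 10]). The class and method names are illustrative.

```java
class DiagonalCategories {
    // Best value of a function that sums k diagonal cells plus one complete
    // half of the matrix (strictly above or below the diagonal).
    static int maxForCategory(int k) {
        int best = Integer.MIN_VALUE;
        for (int a = 1; a <= 10; a++)
            for (int b = 1; b <= 10; b++)
                for (int c = 1; c <= 10; c++)
                    for (int e = 1; e <= 10; e++) {
                        int[] diag = {a, b, c, e};
                        int sum = a + b + c + e;
                        if (sum % 2 != 0) continue; // diagonal sum must be even
                        int selected = 0;
                        for (int i = 0; i < k; i++) selected += diag[i];
                        // the half matrix contributes exactly -sum/2
                        best = Math.max(best, selected - sum / 2);
                    }
        return best;
    }

    public static void main(String[] args) {
        for (int k = 1; k <= 4; k++)
            System.out.println(k + " diagonal cell(s) + half matrix: " + maxForCategory(k));
    }
}
```

By symmetry it does not matter which k diagonal cells the function picks, so the sketch always takes the first k.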

Place n points while maximizing the minimum distance

Given a distance d (going from 0 to d) and 2 points s and e in between which no points can be placed (placing points on s and e is fine, it's not allowed to place points between them).
Place n points such that the distance between each point is as large as possible (distribute them as evenly as possible).
Output the minimal distance between 2 points.
Graphic representation, place n points on the black line (it's a 1-dimensional line) so that the smallest distance between each 2 points is as large as possible (an absolute error of up to 10^(-4) is allowed).
Examples:
d=7, n=2, s=6, e=7, Output is: 7.0000000000
d=5, n=3, s=5, e=5, Output is: 2.5000000006
d=3, n=3, s=0, e=1, Output is: 1.5000000007
d=9, n=10, s=5, e=6, Output is: 1.0000000001
d=6, n=2, s=1, e=6, Output is: 6.0000000000
d=5, n=3, s=4, e=5, Output is: 2.5000000006
My approach:
I tried looking at the intervals separately, distributing points (ideal distribution, lengthOfInterval/n) on the first and second interval (0 to s and e to d) and inspecting all distributions whose number of points sum up to n, I would store a (distribution, largest minimal distance) pair and pick the pair with the largest minimal distance. I don't know how to work with the 10^(-4) tolerance (how does this part even look in code?) and am not sure if my approach is correct. Every suggestion is welcome.
I'm stuck on this question :/
You can use binary search over the possible sizes of gaps between points (from 0 to d) to converge to the largest minimum gap size.
To determine the viability of any given gap size, you basically try to place points from the left and from the right and see whether the gap in the middle is big enough:
Determine how many points can be placed left of s (which is s/gapSize + 1).
Determine how many points will then be required to be placed to the right of e
(which is n - points on left).
Determine how far inwards each side will go.
Check whether the points on the right fits in the gap [e, d] and whether there's at least gap size difference between each side.
Code for this: (note that I worked with number of gaps instead of points, which is just 1 less than the number of points, since it leads to simpler code)
double high = d, low = 0, epsilon = 0.000001;
while (low + epsilon < high)
{
    double mid = (low + high)/2;
    int gapsOnLeft = (int)(s/mid); // gaps = points - 1
    if (gapsOnLeft + 1 > n)
        gapsOnLeft = n - 1;
    int gapsOnRight = n - gapsOnLeft - 2; // will be -1 when there's no point on the right
    double leftOffset = mid*gapsOnLeft;
    // can be > d with no point on the right, which makes the below check work correctly
    double rightOffset = d - mid*gapsOnRight;
    if (leftOffset + mid <= rightOffset && rightOffset >= e)
        low = mid;
    else
        high = mid;
}
System.out.println(low);
The time complexity is O(log(d/epsilon)), since every iteration halves the search interval until it is shorter than epsilon.
The problem with your approach is that it's hard to figure out how big the gaps between points are supposed to be, so you won't know how many points are supposed to go on either side of (s, e) as to end up with an optimal solution and to correctly deal with both cases when s and e are really close together and when they're far apart.
Binary search
It's very easy to find the number of points you can place if the minimum separation distance l between any pair is given.
If l=d, then at most 2 points can be placed.
..
...
....
So just do a binary search on l.
A crude implementation goes like this.
low,high=0.00001,d
while(high-low>eps):
    m = (low+high)/2
    if((no. of points that can be placed s.t. any pair is at least m units apart) >=n):
        low=m
    else:
        high=m
TL;DR: Your approach does not always work (and you're not doing it as fast as you could); see the 3rd bullet point for one that works (and uses the given 10^(-4)).
- If [s, e] is small and well-placed, then the optimum is just distributing evenly on the whole segment, and the best value is d/(n-1). But you'll have to check that none of your elements is between s and e.
- Your approach works if s and e are "far enough" apart.
You can do it faster than what you seem to suggest though, by looking for the best split between the two segments in time O(1): if you put n1 (1<=n1<=n-1) elements on the left, you want to maximize min(s/(n1-1), (d-e)/(n-n1-1)) (one of these quantities being possibly +infinity, but then the other is not). The maximum of that function is obtained for s/(x-1) = (d-e)/(n-x-1); just compute the corresponding value for x, and either its floor or its ceiling is the best value for n1. The distance obtained is best = min(s/(n1-1), (d-e)/(n-n1-1)). Then you put n1 points on the left, starting at 0, separated by distance best, and n-n1 on the right, starting at d, going left, separated by best.
If the distance between the last point on the left and the first on the right is smaller than best, then you have a problem: this approach does not work.
- The complicated case is when the two previous approaches failed: the hole is small and not well placed. Then there are probably many ways to solve the problem. One is to use binary search to find the optimal space between two consecutive points. Given a candidate space sp, try distributing points on the line starting at 0, spaced by sp, as many as you can while remaining at or below s. Do the same on the right while staying at or above e and at or above (last on the left + sp). If you have successfully placed at least n points in total, then sp is too small. Otherwise, it is too big.
Thus, you can use binary search to find the optimal sp as follows: start with sp possibly in [max(s, d-e)/(n-1), d/(n-1)]. At each step, take the middle mid of your possible segment [x, y]. Check if the real optimum is above or below mid. Depending on the case, look for the optimum in [mid, y] or [x, mid]. Stop iff y-x < 10^(-4).
The two previous cases will actually also be found by this method, so you don't need to implement them, except if you want the exact optimal value when possible (i.e. in the first two cases).
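The binary search described in this answer could be sketched as follows. This is not the answerer's code: the class and method names are hypothetical, the search simply starts from the interval [0, d] instead of the tighter bounds mentioned above, and it assumes n >= 2 and 0 <= s <= e <= d. The feasibility test places points greedily from the left, jumps over the forbidden gap, and counts how many fit.

```java
class MaxMinGap {
    // Greedy count of points that fit on [0, d] with spacing >= gap
    // when no point may lie strictly between s and e.
    static long countPoints(double d, double s, double e, double gap) {
        long left = (long) Math.floor(s / gap) + 1;      // points at 0, gap, 2*gap, ... <= s
        double lastLeft = (left - 1) * gap;
        double firstRight = Math.max(e, lastLeft + gap); // earliest legal spot past the gap
        if (firstRight > d) return left;
        long right = (long) Math.floor((d - firstRight) / gap) + 1;
        return left + right;
    }

    // Largest spacing for which at least n points still fit.
    static double largestMinGap(double d, int n, double s, double e) {
        double low = 0, high = d;
        for (int iter = 0; iter < 100; iter++) {
            double mid = (low + high) / 2;
            if (countPoints(d, s, e, mid) >= n)
                low = mid;   // mid is achievable, try a larger spacing
            else
                high = mid;  // mid is too large
        }
        return low;
    }

    public static void main(String[] args) {
        // first sample from the question: d=7, n=2, s=6, e=7
        System.out.printf("%.10f%n", largestMinGap(7, 2, 6, 7));
    }
}
```

A fixed number of halvings is used instead of an explicit epsilon, which comfortably meets the 10^(-4) tolerance; the result approaches the optimum from below.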
It's pretty tricky, except for the simple case (no point lands in the gap):
double dMin = d / (n - 1.0);
if (Math.ceil(e / dMin - 1) * dMin <= s)
return dMin;
Let's continue with the edge cases, placing one point on one side and the rest of the points on the other one:
dMin = Math.min((d - e) / (n - 2.0), e); // one point at 0
double dm = Math.min(s / (n - 2.0), d - s); // one point at d
if (dm > dMin) // 2nd configuration was better
dMin = dm;
And finally for two or more points on both sides:
// left : right = (x - 1) : (n - x - 1)
// left * n - left * x - left = right * x - right
// x * (left + right) = left * n - left + right
// x = (left * n - left + right) / (left + right) = (left * n) / (left + right) - 1
int x = s * n / (d - e + s) - 1;
if (x < 2)
    x = 2;
for (int y = x; y <= x + 2 && y < n - 1; y++) {
    double dLeft = s / (y - 1.0);
    double dRight = (d - e) / (n - y - 1.0);
    dm = Math.min(dLeft, dRight);
    if (dm > e - s) { // dm bigger than gap
        if (dLeft > dRight)
            dLeft = e / ((double) y);
        else
            dRight = (d - s) / ((double) n - y);
        dm = Math.min(dLeft, dRight);
    }
    if (dm > dMin)
        dMin = dm;
}
This would be O(1) space and time, but I'm not 100% positive if all cases are checked. Please let me know if it worked. Tested against all the test cases. The above works for n >= 2, if n equals 2 it will be caught by the first check.

Fastest way to compute all 1-hamming distanced neighbors of strings?

I am trying to compute Hamming distances between each pair of nodes in a graph of n nodes. Each node in this graph has a label of the same length (k), and the alphabet used for labels is {0, 1, *}. The '*' operates as a don't-care symbol. For example, the Hamming distance between the labels 101*01 and 1001*1 is 1 (we say they only differ at the 3rd index).
What I need to do is to find all 1-hamming-distance neighbors of each node and report exactly at which index those two labels differ.
I am comparing each nodes label with all others character by character as follows:
// Given two strings s1, s2
// returns the index of the change if hd(s1,s2)=1, -1 otherwise.
int count = 0;
char c1, c2;
int index = -1;
for (int i = 0; i < k; i++)
{
    // do not compute anything for *
    c1 = s1.charAt(i);
    if (c1 == '*')
        continue;
    c2 = s2.charAt(i);
    if (c2 == '*')
        continue;
    if (c1 != c2)
    {
        index = i;
        count++;
        // if hamming distance is greater than 1, immediately stop
        if (count > 1)
        {
            index = -1;
            break;
        }
    }
}
return index;
I may have a couple of million nodes, and k is usually around 50. I am using Java; this comparison takes O(n*n*k) time and runs slowly. I considered making use of tries and VP-trees but could not figure out which data structure works for this case. I also studied the Simmetrics library but nothing came to mind. I would really appreciate any suggestions.
Try this approach:
Convert the keys into ternary numbers (base 3). i.e. 0=0, 1=1, *=2
10 ternary digits give you a range of 0..59048 (3^10 = 59049 values), which fits in 16 bits.
That means two of those would form a 32 bit word. Create a lookup table with 4 billion entries that return the distance between those two 10 digit ternary words.
You can now use the lookup table to check 10 characters of the key with one lookup. If you use 5 characters, then 3^5 gives you 243 values which would fit into one byte, so the lookup table would only be 64 KB.
By using shift operations, you can create lookup tables of different sizes to balance memory and speed.
That way, you can optimize the loop to abort much more quickly.
To get the position of the first difference, you can use a second lookup table which contains the index of the first difference for two key substrings.
If you have millions of nodes, then you will have many that start with the same substring. Try to sort them into buckets where one bucket contains nodes that start with the same key. The goal here is to make the buckets as small as possible (to reduce the n*n).
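The small (64 KB) variant of the lookup table described above might look like the following sketch. The class and method names are invented for illustration, labels are assumed to have a length that is a multiple of 5, and a real implementation would encode each label once up front instead of re-encoding it on every comparison.

```java
class TernaryChunkTable {
    static final int CHUNK = 5;                    // characters per lookup, 3^5 = 243 codes
    static final byte[] DIST = new byte[1 << 16];  // indexed by (codeA << 8) | codeB

    static {
        for (int a = 0; a < 243; a++)
            for (int b = 0; b < 243; b++)
                DIST[(a << 8) | b] = (byte) slowDistance(a, b);
    }

    // Digit-by-digit distance of two encoded chunks; digit 2 stands for '*'.
    static int slowDistance(int a, int b) {
        int dist = 0;
        for (int i = 0; i < CHUNK; i++) {
            int da = a % 3, db = b % 3;
            if (da != 2 && db != 2 && da != db) dist++;
            a /= 3;
            b /= 3;
        }
        return dist;
    }

    // Encodes CHUNK characters starting at 'from' as a base-3 number in 0..242.
    static int encode(String label, int from) {
        int code = 0;
        for (int i = 0; i < CHUNK; i++) {
            char ch = label.charAt(from + i);
            code = code * 3 + (ch == '*' ? 2 : ch - '0');
        }
        return code;
    }

    // Hamming distance with don't-care symbols, one table lookup per 5 characters.
    static int distance(String x, String y) {
        int dist = 0;
        for (int i = 0; i < x.length(); i += CHUNK)
            dist += DIST[(encode(x, i) << 8) | encode(y, i)];
        return dist;
    }

    public static void main(String[] args) {
        System.out.println(distance("101*0", "1001*"));
    }
}
```

To abort early, the loop in distance could stop as soon as the running total exceeds 1.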
Instead of, or in addition to, the string, store one bit mask for the 1 characters and one for the * characters. One could use BitSet, but let's try without. Since k is around 50, the masks are longs rather than ints.
static long mask(String value, char digit) {
    long mask = 0;
    long bit = 2; // Start with bit 1, so bit positions match the 1-based index of the question.
    for (int i = 0; i < value.length(); ++i) {
        if (value.charAt(i) == digit) {
            mask |= bit;
        }
        bit <<= 1;
    }
    return mask;
}
class Cell {
    long ones;
    long stars;
}
long difference(Cell x, Cell y) {
    // A position counts only if neither label has a * there.
    return (x.ones & ~y.stars) ^ (y.ones & ~x.stars);
}
int hammingDistance(Cell x, Cell y) {
    return Long.bitCount(difference(x, y));
}
boolean differsBy1(Cell x, Cell y) {
    long diff = difference(x, y);
    return diff != 0 && (diff & (diff - 1)) == 0;
}
int bitPosition(long diff) {
    return Long.numberOfTrailingZeros(diff);
}
Interesting problem. It would be easy if it weren't for the wildcard symbol.
If the wildcard was a regular character in the alphabet, then for a given string you could enumerate all k hamming distance 1 strings. Then look these strings up in a multi-map. So for example for 101 you look up 001,111 and 100.
The don't-care symbol makes it so that you can't do that lookup. However, if the multi-map is built such that each node is stored under all its possible keys, you can do that lookup again. So for example 1*1 is stored as 111 and 101. So when you do the look up for 10* you look up 000,010,011,001,111 which would find 1*1 which was stored by 111.
The upside of this is also that you can store all labels as integers rather than ternary structures, so with an int[3] as the key value you can use any k < 96.
Performance would depend on the backing implementation of the multi-map. Ideally you'd use a hash implementation for key sizes < 32 and a tree implementation for anything above. With the tree implementation all nodes can be connected to their distance-1 neighbors in O(n*k*log(n)). Building the multi-map takes O(n * 2^z), where z is the maximum number of wildcard characters in any string. If the average number of wildcards is low this should be an acceptable performance penalty.
edit: You can improve lookup performance for all nodes to O(n*log(n)) by also inserting the hamming distance 1 neighbors into the multi-map, but that might just explode its size.
Note: I'm typing this in a lunch break. I haven't checked the details yet.

truncated binary logarithm

I have a question about this problem, and any help would be great!
Write a program that takes one integer N as an
argument and prints out its truncated binary logarithm [log2 N]. Hint: [log2 N] = l is the largest integer l such that
2^l <= N.
I got this much down:
int N = Integer.parseInt(args[0]);
double l = Math.log(N) / Math.log(2);
double a = Math.pow(2, l);
But I can't figure out how to truncate l while keeping 2^l <= N
Thanks
This is what i have now:
int N = Integer.parseInt(args[0]);
int i = 0; // loop control counter
int v = 1; // current power of two
while (Math.pow(2 , i) <= N) {
i = i + 1;
v = 2 * v;
}
System.out.println(Integer.highestOneBit(N));
This prints out the integer that is equal to 2^i which would be less than N. My test still comes out false and i think that is because the question is asking to print the i that is the largest rather than the N. So when i do
Integer.highestOneBit(i)
the correct i does not print out. For example if i do: N = 38 then the highest i should be 5, but instead it prints out 4.
Then i tried this:
int N = Integer.parseInt(args[0]);
int i; // loop control counter
for (i= 0; Math.pow(2 , i) == N; i++) {
}
System.out.println(Integer.highestOneBit(i));
Where if i make N = 2 i should print out to be 1, but instead it is printing out 0.
I've tried a bunch of things on top of that, but cant get what i am doing wrong. Help would be greatly appreciated. Thanks
I believe the answer you're looking for here is based on the underlying notion of how a number is actually stored in a computer, and how that can be used to your advantage in a problem such as this.
Numbers in a computer are stored in binary - a series of ones and zeros where each column represents a power of 2:
(Above image from http://www.mathincomputers.com/binary.html - see for more info on binary)
The zeroth power of 2 is over on the right. So, 01001, for example, represents the decimal value 2^0 + 2^3; 9.
This storage format, interestingly, gives us some additional information about the number. We can see that 2^3 is the highest power of 2 that 9 is made up of. Let's imagine it's the only power of two it contains, by chopping off all the other 1's except the highest. This is a truncation, and results in this:
01000
You'll now notice this value represents 8, or 2^3. Taking it down to basics, let's now look at what log base 2 really represents. It's the number that you raise 2 to the power of to get the thing you're finding the log of. log2(8) is 3. Can you see the pattern emerging here?
The position of the highest bit can be used as an approximation to its log base 2 value.
2^3 is bit 3 (counting from zero on the right) in our example, so a truncated approximation to log base 2 of 9 is 3.
So the truncated binary logarithm of 9 is 3. 2^3 is less than 9; this is where the "less than or equal" comes from, and the algorithm to find its value simply involves finding the position of the highest bit that makes up the number.
Some more examples:
12 = 1100. Position of the highest bit = 3 (starting from zero on the right). Therefore the truncated binary logarithm of 12 = 3. 2^3 is <= 12.
38 = 100110. Position of the highest bit = 5. Therefore the truncated binary logarithm of 38 = 5. 2^5 is <= 38.
This level of pushing bits around is known as bitwise operations in Java.
Integer.highestOneBit(n) returns essentially the truncated value. So if n was 9 (1001), highestOneBit(9) returns 8 (1000), which may be of use.
A simple way of finding the position of that highest bit of a number involves doing a bitshift until the value is zero. Something a little like this:
// Input number - 1001:
int n = 9;
int position = 0;
// Cache the input number - the loop destroys it.
int originalN = n;
while (n > 1) {
    position++;  // Same as position = position + 1;
    n = n >> 1;  // Shift the bits over one spot (overwriting n).
    // 1001 becomes 0100, then 0010, then 0001 on each iteration.
    // The loop stops once only the highest bit is left, so position
    // counts how many places that bit sat above bit zero.
}
// position is now the index of the highest set bit (3 for n = 9).
// In your case, this is also the value of your truncated binary log.
System.out.println("Binary log of " + originalN + " is " + position);
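The same bit position can also be read off without a loop. A minimal sketch, assuming N is at least 1 (the logarithm is undefined otherwise); the class and method names are illustrative, while Integer.numberOfLeadingZeros is a standard library method:

```java
class TruncatedLog2 {
    // Index of the highest set bit, i.e. the largest l with 2^l <= n.
    static int log2Floor(int n) {
        if (n < 1) throw new IllegalArgumentException("n must be positive");
        return 31 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 38;
        System.out.println(log2Floor(n));
    }
}
```

Note that Integer.highestOneBit(n) returns the power of two itself (32 for n = 38), not its exponent, which is why printing it does not give the logarithm.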

Please explain the logic behind Kernighan's bit counting algorithm

This question directly follows after reading through Bits counting algorithm (Brian Kernighan) in an integer time complexity . The Java code in question is
int count_set_bits(int n) {
    int count = 0;
    while (n != 0) {
        n &= (n - 1);
        count++;
    }
    return count;
}
I want to understand what n &= (n-1) is achieving here ? I have seen a similar kind of construct in another nifty algorithm for detecting whether a number is a power of 2 like:
if ((n & (n - 1)) == 0) {
    System.out.println("The number is a power of 2");
}
Stepping through the code in a debugger helped me.
If you start with
n = 1010101 & n-1=1010100 => 1010100
n = 1010100 & n-1=1010011 => 1010000
n = 1010000 & n-1=1001111 => 1000000
n = 1000000 & n-1=0111111 => 0000000
So this iterates 4 times. Each iteration decrements the value in such a way that the least significant bit that is set to 1 disappears.
Decrementing by one flips the lowest set bit and every bit below it, e.g. 1000....0000 - 1 = 0111....1111, no matter how many bits it has to flip, and it stops there, leaving any other set bits untouched. When you AND this with n, the lowest set bit, and only the lowest set bit, becomes 0.
Subtracting 1 from a number toggles all the bits (from right to left) up to and including the rightmost set bit.
So if we subtract 1 from a number and do a bitwise & with itself (n & (n-1)), we unset the rightmost set bit. In this way we can unset the 1s one by one from right to left in a loop.
The number of times the loop iterates is equal to the number of set
bits.
Source : Brian Kernighan's Algorithm
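To watch the rightmost set bit disappear step by step, the loop can be instrumented to record each intermediate value. This is just an illustrative sketch; the class and method names are made up, and the power-of-two check adds an n > 0 guard that the snippet in the question does not have.

```java
import java.util.ArrayList;
import java.util.List;

class KernighanTrace {
    // Returns the value of n after each n &= (n - 1) step, in binary.
    // The size of the list equals the number of set bits in n.
    static List<String> trace(int n) {
        List<String> steps = new ArrayList<>();
        while (n != 0) {
            n &= (n - 1); // clears the rightmost set bit
            steps.add(Integer.toBinaryString(n));
        }
        return steps;
    }

    // A power of two has exactly one set bit, so clearing it leaves zero.
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    public static void main(String[] args) {
        List<String> steps = trace(0b1010101);
        System.out.println(steps + " -> " + steps.size() + " set bits");
    }
}
```

For 1010101 the recorded values are 1010100, 1010000, 1000000 and 0, the same four steps as in the debugger trace above.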
