Array Constraint Satisfaction - java

Given the range of numbers 1 through N, where N >= 3, you have to take an
array of length 2N and place each number (from the range 1 to N) in it
twice, such that the distance between the two indexes of a number is
equal to the number (exactly k other elements lie between the two copies
of k). Example:
N=3
( 3, 1, 2, 1, 3, 2 )
The solution I'm thinking of is as follows:
Generate every permutation array of the range of numbers, e.g. {1,2,3}, {3,2,1}, {2,1,3}, etc.
For each permutation array, run the following functions:
for (int number : permutationArray) {
    addToResultArray(number);
}
void addToResultArray(int toBeAdded) {
    for (int i = 0; i < resultArray.length; i++) {
        // -1 implies the resultArray is empty at that index
        if (resultArray[i] == -1 && resultArray[i + toBeAdded + 1] == -1) {
            // fill both slots of the pair, then stop looking
            resultArray[i] = toBeAdded;
            resultArray[i + toBeAdded + 1] = toBeAdded;
            return;
        }
    }
}
If the above functions do not cause an out of bounds exception, you have a valid solution.
I do not think this solution is very good. Does anyone have something better?

This can be viewed as a constraint satisfaction problem: you have 2*N variables V={v1,v2,...,v2N} (representing the array indices), and each variable takes one of the values [1,2,..,N], under the constraints:
Every value is assigned to exactly two variables.
The distance between variables with the same value equals the value itself.
An assignment is a mapping from the set of variables to their possible values. For example, [v1=1,v2=3,v3=5] is an assignment for v1, v2 and v3. A given assignment is consistent if it satisfies the constraints. An example of an inconsistent assignment is [v1=3,v2=1,v3=3].
An assignment is complete if its cardinality equals the variables size (i.e. 2*N). Otherwise it is a partial assignment. Clearly, our goal is to have one or more complete consistent assignment (a.k.a solution).
So the problem is basically a backtracking search problem. In particular, we have an ordering over the variables. Each time we assign a value to the current variable. If the value makes the current assignment inconsistent, we backtrack.
Your approach is actually generate-and-test, which is inefficient. Generating all solutions and counting them is a hard problem in general. In most cases, we are looking for a single solution.
Here is the interesting part: there is a much more efficient way to do this by propagating the values and detecting backtracking sooner (see constraint propagation).
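A minimal Java sketch of the backtracking search described above. Several details are assumptions, not part of the original answer: the class and method names are hypothetical, 0 marks an empty slot, values are placed from N down to 1, and "distance" is read as in the question's example (k elements between the two copies of k). It does no constraint propagation; it only abandons a partial assignment as soon as a value cannot be placed.

```java
final class LangfordBacktracking {
    // Returns one valid arrangement for n, or null if none exists.
    static int[] solve(int n) {
        int[] result = new int[2 * n]; // 0 means "empty slot" (assumption of this sketch)
        return place(n, result) ? result : null;
    }

    // Tries to place k, then k-1, ..., down to 1 into the partially filled array.
    private static boolean place(int k, int[] result) {
        if (k == 0) {
            return true; // every value has been placed twice
        }
        for (int i = 0; i + k + 1 < result.length; i++) {
            int j = i + k + 1; // second slot, leaving k elements in between
            if (result[i] == 0 && result[j] == 0) {
                result[i] = k;
                result[j] = k;
                if (place(k - 1, result)) {
                    return true;
                }
                // the choice led to a dead end: undo it and try the next position
                result[i] = 0;
                result[j] = 0;
            }
        }
        return false;
    }
}
```

With this placement order, solve(3) produces the arrangement from the question, (3, 1, 2, 1, 3, 2).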

Related

Subset sum problem with continuous subset using recursion

I am trying to think how to solve the Subset sum problem with an extra constraint: the subset of the array needs to be continuous (that is, its indexes need to be consecutive). I am trying to solve it using recursion in Java.
I know the solution for the non-constrained problem: Each element can be in the subset (and thus I perform a recursive call with sum = sum - arr[index]) or not be in it (and thus I perform a recursive call with sum = sum).
I am thinking about maybe adding another parameter for knowing whether or not the previous index is part of the subset, but I don't know what to do next.
You are on the right track.
Think of it this way:
For every entry you have to decide: do you want to start a new sum at this point, or skip it and reconsider at the next entry?
a + b + c + d contains the sum b + c + d. Do you want to recompute the sums?
Maybe a bottom-up approach would be better.
The O(n) solution that you asked for:
This solution requires three integer values: the start and end indices, and the total sum of the span.
Starting from element 0 (or from the end of the list if you want), increase the end index until the total sum is greater than or equal to the desired value. If it is equal, you've found a subset sum. If it is greater, move the start index up one and subtract the value at the previous start index. Finally, if the resulting total is greater than the desired value, move the end index back until the sum is less than the desired value. In the other case (where the sum is less), move the end index forward until the sum is greater than the desired value. If no match is found, repeat.
So, caveats:
Is this "fairly obvious"? Maybe, maybe not. I was making assumptions about order-of-magnitude similarity when I said both "fairly obvious" and O(n) in my comments.
Is this actually O(n)? It depends a lot on how similar (in terms of order of magnitude, i.e. digits in the number) the numbers in the list are. The closer all the numbers are to each other, the fewer steps you'll need to make on the end index to test if a subset exists. On the other hand, if you have a couple of very big numbers (like in the thousands) surrounded by hundreds of pretty small numbers (1's and 2's and 3's), the solution I've presented will get closer to O(n^2).
This solution only works based on your restriction that the subset values are continuous.
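A sliding-window sketch of the start/end/sum idea in Java. This is a variant under a stated assumption, not a transcription of the steps above: it assumes every element is non-negative, and the end index only ever moves forward. Under that assumption each index enters and leaves the window at most once, so this variant takes O(n) steps. The class and method names are hypothetical.

```java
final class ContiguousSubsetSum {
    // Returns {start, endExclusive} of a non-empty contiguous run summing to target,
    // or null if there is none. Assumes all elements are non-negative.
    static int[] find(int[] arr, int target) {
        int start = 0;
        long sum = 0; // sum of arr[start..end]
        for (int end = 0; end < arr.length; end++) {
            sum += arr[end];
            // shrink from the left while the window overshoots the target
            while (sum > target && start <= end) {
                sum -= arr[start];
                start++;
            }
            if (sum == target && start <= end) {
                return new int[] {start, end + 1};
            }
        }
        return null;
    }
}
```

With negative elements the window sum no longer grows and shrinks monotonically, so this sketch would need a different technique (for example prefix sums stored in a hash map).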

Multiple time complexity solutions for recursive Pascal triangle algorithm?

I have created the following simple algorithm in Java that computes the Pascal triangle in a recursive way, in the form of a 2D list of ints:
public class PascalTriangleRec{
    private final int[][] points;
    public PascalTriangleRec(int size){
        points = new int[size][];
        for (int i = 0; i < size; i++){
            int[] row = new int[i+1];
            for (int j = 0; j <= i; j++){
                row[j] = getValueAtPoint(i, j);
            }
            points[i] = row;
        }
    }
    public static int getValueAtPoint(int row, int col){
        if (col == 0 || col == row) return 1;
        else return getValueAtPoint(row-1, col-1) + getValueAtPoint(row-1, col);
    }
}
I need to know the time complexity of this algorithm. I found another question on StackOverflow that gives the time complexity of the getValueAtPoint function as O(2^n/sqrt(n)). I figured that since this function is embedded in two nested for loops, the time complexity of the entire Pascal triangle is O(sqrt(n^3)*2^n). I am pretty sure this reasoning is correct.
On the other hand I devised a completely different way to think about this problem, which goes as follows:
There is a certain property of Pascal triangles called Pascal's Corollary 8. This property states that the sum of all the coefficients on a given row r is equal to 2^r, with r starting at 0.
One can also note that the getValueAtPoint function from my code sample will keep recursively calling itself until it returns 1 at some point. This means that all the coefficients in the Pascal triangle are formed by adding 1 as many times as the value of that coefficient.
Since adding 1s takes a constant time, one can say that the time needed to compute a given row in the triangle is equal to some constant time multiplied by the combined value of all the coefficients in that row. This means that the time complexity of a given row r in the triangle must be 2^r.
The time needed to compute the entire triangle is equal to the sum of the time needed to calculate all the rows in the triangle. This results in a geometric series, which computes the sum of all 2^r for r going from 0 to n-1.
Using the summation property of the geometric series, this series can be rewritten in the following form: 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1.
This means that the time complexity of the algorithm according to this last derivation is O(2^n).
These two approaches yield different results, even though they both seem logical and correct to me. My question is first of all whether both approaches are correct, and whether both can be seen as correct at the same time. As I view it, both of them are correct, but the second one is more accurate, since for the first one the worst-case scenario is taken for the getValueAtPoint function and applied to all coefficients, which is clearly not the case in reality. Does this mean that the first one becomes incorrect, even though the logic behind it is correct, just because a better approach exists?
The simple answer is "too many variables". First of all, your analysis is exactly correct: the complexity depends on the sum of all the values computed. The same logic underlies the answer that got you O(2^n/sqrt(n)).
There are two problems:
Little problem: Stirling's approximation is just that: some terms are elided. I think they fall out when you combine all the loops, but I'd have to work through the nasty details to be sure.
Big problem: the values of n you combine are not the same n. That last n value you incorporated is i running from 0 to size; each value of i becomes n for an initial call to getValueAtPoint.
Try doing the sum from 0 to n on your previous complexity, and see what you get.
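One way to check the two estimates against each other is simply to count the recursive calls. The sketch below reuses getValueAtPoint from the question; the calls counter, the countCalls helper and the class name are additions for illustration only.

```java
final class PascalCallCounter {
    static long calls = 0; // number of getValueAtPoint invocations since the last reset

    static int getValueAtPoint(int row, int col) {
        calls++;
        if (col == 0 || col == row) return 1;
        return getValueAtPoint(row - 1, col - 1) + getValueAtPoint(row - 1, col);
    }

    // Counts the calls needed to build a triangle with the given number of rows.
    static long countCalls(int size) {
        calls = 0;
        for (int i = 0; i < size; i++) {
            for (int j = 0; j <= i; j++) {
                getValueAtPoint(i, j);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        for (int size = 1; size <= 20; size++) {
            System.out.printf("size=%d calls=%d 2^size=%d%n",
                    size, countCalls(size), 1L << size);
        }
    }
}
```

A call for the coefficient C(row, col) triggers 2*C(row, col) - 1 calls in total (itself included), so a triangle of n rows needs 2*(2^n - 1) - n*(n+1)/2 calls, which grows as O(2^n) and matches the second derivation.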

How to optimize checking if one of the given arrays is contained in yet another array

I have an array of integers which is updated every set interval of time with a new value (let's call it data). When that happens I want to check if that array contains any other array of integers from specified set (let's call that collection).
I do it like this:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X);
iterate through the collection and check if any array in it is contained in the separated data chunk;
It works, though it doesn't seem optimal. But every other idea I have involves creating more collections (e.g. create a collection of all the arrays from the original collection that end with the same integer as data, repeat). And that seems even more complex (on the other hand, it looks like the only way to deal with arrays in collections without limited max length).
Are there any standard algorithms to deal with such a problem? If not, are there any worthwhile optimizations I can apply to my approach?
EDIT:
To be precise, I:
separate a sub-array from the end of data of length X (arrays in the collection have a set max length of X, and if they don't, it's just the length of the longest one in the collection);
iterate through the collection and for every array in it:
separate sub-array from the previous sub-array with length matching current array in collection;
use Java's List.equals to compare the arrays;
EDIT 2:
Thanks for all the replies, surely they'll come in handy some day. In this case I decided to drop the last steps and just compare the arrays in my own loop. That eliminates creating yet another sub-array and it's already O(N), so in this specific case it will do.
Take a look at the KMP algorithm. It's been designed with String matching in mind, but it really comes down to matching subsequences of arrays to given sequences. Since that algorithm has linear complexity (O(n)), it can be said that it's pretty optimal. It's also a basic staple in standard algorithms.
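A compact Java sketch of KMP applied to int arrays rather than strings. The class and method names are hypothetical; the failure-table construction is the standard one.

```java
final class KmpIntSearch {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(int[] pattern) {
        int[] fail = new int[pattern.length];
        int k = 0;
        for (int i = 1; i < pattern.length; i++) {
            while (k > 0 && pattern[i] != pattern[k]) {
                k = fail[k - 1];
            }
            if (pattern[i] == pattern[k]) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the first index at which pattern occurs in data, or -1.
    static int indexOf(int[] data, int[] pattern) {
        if (pattern.length == 0) return 0;
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern elements matched so far
        for (int i = 0; i < data.length; i++) {
            while (k > 0 && data[i] != pattern[k]) {
                k = fail[k - 1]; // fall back without re-reading data
            }
            if (data[i] == pattern[k]) {
                k++;
            }
            if (k == pattern.length) {
                return i - k + 1;
            }
        }
        return -1;
    }
}
```

Because the data index never moves backwards, the matched count k could also be kept between updates, so that each newly arrived value costs amortized constant work per pattern.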
dfens' proposal is smart in that it incurs no significant extra complexity iff you keep the current product along with the main array, and can be checked in O(1), but it is also quite fragile and produces many false positives and negatives. Just imagine a target array [1, 1, ..., 1], which will always produce a positive test for all non-trivial main arrays. It also breaks down when one bucket contains a 0. That means that a successful check against his test is always a necessary condition for a hit (0s aside), but is never sufficient: with that method alone, you can never be sure of the validity of that result.
Look at the rsync algorithm. If I understand it correctly, you could go about it like this:
You've got an immense array of data [length L].
At the end of that data, you've got N bytes of data, and you want to know whether those N bytes ever appeared before.
precalculate:
for every offset in the array, calculate the checksum over the next N data elements.
Hold that checksum in a separate array.
Using a rolling checksum like rsync does, you can do this step in O(L) time for all offsets.
Whenever new data arrives:
Calculate the checksum over the last N elements. Using a rolling checksum, this could be O(1)
Check that checksum against all the precalculated checksums. If it matches, check equality of the subarrays (subslices , whatever...). If that matches too, you've got a match.
I think, in essence, this is the same as dfens' approach with the product of all numbers.
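The rolling-checksum idea can be sketched in Java roughly as follows. This uses a polynomial rolling hash in the Rabin-Karp style, not rsync's actual checksum, and BASE, MOD and all names are arbitrary choices made for this example. A hash match is always confirmed by comparing the elements, as the answer recommends.

```java
final class RollingHashSearch {
    private static final long BASE = 1_000_003L;    // arbitrary multiplier (assumption)
    private static final long MOD = 1_000_000_007L; // arbitrary prime modulus (assumption)

    // Returns the first index at which window occurs in data, or -1.
    static int indexOf(int[] data, int[] window) {
        int n = window.length;
        if (n == 0) return 0;
        if (n > data.length) return -1;
        long target = 0, hash = 0, highPow = 1; // highPow = BASE^(n-1) mod MOD
        for (int i = 0; i < n; i++) {
            target = (target * BASE + norm(window[i])) % MOD;
            hash = (hash * BASE + norm(data[i])) % MOD;
            if (i > 0) highPow = highPow * BASE % MOD;
        }
        for (int start = 0; ; start++) {
            if (hash == target && matches(data, start, window)) return start;
            if (start + n >= data.length) return -1;
            // slide by one: remove data[start], append data[start + n], both in O(1)
            hash = (hash - norm(data[start]) * highPow % MOD + MOD) % MOD;
            hash = (hash * BASE + norm(data[start + n])) % MOD;
        }
    }

    // Maps any int (including negatives) into the range [0, MOD).
    private static long norm(int v) {
        return Math.floorMod((long) v, MOD);
    }

    private static boolean matches(int[] data, int start, int[] window) {
        for (int i = 0; i < window.length; i++) {
            if (data[start + i] != window[i]) return false;
        }
        return true;
    }
}
```

To serve many windows of the same length at once, the per-offset hashes could instead be stored in a hash map keyed by hash value, which is closer to the precalculation step described above.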
I think you can keep the product of the array for immediate rejections.
So if your array is [n_1,n_2,n_3,...], you can say that it is not a subarray of [m_1,m_2,m_3,...] if the product m_1*m_2*... = M is not divisible by the product n_1*n_2*... = N.
Example
Let's say you have array
[6,7]
And comparing with:
[6,10,12]
[4,6,7]
The product of your array is 6*7 = 42.
6 * 10 * 12 = 720, which is not divisible by 42, so you can reject the first array immediately.
The product of [4, 6, 7] is 168, which is divisible by 42 (so you cannot reject it; it can have other multipliers).
In each interval of time you can just multiply the product by the new number to avoid computing the whole product every time.
Note that you don't have to allocate anything if you simulate List's equals yourself. Just one more loop.
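For illustration, the "one more loop" could look like the Java sketch below, which compares the tail of data with a candidate directly instead of creating a sub-list and calling List.equals. The class and method names are hypothetical.

```java
import java.util.List;

final class SuffixMatch {
    // True if data ends with exactly the elements of pattern; allocates nothing.
    static boolean endsWith(List<Integer> data, List<Integer> pattern) {
        int offset = data.size() - pattern.size();
        if (offset < 0) {
            return false; // pattern is longer than the data collected so far
        }
        for (int i = 0; i < pattern.size(); i++) {
            // equals() rather than ==, because the elements are boxed Integers
            if (!data.get(offset + i).equals(pattern.get(i))) {
                return false;
            }
        }
        return true;
    }

    // True if data ends with any of the arrays in the collection.
    static boolean endsWithAny(List<Integer> data, List<List<Integer>> collection) {
        for (List<Integer> pattern : collection) {
            if (endsWith(data, pattern)) {
                return true;
            }
        }
        return false;
    }
}
```

This assumes the lists support fast random access (for example ArrayList); with a LinkedList the get calls would make the loop quadratic.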
Similar to dfens' answer, I'd offer other criteria:
As the product is too big to be handled efficiently, compute the GCD instead. It produces many more false positives, but surely fits in long or int or whatever your original datatype is.
Compute the total number of trailing zeros, i.e., the "product" ignoring all factors but powers of 2. Also pretty weak, but fast. Use this criterion before the more time-consuming ones.
But... this is a special case of DThought's answer. Use rolling hash.

Why max and min in Guava for empty arrays throw IllegalArgumentException?

I'm reading the code of Guava's Ints.max(int... array) (and similarly, Ints.min, Longs.min, etc.) They throw an IllegalArgumentException if array.length == 0 (This is Guava 15.0).
I wonder why they do not return the "identity element" in this case, instead of throwing an exception. By "identity element" I mean the element behaving like 1 for product, or 0 for sum.
That is, I would expect Ints.min() to be Integer.MAX_VALUE, Ints.max() to be Integer.MIN_VALUE, and so on.
The rationale behind this is that if you split an array in two, the min of the whole array must be the min between the mins of both sub arrays. Or, for the mathematically inclined, the sum over an empty set of real numbers is 0, the product is 1, the union of an empty collection of sets is the empty set, and so on.
Since Guava libraries tend to be carefully produced, I guess there must be an explanation for throwing an exception here. So the question is: why?
Edit: I understand that most people expect max and min of an array to be an element of the array, but this is because max/min of two elements is always one of them. On the other hand, if one regards max/min just as (commutative) binary operations, returning the identity element makes more sense. To me.
Because, IMHO, in 99.99% of the cases, when you ask the minimum element of an array, you want to get an element of this array, and not some arbitrary large value. And thus, most of the time, an empty array is a special condition, that needs a specific treatment. And not handling this special condition is thus a bug, signalled by an exception.
You said it yourself -
The rationale behind this is that if you split an array in two, the min of the whole array must be the min between the mins of both sub arrays. Or, for the mathematically inclined, the sum over an empty set of real numbers is 0, the product is 1, the union of an empty collection of sets is the empty set, and so on.
So [-1] = [-1] union [] but max([-1]) != max([-1] union []). I agree that for product or sum it makes more sense to return the respective identity, but not max/min.
I also prefer the property that max/min(S) be an element of S. Not some element having no relevance with respect to less than and greater than.
In particular if I'm working in a domain with a lot of negative numbers - say temperatures in Northern Canada - a day where my sample of temperatures is empty because the thermometer broke - it should not randomly show up as a relatively very warm day.
The minimum/maximum of array values must come from that array. If the array is empty, then there is no value to take. Returning Integer.MAX_VALUE or Integer.MIN_VALUE here would be wrong, because those values aren't in the array. Nothing is in the array. Mathematically, the answer is the empty set, but that isn't a valid value among possible int values. There is no possible int correct answer, so the only possible correct course of action is to throw an Exception.
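As a point of comparison, the JDK's own IntStream.max() takes a third route: it returns an OptionalInt, which is empty for an empty input, so the caller has to decide explicitly what "no maximum" means. The sketch below uses only JDK classes, not Guava; the wrapper class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.OptionalInt;

final class EmptyMaxDemo {
    // Empty OptionalInt for an empty array, otherwise the largest element.
    static OptionalInt maxOf(int... values) {
        return Arrays.stream(values).max();
    }

    // Mirrors the "throw on empty" behaviour discussed above.
    static int maxOrThrow(int... values) {
        return maxOf(values)
                .orElseThrow(() -> new IllegalArgumentException("empty array"));
    }

    // Mirrors the "identity element" behaviour the question proposes.
    static int maxOrIdentity(int... values) {
        return maxOf(values).orElse(Integer.MIN_VALUE);
    }
}
```

With the temperature example from above, maxOrIdentity() on an empty sample silently yields Integer.MIN_VALUE, while maxOrThrow() forces the broken-thermometer case to be handled.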

How to multiply two big big numbers

You are given a list of n numbers L=<a_1, a_2,...a_n>. Each of them is
either 0 or of the form +/- 2^k, 0 <= k <= 30. Describe and implement an
algorithm that returns the largest product of a CONTINUOUS SUBLIST
p=a_i*a_(i+1)*...*a_j, 1 <= i <= j <= n.
For example, for the input <8 0 -4 -2 0 1> it should return 8 (either 8
or (-4)*(-2)).
You can use any standard programming language and can assume that
the list is given in any standard data structure, e.g. int[],
vector<int>, List<Integer>, etc.
What is the computational complexity of your algorithm?
In my first answer I addressed the OP's problem in "multiplying two big big numbers". As it turns out, this wish is only a small part of a much bigger problem which I'm going to address now:
"I still haven't arrived at the final skeleton of my algorithm I wonder if you could help me with this."
(See the question for the problem description)
All I'm going to do is explain the approach Amnon proposed in little more detail, so all the credit should go to him.
You have to find the largest product of a continuous sublist from a list of integers which are powers of 2. The idea is to:
Compute the product of every continuous sublist.
Return the biggest of all these products.
You can represent a sublist by its start and end index. For start=0 there are n possible values for end, namely 0..n-1. This generates all sublists that start at index 0. In the next iteration, you increment start by 1 and repeat the process (this time, there are n-1 possible values for end). This way you generate all possible sublists.
Now, for each of these sublists, you have to compute the product of its elements, that is, come up with a method computeProduct(List wholeList, int startIndex, int endIndex). You can either use the built-in BigInteger class (which should be able to handle the input provided by your assignment) to save you from further trouble, or try to implement a more efficient way of multiplication as described by others. (I would start with the simpler approach, since it's easier to see if your algorithm works correctly, and only then try to optimize it.)
Now that you're able to iterate over all sublists and compute the product of their elements, determining the sublist with the maximum product should be the easiest part.
If it's still too hard for you to make the connection between the two steps, let us know, but please also provide us with a draft of your code as you work on the problem, so that we don't end up incrementally constructing the solution and you copy&pasting it.
edit: Algorithm skeleton
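import java.math.BigInteger;

public BigInteger listingSublist(BigInteger[] biArray)
{
    int start = 0;
    int end = biArray.length-1;
    BigInteger maximum = null; // stays null only if the array is empty
    for (int i = start; i <= end; i++)
    {
        for (int j = i; j <= end; j++)
        {
            // keep the largest product seen so far
            BigInteger product = computeProduct(biArray, i, j);
            if (maximum == null || product.compareTo(maximum) > 0)
            {
                maximum = product;
            }
        }
    }
    return maximum;
}
public BigInteger computeProduct(BigInteger[] wholeList, int startIndex,
        int endIndex)
{
    // wholeList[startIndex] * wholeList[startIndex+1] * ... * wholeList[endIndex]
    BigInteger product = BigInteger.ONE;
    for (int k = startIndex; k <= endIndex; k++)
    {
        product = product.multiply(wholeList[k]);
    }
    return product;
}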
Since k <= 30, any integer i = 2^k will fit into a Java int. However, the product of two such integers might not fit into a Java int, since 2^k * 2^k = 2^(2*k) <= 2^60, which does fit into a Java long. This should answer your question regarding the "(multiplication of) two numbers...".
In case that You might want to multiply more than two numbers, which is implied by Your assignment saying "...largest product of a CONTINUOUS SUBLIST..." (a sublist's length could be > 2), have a look at Java's BigInteger class.
Actually, the most efficient way of multiplication is doing addition instead. In this special case all you have is numbers that are powers of two, and you can get the product of a sublist by simply adding the exponents together (and counting the negative numbers in your product, and making it a negative number in case of an odd number of negatives).
Of course, to store the result you may need the BigInteger, if you run out of bits. Or depending on how the output should look like, just say (+/-)2^N, where N is the sum of the exponents.
Parsing the input could be a matter of switch-case, since you only have 30 numbers to take care of. Plus the negatives.
That's the boring part. The interesting part is how you get the sublist that produces the largest number. You can take the dumb approach, by checking every single variation, but that would be an O(N^2) algorithm in the worst case (IIRC). Which is really not very good for longer inputs.
What can you do? I'd probably start from the largest non-negative number in the list as a sublist, and grow the sublist to get as many non-negative numbers in each direction as I can. Then, having all the positives in reach, proceed with pairs of negatives on both sides, eg. only grow if you can grow on both sides of the list. If you cannot grow in both directions, try one direction with two (four, six, etc. so even) consecutive negative numbers. If you cannot grow even in this way, stop.
Well, I don't know if this algorithm even works, but if it (or something similar) does, it's an O(N) algorithm, which means great performance. Let's try it out! :-)
Hmmm.. since they're all powers of 2, you can just add the exponent instead of multiplying the numbers (equivalent to taking the logarithm of the product). For example, 2^3 * 2^7 is 2^(7+3)=2^10.
I'll leave handling the sign as an exercise to the reader.
Regarding the sublist problem, there are less than n^2 pairs of (begin,end) indices. You can check them all, or try a dynamic programming solution.
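The exponent-addition idea, including the sign handling, can be sketched in Java as follows. The class and method names are hypothetical, and the sketch assumes every element is 0 or of the form +/- 2^k as stated in the problem.

```java
import java.math.BigInteger;

final class PowerOfTwoProduct {
    // Product of a[from..to] (both inclusive), where each element is 0 or +/- 2^k.
    static BigInteger product(int[] a, int from, int to) {
        int exponentSum = 0;      // sum of the k values seen so far
        boolean negative = false; // true while an odd number of negatives has been seen
        for (int i = from; i <= to; i++) {
            if (a[i] == 0) {
                return BigInteger.ZERO;
            }
            // for a power of two, the number of trailing zero bits is its exponent k
            exponentSum += Integer.numberOfTrailingZeros(Math.abs(a[i]));
            if (a[i] < 0) {
                negative = !negative;
            }
        }
        BigInteger magnitude = BigInteger.ONE.shiftLeft(exponentSum); // 2^exponentSum
        return negative ? magnitude.negate() : magnitude;
    }
}
```

Only the final conversion touches BigInteger; comparing two candidate sublists needs just their signs and exponent sums.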
EDIT: I adjusted the algorithm outline to match the actual pseudo code and put the complexity analysis directly into the answer:
Outline of algorithm
Go sequentially over the sequence and store the value and first/last index of the product (positive) since the last 0. Do the same for another product (negative) which only consists of the numbers since the first sign change of the sequence. If you hit a negative sequence element, swap the two products (positive and negative) along with the associated starting indices. Whenever the positive product hits a new maximum, store it and the associated start and end indices. After going over the whole sequence, the result is stored in the maximum variables.
To avoid overflow calculate in binary logarithms and an additional sign.
Pseudo code
maxProduct = 0 // log2 of the best product; stays 0 (product 1) if no sublist exceeds 1
maxProductStartIndex = -1
maxProductEndIndex = -1
sequence.push_front( 0 ) // reuses variable initialization of the case n == 0
for every index of sequence
    n = sequence[index]
    if n == 0
        posProduct = 0
        negProduct = 0
        posProductStartIndex = index+1
        negProductStartIndex = -1
    else
        if n < 0
            swap( posProduct, negProduct )
            swap( posProductStartIndex, negProductStartIndex )
            n = -n
        end if
        logN = log2(n) // as indicated all arithmetic is done on the logarithms
        if -1 < negProductStartIndex // a negative product ends at this index
            negProduct += logN
        end if
        if -1 == posProductStartIndex // no positive product ends here: restart after this element
            posProduct = 0
            posProductStartIndex = index+1
        else
            posProduct += logN
            if maxProduct < posProduct // update current best solution
                maxProduct = posProduct
                maxProductStartIndex = posProductStartIndex
                maxProductEndIndex = index
            end if
        end if
    end if
end for
// output solution
print "The maximum product is " 2^maxProduct "."
print "It is reached by multiplying the numbers from sequence index "
print maxProductStartIndex " to sequence index " maxProductEndIndex
Complexity
The algorithm uses a single loop over the sequence, so it's O(n) times the complexity of the loop body. The most complicated operation of the body is log2. Ergo it's O(n) times the complexity of log2. The log2 of a number of bounded size is O(1), so the resulting complexity is O(n), aka linear.
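A Java sketch of the same single-pass idea. It departs from the pseudo code in a few details that are assumptions of this sketch: exponents are kept as ints instead of calling log2, the value -1 marks "no such product ends here", start/end indices are not tracked, and when no positive product exists at all the largest single element is returned. The class and method names are hypothetical.

```java
import java.math.BigInteger;

final class MaxPowerOfTwoProduct {
    private static final int NONE = -1; // marks "no such product ends at this index"

    static BigInteger maxProduct(int[] a) {
        if (a.length == 0) throw new IllegalArgumentException("empty list");
        int bestExp = NONE; // exponent of the best positive product found so far
        int posExp = NONE;  // exponent of the longest positive product ending here
        int negExp = NONE;  // exponent of the longest negative product ending here
        int largest = a[0]; // fallback when no positive product exists
        for (int v : a) {
            largest = Math.max(largest, v);
            if (v == 0) { // a zero terminates both running products
                posExp = NONE;
                negExp = NONE;
                continue;
            }
            int e = Integer.numberOfTrailingZeros(Math.abs(v)); // k for v = +/- 2^k
            int extendedPos = (posExp == NONE) ? e : posExp + e; // v alone, or v appended
            int extendedNeg = (negExp == NONE) ? NONE : negExp + e;
            if (v > 0) {
                posExp = extendedPos;
                negExp = extendedNeg;
            } else { // a negative element swaps the roles of the two products
                posExp = extendedNeg;
                negExp = extendedPos;
            }
            bestExp = Math.max(bestExp, posExp);
        }
        return bestExp != NONE
                ? BigInteger.ONE.shiftLeft(bestExp)
                : BigInteger.valueOf(largest);
    }
}
```

For the example input <8 0 -4 -2 0 1> this returns 8, as in the problem statement.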
I'd like to combine Amnon's observation about multiplying powers of 2 with one of mine concerning sublists.
Lists are terminated hard by 0's. We can break the problem down into finding the biggest product in each sub-list, and then the maximum of that. (Others have mentioned this).
This is my 3rd revision of this writeup. But 3's the charm...
Approach
Given a list of non-0 numbers, (this is what took a lot of thinking) there are 3 sub-cases:
The list contains an even number of negative numbers (possibly 0). This is the trivial case, the optimum result is the product of all numbers, guaranteed to be positive.
The list contains an odd number of negative numbers, so the product of all numbers would be negative. To change the sign, it becomes necessary to sacrifice a subsequence containing a negative number. Two sub-cases:
a. sacrifice numbers from the left up to and including the leftmost negative; or
b. sacrifice numbers from the right up to and including the rightmost negative.
In either case, return the product of the remaining numbers. Having sacrificed exactly one negative number, the result is certain to be positive. Pick the winner of (a) and (b).
Implementation
The input needs to be split into subsequences delimited by 0. The list can be processed in place if a driver method is built to loop through it and pick out the beginnings and ends of non-0 sequences.
Doing the math in longs would only double the possible range. Converting to log2 makes arithmetic with large products easier. It prevents program failure on large sequences of large numbers. It would alternatively be possible to do all math in Bignums, but that would probably perform poorly.
Finally, the end result, still a log2 number, needs to be converted into printable form. Bignum comes in handy there. There's new BigInteger("2").pow(log); which will raise 2 to the power of log.
Complexity
This algorithm works sequentially through the sub-lists, only processing each one once. Within each sub-list, there's the annoying work of converting the input to log2 and the result back, but the effort is linear in the size of the list. In the worst case, the sum of much of the list is computed twice, but that's also linear complexity.
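The approach above (split at zeros, then sacrifice a prefix or suffix when the count of negatives is odd) might be sketched in Java like this. The names are hypothetical, exponents are summed instead of multiplying, and the fallback to the largest single element when no positive product exists is an assumption of this sketch rather than part of the write-up.

```java
import java.math.BigInteger;

final class SacrificeMaxProduct {
    private static int exponent(int v) {
        return Integer.numberOfTrailingZeros(Math.abs(v)); // k for v = +/- 2^k
    }

    // Best positive product within the zero-free run a[from..to), as an exponent
    // of 2, or -1 if the run contains no positive product.
    static int bestExponentInRun(int[] a, int from, int to) {
        int total = 0, negatives = 0, firstNeg = -1, lastNeg = -1;
        for (int i = from; i < to; i++) {
            total += exponent(a[i]);
            if (a[i] < 0) {
                negatives++;
                if (firstNeg < 0) firstNeg = i;
                lastNeg = i;
            }
        }
        if (negatives % 2 == 0) return total; // whole run is already positive
        int best = -1;
        if (firstNeg + 1 < to) { // (a) sacrifice a[from..firstNeg]
            int dropped = 0;
            for (int i = from; i <= firstNeg; i++) dropped += exponent(a[i]);
            best = Math.max(best, total - dropped);
        }
        if (lastNeg > from) { // (b) sacrifice a[lastNeg..to)
            int dropped = 0;
            for (int i = lastNeg; i < to; i++) dropped += exponent(a[i]);
            best = Math.max(best, total - dropped);
        }
        return best;
    }

    static BigInteger maxProduct(int[] a) {
        if (a.length == 0) throw new IllegalArgumentException("empty list");
        int best = -1, largest = a[0], runStart = 0;
        for (int i = 0; i <= a.length; i++) {
            if (i < a.length) largest = Math.max(largest, a[i]);
            if (i == a.length || a[i] == 0) { // a zero or the end closes the current run
                if (i > runStart) best = Math.max(best, bestExponentInRun(a, runStart, i));
                runStart = i + 1;
            }
        }
        return best >= 0 ? BigInteger.ONE.shiftLeft(best) : BigInteger.valueOf(largest);
    }
}
```

Each run is scanned once for the totals and at most once more for the two sacrificed ends, which keeps the work linear in the length of the list.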
See this code. Here I implement the exact factorial of a huge number. I am just using an integer array to make big numbers. Download the code from Planet Source Code.
