custom partition problem - java

Could someone guide me on how to solve this problem?
We are given a set S with k number of elements in it.
Now we have to divide the set S into x subsets such that the difference in number of elements in each subset is not more than 1 and the sum of each subset should be as close to each other as possible.
Example 1:
{10, 20, 90, 200, 100} has to be divided into 2 subsets
Solution:{10,200}{20,90,100}
sum is 210 and 210
Example 2:
{1, 1, 2, 1, 1, 1, 1, 1, 1, 6}
Solution:{1,1,1,1,6}{1,2,1,1,1}
Sum is 10 and 6.

I see a possible solution in two stages.
Stage One
Start by selecting the number of subsets, N.
Sort the original set, S, if possible.
Distribute the largest N numbers from S into subsets 1 to N in order.
Distribute the next N largest numbers from S into the subsets in reverse order, N to 1.
Repeat until all numbers are distributed.
If you can't sort S, then distribute each number from S into the subset (or one of the subsets) with the least entries and the smallest total.
You should now have N subsets all sized within 1 of each other and with very roughly similar totals.
Stage Two
Now try to refine the approximate solution you have.
Pick the largest total subset, L, and the smallest total subset, M. Pick a number in L that is larger than a number in M, but not by so much as to increase the absolute difference between the two subsets (the gap between the two numbers must be smaller than the gap between the two totals). Swap the two numbers. Repeat. Not all pairs of subsets will have swappable numbers. Each swap keeps the subsets the same size.
If you have a lot of time you can do a thorough search; if not then just try to pick off a few obvious cases. I would say don't swap numbers if there is no decrease in difference; otherwise you might get an infinite loop.
You could interleave the stages once there are at least two members in some subsets.
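A minimal Java sketch of the two stages, assuming the input can be sorted and fits in an int[]. The class and method names (BalancedPartition, partition, improveOnce) are made up for illustration, and stage two only ever looks at the current largest and smallest subsets, so this is a heuristic and not an exact solver.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class BalancedPartition {
    // Stage one: deal the numbers out largest-first, reversing direction each round.
    // Stage two: keep swapping between the largest and smallest subset while it helps.
    static List<List<Integer>> partition(int[] values, int n) {
        int[] sorted = values.clone();
        Arrays.sort(sorted); // ascending, so we read it from the end
        List<List<Integer>> subsets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            subsets.add(new ArrayList<>());
        }
        for (int k = 0; k < sorted.length; k++) {
            int round = k / n;
            int pos = k % n;
            int target = (round % 2 == 0) ? pos : n - 1 - pos; // 1..N, then N..1
            subsets.get(target).add(sorted[sorted.length - 1 - k]);
        }
        while (improveOnce(subsets)) {
            // each successful swap strictly shrinks the gap, so this terminates
        }
        return subsets;
    }

    // Tries one swap between the largest-total and smallest-total subsets.
    static boolean improveOnce(List<List<Integer>> subsets) {
        List<Integer> big = subsets.get(0);
        List<Integer> small = subsets.get(0);
        for (List<Integer> s : subsets) {
            if (sum(s) > sum(big)) big = s;
            if (sum(s) < sum(small)) small = s;
        }
        long diff = sum(big) - sum(small);
        for (int i = 0; i < big.size(); i++) {
            for (int j = 0; j < small.size(); j++) {
                long delta = big.get(i) - small.get(j);
                // only swap if the gap between the two totals really gets smaller
                if (delta > 0 && delta < diff) {
                    int tmp = big.get(i);
                    big.set(i, small.get(j));
                    small.set(j, tmp);
                    return true;
                }
            }
        }
        return false;
    }

    static long sum(List<Integer> s) {
        long total = 0;
        for (int v : s) total += v;
        return total;
    }
}
```

On Example 2 from the question this ends with sums 10 and 6. On Example 1 it stops at 230 and 190 instead of the optimal 210 and 210, because no single swap between the two subsets improves things; that is the price of only picking off the obvious cases.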

There is no easy algorithm for this problem.
Check out the partition problem, also known as "the easiest hard problem", which is this problem for 2 sets. That problem is NP-complete, and you should be able to find algorithms for it on the web.
I know your problem is a bit different since you can choose the number of sets, but you can take inspiration from solutions to that one.
For example:
You can transform this into a series of (integer) linear programs. Let k be the number of elements in your set.
{a1 ... ak} is your set
For i = 2 to k:
try to solve the following program:
x_pn = 1 if element p of the set is in subset number n (n <= i), and 0 otherwise
minimise max(Abs(sum over p of (a_p * x_pn) - sum over p of (a_p * x_pm)) for all m, n) // you minimise the max of the difference between 2 subsets.
s.t.
sum over n of x_pn = 1 for all p // each element goes into exactly one subset.
(sum over p of x_pn) - (sum over p of x_pm) <= 1 for all m, n // the numbers of elements in 2 subsets differ by at most one element.
x_pn in {0,1}
if you find a min less than one then stop
otherwise continue
end for
Hope my notations are clear.
The complexity of this program is exponential, and if you found a polynomial way to solve this you would prove P=NP, so I don't think you can do better.
EDIT
I saw your comment; I misunderstood the constraint on the size of the subsets (I thought it was the difference between 2 sets).
I don't have time to update it now; I will do it when I have time.
Sorry.
EDIT 2
I edited the linear program and it should do what it's asked to do. I just added a constraint.
Hope this time the problem is fully understood, even though this solution might not be optimal

I'm no scientist, so I'd try two approaches:
After sorting the items, go from both "ends", moving the first and last items into the current set, then shift to the next set and loop;
Then:
Checking the differences of sums of the sets, and shuffling items if it would help.
Coding the resulting sets appropriately and trying genetic algorithms.

Related

how to get any integer in a sorted array of size n that appear more than n/k times, given that 0 < k < n, and in O(k log n) time in java?

We need to find any integer in a sorted array that appears more than n/k times, in Java; if there is none, return -1. How is this possible in O(k log n) complexity?
Imagine your input has 100,000 items in it, and we pick k = 10, as in, n/k is 10,000, as in, you're looking for the same number to be in the list 10,000 times.
Given that the list is sorted, if a number is 10k times in there, it'll be consecutive (all next to each other).
In essence, then, you just need to look at list[0] and list[10000] and check if the same number is in both indices. If yes, trivially you have your answer. If no, check list[20000] and check if that is the same as what we just read at list[10000]. If yes, great, we have an answer. If no, take what's at list[20000] and use that to check it against list[30000], and so on, until getting to the end.
You only need k steps for this, so this algorithm is O(k). But it has a small problem - you can be 'unlucky' and miss a valid answer. Imagine that there is indeed a number in this list (let's say 18) that shows up 10,005 times, however, the indices of these 10005 occurrences are [9000, 19005]. Our algorithm checks index 0, 10000, and 20000, and only finds this number at 10000 and would thus incorrectly report 'no number found'.
How 'bad' can it get? Only 'half' - if the algorithm instead checks twice as often (so not indices 0, 10k, 20k, 30k, etc, but indices 0, 5000, 10000, 15000, 20000, 25000, etc), then it's not possible to 'miss', in the sense that any valid answer will result in at least 2 lookups having the same number. In this example, lookup[10000] and lookup[15000] both give you 18 - no matter how unlucky you are, you can no longer "miss it". However, if you now find a 'hit' (2 consecutive checked indices have the same number), you're not sure. If list[10000] is 18 and list[15000] is also 18, but list[20000] is not, is it a match? Could be - depends.
We've now changed the question: Instead of 'give me any number that shows up at least 10000 times in the list', it is now: 'is 18 at least 10000 times in this list?'.
That algorithm is much easier. Just use binary search to find the start and end indices and subtract those and you know how long the 'run' is. Binary search is an O(log n) algorithm.
This sticks an O(log n) algorithm inside an O(k) algorithm, thus giving you an algorithmic complexity of O(k log n), as asked for.
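A minimal Java sketch of the idea, with one simplification that is mine and not the answer's: instead of sampling at half steps and comparing neighbouring samples, it samples every floor(n/k) positions and runs the binary-search count on every sample. Any run longer than n/k must cover at least one sampled index, and there are at most about 2k samples, so the cost is still O(k log n). The names FrequentInSorted, findFrequent, lowerBound and upperBound are hypothetical.

```java
final class FrequentInSorted {
    // Returns a value occurring more than n/k times in the sorted array, or -1.
    static int findFrequent(int[] sorted, int k) {
        int n = sorted.length;
        int step = Math.max(1, n / k); // a run longer than n/k covers a multiple of step
        for (int i = 0; i < n; i += step) {
            int candidate = sorted[i];
            int count = upperBound(sorted, candidate) - lowerBound(sorted, candidate);
            if ((long) count * k > n) { // count > n/k without integer-division rounding
                return candidate;
            }
        }
        return -1;
    }

    // First index whose value is >= key.
    static int lowerBound(int[] a, int key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }

    // First index whose value is > key.
    static int upperBound(int[] a, int key) {
        int lo = 0, hi = a.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (a[mid] <= key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```

Note that -1 doubles as the "not found" marker, as the question asks, so an array in which -1 itself is the frequent value cannot be told apart from a miss.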

Subset sum problem with continuous subset using recursion

I am trying to think how to solve the Subset sum problem with an extra constraint: the subset of the array needs to be continuous (its indexes need to be consecutive). I am trying to solve it using recursion in Java.
I know the solution for the non-constrained problem: Each element can be in the subset (and thus I perform a recursive call with sum = sum - arr[index]) or not be in it (and thus I perform a recursive call with sum = sum).
I am thinking about maybe adding another parameter for knowing whether or not the previous index is part of the subset, but I don't know what to do next.
You are on the right track.
Think of it this way:
for every entry you have to decide: do you want to start a new sum at this point or skip it and reconsider the next entry.
a + b + c + d contains the sum of b + c + d. Do you want to recompute the sums?
Maybe a bottom-up approach would be better
The O(n) solution that you asked for:
This solution requires three integers: the start and end indices, and the total sum of the span.
Starting from element 0 (or from the end of the list if you want), increase the end index until the total sum is greater than or equal to the desired value. If it is equal, you've found a subset sum. If it is greater, move the start index up one and subtract the value at the previous start index. If the resulting total is still greater than the desired value, move the end index back until the sum is less than the desired value. In the other case (where the sum is less), move the end index forward until the sum is greater than the desired value. If no match is found, repeat.
So, caveats:
Is this "fairly obvious"? Maybe, maybe not. I was making assumptions about order of magnitude similarity when I said both "fairly obvious" and O(n) in my comments.
Is this actually O(n)? It depends a lot on how similar (in terms of order of magnitude (digits in the number)) the numbers in the list are. The closer all the numbers are to each other, the fewer steps you'll need to make on the end index to test if a subset exists. On the other hand, if you have a couple of very big numbers (like in the thousands) surrounded by hundreds of pretty small numbers (1's and 2's and 3's), the solution I've presented will get closer to O(n^2).
This solution only works based on your restriction that the subset values are continuous
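For comparison, a small Java sketch of a variant in which neither index ever moves backward. It assumes all values are non-negative (an assumption that is not stated in the question); under that assumption each index advances at most n times, so the whole scan is linear. The names ContiguousSubsetSum and find are made up for this example, and it is iterative, not the recursive form the question asks about.

```java
final class ContiguousSubsetSum {
    // Returns {start, end} (both inclusive) of a non-empty run summing to target,
    // or null if there is none. Assumes every value in arr is >= 0.
    static int[] find(int[] arr, long target) {
        int start = 0;
        long sum = 0;
        for (int end = 0; end < arr.length; end++) {
            sum += arr[end]; // grow the window on the right
            while (sum > target && start <= end) {
                sum -= arr[start++]; // too big: shrink from the left
            }
            if (sum == target && start <= end) {
                return new int[]{start, end};
            }
        }
        return null;
    }
}
```

For {1, 4, 20, 3, 10, 5} and a target of 33 this returns the indices 2 and 4 (20 + 3 + 10). With negative values in the list the shrinking step is no longer safe, and a different technique is needed.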

Generating a partially ordered random list of numbers

I want to generate a list of 500 random numbers where the list is exactly 30% sorted. I know how to generate a list that is at least 30% sorted, but that's not what I want. How do I generate a file that is "exactly" 30% sorted? I'm stuck. How can this be done?
Here is the exact wording:
"For the sorts, you should construct three different files of each size: ordered, keys in reverse order, and finally one in which 30% of the keys are ordered. The latter file should not consist of files in which your sort is 30% complete, but rather in files in which 30% of the keys are correctly placed with respect to one another but are not necessarily contiguous."
There are 2 main ideas I can see for percentage sorted:
Simply the number of elements out of place.
One should be able to get an estimated % sorted by sorting it, then iterating through it, keeping each element in place with the desired percentage as probability and otherwise swapping it with a random remaining element (so, if we want 30% sorted, we'll keep an element in place with 30% probability, and swap it with 70% probability).
If an exact number is needed, one could use the above result and (intelligently) swap random elements until the desired percentage is obtained.
The number of inversions.
An inversion is a pair of places of a sequence where the elements on these places are out of their natural order.
One idea is to first sort it, then to swap random elements that get us closer to the desired percentage sorted, until we get there.
Only swapping elements that get us closer to the desired result is difficult (at least doing so efficiently).
A very brute force approach would be to count the change in the number of inversions that each pair of swaps would cause, and then pick a random one that gets us closer to our target.
Another idea is to just generate random pairs and count the number of inversions until we find one that gets us closer.
A third option is to pick a random element. If it's larger than half the elements, try to move it left (ideally increasing the number of inversions). If it's smaller, try to move it right. In trying to move it left/right, we can look for a smaller / larger element (respectively) to swap it with and count the change in inversions (we only need to consider the elements between the swapped elements when counting the change in inversions).
At first we could probably just randomly swap elements as we're likely to tend to more inversions.
If the percentage is above 50%, we could also start with a reversed array, i.e. 100% unsorted.
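All of the inversion-based ideas need a way to count inversions. A deliberately simple Java sketch follows; it is quadratic, which is fine for 500 elements, and the names Inversions, count and maxInversions are hypothetical. How a "percentage sorted" maps onto an inversion count is left open here, just as it is in the text.

```java
final class Inversions {
    // Counts pairs (i, j) with i < j and a[i] > a[j].
    static long count(int[] a) {
        long inversions = 0;
        for (int i = 0; i < a.length; i++) {
            for (int j = i + 1; j < a.length; j++) {
                if (a[i] > a[j]) {
                    inversions++;
                }
            }
        }
        return inversions;
    }

    // Inversion count of a strictly decreasing list of n elements, i.e. the maximum possible.
    static long maxInversions(int n) {
        return (long) n * (n - 1) / 2;
    }
}
```

A sorted list has 0 inversions and a reversed list of 500 distinct keys has 124750, so any target lies somewhere in between.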
There's a one-to-one correspondence that maps permutations to {0} x {0, 1} x {0, 1, 2} x ... x {0, 1, ... n - 1}, where the jth element of the tuple in the codomain is the number of inversions involving elements at positions j and i < j. In this light, the problem is sampling a random element of the codomain that sums to the desired number of inversions.
Here's an instance of Gibbs sampling for this problem. Initialize a tuple summing to the desired number of inversions. Repeatedly select two distinct indices and randomize uniformly among all possibilities with the same sum. Stop when you're tired of waiting (the distribution converges on uniform but never gets there; maybe tomorrow I will figure out a Propp-Wilson style technique for exact samples).
In Python (untested):
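import random

def gibbs(n, target):
    # Greedy start: fill the inversion table so that it sums to target.
    perm = [0] * n
    for i in range(n):
        perm[i] = min(target, i)
        target -= perm[i]
    assert target == 0  # fails if target exceeds n * (n - 1) / 2
    while ???:  # run for as long as you are willing to wait
        i = random.randrange(n)
        j = random.randrange(n)
        if i == j: continue
        total = perm[i] + perm[j]
        # Resample entry i so that 0 <= perm[i] <= i and 0 <= perm[j] <= j.
        perm[i] = random.randrange(max(total - j, 0), min(total, i) + 1)
        perm[j] = total - perm[i]
    # Decode the inversion table into a permutation of 0..n-1.
    for j in range(n):
        perm[j] = j - perm[j]
        for i in range(j):
            if perm[i] >= perm[j]: perm[i] += 1
    return perm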
One could also get exact samples by dynamic programming and conditional probability, but the running time for 500 looks slightly prohibitive from here.

Generate N random numbers in given ranges that sum up to a given sum

First time here at Stack Overflow. I hope someone can help me with my search for an algorithm.
I need to generate N random numbers in given ranges that sum up to a given sum!
For example: generate 3 numbers that sum up to 11.
Ranges:
Value between 1 and 3.
Value between 5 and 8.
value between 3 and 7.
The generated numbers for this example could be: 2, 5, 4.
I already searched a lot and couldn't find the solution I need.
It is possible to generate N numbers with a constant sum using modulo, like this:
generate random numbers of which the sum is constant
But I couldn't get that done with ranges.
Or by generating N random values, summing them up, dividing the constant sum by the random sum, and afterwards multiplying each random number by that quotient, as proposed here.
The main problem, and the reason I can't adopt those solutions, is that each of my random values has a different range, and I need the values to be uniformly distributed within the ranges (no clustering at min/max, for example, which happens if I cut off the values that are less/greater than min/max).
I also thought of a solution: take a random number (in that example, value 1, 2 or 3), generate the value within its range (either between min/max or between min and the rest of the sum, depending on which is smaller), subtract that number from my given sum, and keep going until everything is distributed. But that would be horribly inefficient. I could really use a way where the runtime of the algorithm is fixed.
I'm trying to get this running in Java, but that info is not that important, unless someone already has a solution ready. All I need is a description or an idea of an algorithm.
First, note that the problem is equivalent to:
Generate k numbers x_1, ..., x_k that sum to a number y, such that each x_i has an upper limit.
This form can be reached by simply subtracting each lower bound from its number (and from the sum) - so in your example, it is equivalent to:
Generate 3 numbers such that x1 <= 2; x2 <= 3; x3 <= 4; x1+x2+x3 = 2
Note that the 2nd problem can be solved in various ways, one of them is:
Generate a list with h_i repeats per element - where h_i is the limit for element i - shuffle the list, and pick the first y elements; each pick adds 1 to its x_i.
In your example, the list is:[x1,x1,x2,x2,x2,x3,x3,x3,x3] - shuffle it and choose first two elements.
(*) Note that shuffling the list can be done using fisher-yates algorithm. (you can abort the algorithm in the middle after you passed the desired limit).
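A minimal Java sketch of the shuffle approach just described, assuming integer ranges given as two parallel arrays of inclusive bounds. RangedSum and generate are made-up names, and the Random is passed in so that runs can be reproduced. It shuffles the whole list instead of aborting Fisher-Yates early, and it makes no claim about the exact distribution of the results.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RangedSum {
    // Returns one value per range (min[i]..max[i], inclusive) with the given total.
    static int[] generate(int[] min, int[] max, int sum, Random rng) {
        int excess = sum; // what is left to hand out once every minimum is taken
        List<Integer> slots = new ArrayList<>();
        for (int i = 0; i < min.length; i++) {
            excess -= min[i];
            // index i appears once for every unit it may still grow by
            for (int c = 0; c < max[i] - min[i]; c++) {
                slots.add(i);
            }
        }
        if (excess < 0 || excess > slots.size()) {
            throw new IllegalArgumentException("sum is not reachable within the ranges");
        }
        Collections.shuffle(slots, rng);
        int[] result = min.clone();
        for (int s = 0; s < excess; s++) {
            result[slots.get(s)]++; // the first 'excess' slots each add 1
        }
        return result;
    }
}
```

With the ranges from the question, generate(new int[]{1, 5, 3}, new int[]{3, 8, 7}, 11, new Random()) starts from 1/5/3 and hands out the remaining 2 without ever leaving a range. Memory use grows with the total width of the ranges, so this only suits small integer ranges.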
Add up the minimum values. In this case 1 + 5 + 3 = 9
11 - 9 = 2, so you have to distribute 2 between the three numbers (eg: +2,+0,+0 or +0,+1,+1).
I leave the rest for you, it's relatively easy to create a uniform distribution after this transformation.
This problem is equivalent to randomly distributing an excess of 2 over the minimum of 9 on 3 positions.
So you start with the minima (1/5/3) and then cycle 2 times, generating a (pseudo-)random value of [0-2] (3 positions) and increment the indexed value.
e.g.
Start 1/5/3
1st random=1 ... increment index 1 ... 1/6/3
2nd random=0 ... increment index 0 ... 2/6/3
2+6+3=11
Edit
Reading this a second time, I understand that this is exactly what @KarolyHorvath mentioned.

How to multiply two big big numbers

You are given a list of n numbers L=<a_1, a_2,...a_n>. Each of them is
either 0 or of the form +/- 2^k, 0 <= k <= 30. Describe and implement an
algorithm that returns the largest product of a CONTINUOUS SUBLIST
p=a_i*a_(i+1)*...*a_j, 1 <= i <= j <= n.
For example, for the input <8 0 -4 -2 0 1> it should return 8 (either 8
or (-4)*(-2)).
You can use any standard programming language and can assume that
the list is given in any standard data structure, e.g. int[],
vector<int>, List<Integer>, etc.
What is the computational complexity of your algorithm?
In my first answer I addressed the OP's problem in "multiplying two big big numbers". As it turns out, this wish is only a small part of a much bigger problem which I'm going to address now:
"I still haven't arrived at the final skeleton of my algorithm I wonder if you could help me with this."
(See the question for the problem description)
All I'm going to do is explain the approach Amnon proposed in little more detail, so all the credit should go to him.
You have to find the largest product of a continuous sublist from a list of integers which are powers of 2. The idea is to:
Compute the product of every continuous sublist.
Return the biggest of all these products.
You can represent a sublist by its start and end index. For start=0 there are n possible values for end, namely 0..n-1. This generates all sublists that start at index 0. In the next iteration, You increment start by 1 and repeat the process (this time, there are n-1 possible values for end). This way You generate all possible sublists.
Now, for each of these sublists, You have to compute the product of its elements - that is come up with a method computeProduct(List wholeList, int startIndex, int endIndex). You can either use the built in BigInteger class (which should be able to handle the input provided by Your assignment) to save You from further trouble or try to implement a more efficient way of multiplication as described by others. (I would start with the simpler approach since it's easier to see if Your algorithm works correctly and first then try to optimize it.)
Now that You're able to iterate over all sublists and compute the product of their elements, determining the sublist with the maximum product should be the easiest part.
If it's still too hard for You to make the connections between two steps, let us know - but please also provide us with a draft of Your code as You work on the problem so that we don't end up incrementally constructing the solution and You copy&pasting it.
edit: Algorithm skeleton
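import java.math.BigInteger;

public BigInteger listingSublist(BigInteger[] biArray)
{
    int start = 0;
    int end = biArray.length - 1;
    BigInteger maximum = null; // no product seen yet
    for (int i = start; i <= end; i++)
    {
        for (int j = i; j <= end; j++)
        {
            // keep the largest product seen so far
            BigInteger product = computeProduct(biArray, i, j);
            if (maximum == null || product.compareTo(maximum) > 0)
            {
                maximum = product;
            }
        }
    }
    return maximum; // stays null for an empty array
}
public BigInteger computeProduct(BigInteger[] wholeList, int startIndex,
        int endIndex)
{
    // wholeList[startIndex] * wholeList[startIndex+1] * ... * wholeList[endIndex]
    BigInteger product = BigInteger.ONE;
    for (int k = startIndex; k <= endIndex; k++)
    {
        product = product.multiply(wholeList[k]);
    }
    return product;
}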
Since k <= 30, any integer i = 2^k will fit into a Java int. However, the product of two such integers might not fit into a Java int, since 2^k * 2^k = 2^(2k) <= 2^60, which does fit into a Java long. This should answer Your question regarding the "(multiplication of) two numbers...".
In case that You might want to multiply more than two numbers, which is implied by Your assignment saying "...largest product of a CONTINUOUS SUBLIST..." (a sublist's length could be > 2), have a look at Java's BigInteger class.
Actually, the most efficient way of multiplication is doing addition instead. In this special case all you have is numbers that are powers of two, and you can get the product of a sublist by simply adding the exponents together (and counting the negative numbers in your product, and making it a negative number in case of an odd number of negatives).
Of course, to store the result you may need the BigInteger, if you run out of bits. Or depending on how the output should look like, just say (+/-)2^N, where N is the sum of the exponents.
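A short Java sketch of the add-the-exponents idea, assuming the input is an int[] whose entries are 0 or +/- 2^k with 0 <= k <= 30, as in the assignment. PowerOfTwoProduct and product are hypothetical names; the exponent of each entry is read off with Integer.numberOfTrailingZeros.

```java
import java.math.BigInteger;

final class PowerOfTwoProduct {
    // Product of values[from..to] (inclusive), computed by adding exponents.
    static BigInteger product(int[] values, int from, int to) {
        int exponent = 0;
        boolean negative = false;
        for (int i = from; i <= to; i++) {
            int v = values[i];
            if (v == 0) {
                return BigInteger.ZERO; // a single zero wipes out the product
            }
            if (v < 0) {
                negative = !negative; // track the parity of the negative factors
            }
            // for |v| = 2^k the number of trailing zero bits is exactly k
            exponent += Integer.numberOfTrailingZeros(Math.abs(v));
        }
        BigInteger magnitude = BigInteger.ONE.shiftLeft(exponent); // 2^exponent
        return negative ? magnitude.negate() : magnitude;
    }
}
```

For the example list <8 0 -4 -2 0 1>, product(list, 2, 3) adds the exponents 2 and 1, sees two negatives, and returns 8.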
Parsing the input could be a matter of switch-case, since you only have 30 numbers to take care of. Plus the negatives.
That's the boring part. The interesting part is how you get the sublist that produces the largest number. You can take the dumb approach, by checking every single variation, but that would be an O(N^2) algorithm in the worst case (IIRC). Which is really not very good for longer inputs.
What can you do? I'd probably start from the largest non-negative number in the list as a sublist, and grow the sublist to get as many non-negative numbers in each direction as I can. Then, having all the positives in reach, proceed with pairs of negatives on both sides, eg. only grow if you can grow on both sides of the list. If you cannot grow in both directions, try one direction with two (four, six, etc. so even) consecutive negative numbers. If you cannot grow even in this way, stop.
Well, I don't know if this algorithm even works, but if it (or something similar) does, it's an O(N) algorithm, which means great performance. Let's try it out! :-)
Hmmm.. since they're all powers of 2, you can just add the exponent instead of multiplying the numbers (equivalent to taking the logarithm of the product). For example, 2^3 * 2^7 is 2^(7+3)=2^10.
I'll leave handling the sign as an exercise to the reader.
Regarding the sublist problem, there are less than n^2 pairs of (begin,end) indices. You can check them all, or try a dynamic programming solution.
EDIT: I adjusted the algorithm outline to match the actual pseudo code and put the complexity analysis directly into the answer:
Outline of algorithm
Go sequentially over the sequence and store the value and first/last index of the product (positive) since the last 0. Do the same for another product (negative) which only consists of the numbers since the first sign change of the sequence. If you hit a negative sequence element, swap the two products (positive and negative) along with the associated starting indices. Whenever the positive product hits a new maximum, store it and the associated start and end indices. After going over the whole sequence the result is stored in the maximum variables.
To avoid overflow calculate in binary logarithms and an additional sign.
Pseudo code
maxProduct = 0
maxProductStartIndex = -1
maxProductEndIndex = -1
sequence.push_front( 0 ) // reuses variable initialization of the case n == 0
for every index of sequence
    n = sequence[index]
    if n == 0
        posProduct = 0
        negProduct = 0
        posProductStartIndex = index+1
        negProductStartIndex = -1
    else
        if n < 0
            swap( posProduct, negProduct )
            swap( posProductStartIndex, negProductStartIndex )
            n = -n;
        end if
        logN = log2(n) // as indicated all arithmetic is done on the logarithms
        if -1 < negProductStartIndex // start the second product as soon as the sign changes first
            negProduct += logN
        end if
        if -1 == posProductStartIndex // no positive product ends here: restart it after this element
            posProduct = 0
            posProductStartIndex = index+1
        else
            posProduct += logN
            if maxProduct < posProduct // update current best solution
                maxProduct = posProduct
                maxProductStartIndex = posProductStartIndex
                maxProductEndIndex = index
            end if
        end if
    end if
end for
// if maxProductStartIndex is still -1, no sublist has a product greater than 1;
// that case (only 0s, 1s or a lone negative number) needs separate handling
// output solution
print "The maximum product is " 2^maxProduct "."
print "It is reached by multiplying the numbers from sequence index "
print maxProductStartIndex " to sequence index " maxProductEndIndex
Complexity
The algorithm uses a single loop over the sequence, so it's O(n) times the complexity of the loop body. The most complicated operation of the body is log2. Ergo it's O(n) times the complexity of log2. The log2 of a number of bounded size is O(1), so the resulting complexity is O(n), aka linear.
I'd like to combine Amnon's observation about multiplying powers of 2 with one of mine concerning sublists.
Lists are terminated hard by 0's. We can break the problem down into finding the biggest product in each sub-list, and then the maximum of that. (Others have mentioned this).
This is my 3rd revision of this writeup. But 3's the charm...
Approach
Given a list of non-0 numbers, (this is what took a lot of thinking) there are 3 sub-cases:
The list contains an even number of negative numbers (possibly 0). This is the trivial case, the optimum result is the product of all numbers, guaranteed to be positive.
The list contains an odd number of negative numbers, so the product of all numbers would be negative. To change the sign, it becomes necessary to sacrifice a subsequence containing a negative number. Two sub-cases:
a. sacrifice numbers from the left up to and including the leftmost negative; or
b. sacrifice numbers from the right up to and including the rightmost negative.
In either case, return the product of the remaining numbers. Having sacrificed exactly one negative number, the result is certain to be positive. Pick the winner of (a) and (b).
Implementation
The input needs to be split into subsequences delimited by 0. The list can be processed in place if a driver method is built to loop through it and pick out the beginnings and ends of non-0 sequences.
Doing the math in longs would only double the possible range. Converting to log2 makes arithmetic with large products easier. It prevents program failure on large sequences of large numbers. It would alternatively be possible to do all math in Bignums, but that would probably perform poorly.
Finally, the end result, still a log2 number, needs to be converted into printable form. Bignum comes in handy there. There's new BigInteger("2").pow(log); which will raise 2 to the power of log.
Complexity
This algorithm works sequentially through the sub-lists, only processing each one once. Within each sub-list, there's the annoying work of converting the input to log2 and the result back, but the effort is linear in the size of the list. In the worst case, the sum of much of the list is computed twice, but that's also linear complexity.
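Here is one way the approach above could look in Java. It is a sketch under a few assumptions: the input is a non-empty int[] of 0 or +/- 2^k values with 0 <= k <= 30, the exponent sums fit in an int, and the names MaxPowerOfTwoProduct, maxProduct and bestInSegment are invented for this example. A segment consisting of a single negative number has nothing left to sacrifice, so that number itself is returned for it.

```java
import java.math.BigInteger;

final class MaxPowerOfTwoProduct {
    // Largest product of any non-empty contiguous sublist.
    static BigInteger maxProduct(int[] a) {
        BigInteger best = null;
        int start = 0; // start of the current zero-free segment
        for (int i = 0; i <= a.length; i++) {
            if (i == a.length || a[i] == 0) {
                if (i < a.length) best = larger(best, BigInteger.ZERO); // the 0 itself
                if (start < i) best = larger(best, bestInSegment(a, start, i));
                start = i + 1;
            }
        }
        return best;
    }

    // Best product inside the zero-free segment a[from..to-1].
    private static BigInteger bestInSegment(int[] a, int from, int to) {
        int total = 0, negatives = 0, firstNeg = -1, lastNeg = -1;
        int expThroughFirstNeg = 0, expBeforeLastNeg = 0;
        for (int i = from; i < to; i++) {
            int exp = Integer.numberOfTrailingZeros(Math.abs(a[i])); // log2 of |a[i]|
            if (a[i] < 0) {
                negatives++;
                if (firstNeg < 0) {
                    firstNeg = i;
                    expThroughFirstNeg = total + exp;
                }
                lastNeg = i;
                expBeforeLastNeg = total;
            }
            total += exp;
        }
        if (negatives % 2 == 0) return pow2(total);             // whole segment is positive
        if (to - from == 1) return BigInteger.valueOf(a[from]); // lone negative number
        int best = -1;
        // (a) sacrifice everything up to and including the leftmost negative
        if (firstNeg < to - 1) best = Math.max(best, total - expThroughFirstNeg);
        // (b) sacrifice everything from the rightmost negative onwards
        if (lastNeg > from) best = Math.max(best, expBeforeLastNeg);
        return pow2(best);
    }

    private static BigInteger pow2(int exponent) {
        return BigInteger.ONE.shiftLeft(exponent);
    }

    private static BigInteger larger(BigInteger x, BigInteger y) {
        return (x == null || y.compareTo(x) > 0) ? y : x;
    }
}
```

For <8 0 -4 -2 0 1> this returns 8, and for <-2 4 -8 16 -1> it drops the trailing -1 and returns 1024. Each element is visited once, so the work stays linear in the length of the list.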
See this code. Here I implement the exact factorial of a very large number. I am just using an integer array to represent big numbers. Download the code from Planet Source Code.
