Dynamic programming with large inputs - java

I am trying to solve a classic Knapsack problem with huge capacity of 30.000.000 and it works well up until 20.000.000 but then it runs out of memory:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
I have tried to divide all values and capacity by 1.000.000 but that generates floats and I don't think that is the correct approach. I have also tried to make the arrays and matrix of type long but that does not help.
Perhaps another data-structure?
Any pointers welcome...
Code:
public class Knapsack {
public static void main(String[] args) {
int N = Integer.parseInt(args[0]); // number of items
int W = Integer.parseInt(args[1]); // maximum weight of knapsack
int[] profit = new int[N+1];
int[] weight = new int[N+1];
// generate random instance, items 1..N
for (int n = 1; n <= N; n++) {
profit[n] = (int) (Math.random() * 1000000);
weight[n] = (int) (Math.random() * W);
}
// opt[n][w] = max profit of packing items 1..n with weight limit w
// sol[n][w] = does opt solution to pack items 1..n with weight limit w include item n?
int[][] opt = new int[N+1][W+1];
boolean[][] sol = new boolean[N+1][W+1];
for (int n = 1; n <= N; n++) {
for (int w = 1; w <= W; w++) {
// don't take item n
int option1 = opt[n-1][w];
// take item n
int option2 = Integer.MIN_VALUE;
if (weight[n] <= w) option2 = profit[n] + opt[n-1][w-weight[n]];
// select better of two options
opt[n][w] = Math.max(option1, option2);
sol[n][w] = (option2 > option1);
}
}
// determine which items to take
boolean[] take = new boolean[N+1];
for (int n = N, w = W; n > 0; n--) {
if (sol[n][w]) { take[n] = true; w = w - weight[n]; }
else { take[n] = false; }
}
// print results
System.out.println("item" + "\t" + "profit" + "\t" + "weight" + "\t" + "take");
for (int n = 1; n <= N; n++) {
System.out.println(n + "\t" + profit[n] + "\t" + weight[n] + "\t" + take[n]);
}
//Copyright © 2000–2011, Robert Sedgewick and Kevin Wayne. Last updated: Wed Feb 9 //09:20:16 EST 2011.
}
}

Here are a couple of tricks I've used for things like that.
First, a variant of a sparse matrix. It's not really sparse, but instead of assuming that "non-stored entries" are zero, you assume they're the same as the entry before. This can work in either direction (in the direction of the capacity or in the direction of the items), afaik not (easily) in both directions at the same time. Good trick, but doesn't defeat instances that are huge in both directions.
Secondly, a combination of Dynamic Programming and Branch & Bound. First, use DP with only the "last two rows". That gives you the value of the optimal solution. Then use Branch & Bound to find the subset of items that corresponds to the optimal solution. Sort by value/weight, apply the relaxation value[next_item] * (capacity_left / weight[next_item]) to bound with. Knowing the optimal value ahead of time makes pruning very effective.
The "last two rows" refers to the "previous row" (a slice of the tableau that has the solutions for all items up to i) and the "current row" (the one you're filling right now). It could look something like this, for example (this is C#, by the way, but it should be easy to port):
int[] row0 = new int[capacity + 1], row1 = new int[capacity + 1];
for (int i = 0; i < weights.Length; i++)
{
for (int j = 0; j < row1.Length; j++)
{
int value_without_this_item = row1[j];
if (j >= weights[i])
row0[j] = Math.Max(value_without_this_item,
row1[j - weights[i]] + values[i]);
else
row0[j] = value_without_this_item;
}
// swap rows
int[] t = row1;
row1 = row0;
row0 = t;
}
int optimal_value = row1[capacity];
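A possible Java port of the two-row loop above, as a minimal sketch. The class and method names (KnapsackTwoRows, maxProfit) are made up for illustration, and the arrays are 0-based rather than 1-based as in the question. It returns only the optimal value, not the set of items taken.

```java
class KnapsackTwoRows {
    // Returns the best total profit for the given capacity, keeping only
    // the previous and the current row of the DP table in memory.
    static int maxProfit(int[] weight, int[] profit, int capacity) {
        int[] prev = new int[capacity + 1]; // row for items 0..i-1
        int[] curr = new int[capacity + 1]; // row being filled for items 0..i
        for (int i = 0; i < weight.length; i++) {
            for (int w = 0; w <= capacity; w++) {
                int best = prev[w]; // do not take item i
                if (weight[i] <= w) {
                    // take item i on top of the best packing of the remaining capacity
                    best = Math.max(best, prev[w - weight[i]] + profit[i]);
                }
                curr[w] = best;
            }
            // swap rows: the row just filled becomes the previous row
            int[] t = prev;
            prev = curr;
            curr = t;
        }
        return prev[capacity];
    }

    public static void main(String[] args) {
        int[] weight = {1, 3, 4, 5};
        int[] profit = {1, 4, 5, 7};
        System.out.println(maxProfit(weight, profit, 7)); // prints 9
    }
}
```

With a capacity of 30.000.000 the two int rows take roughly 240 MB together, so the JVM may still need a larger heap (for example via -Xmx), but the memory no longer grows with the number of items.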

Use a recursive method to solve the problem. See http://penguin.ewu.edu/~trolfe/Knapsack01/Knapsack01.html for further information.
Hope it will be of help.

Break your for loops down into method calls.
This will have the effect of making the local variables GC'able once the method itself has completed.
So instead of nested for loops within the same main method call a method with the same functionality, which then calls a second method and you are effectively breaking the code up into small packets of local variables which can be collected when out of scope.

Related

Divide an Array in equal size, such that value of given function is minimum

I've come across the following problem statement.
You have a list of natural numbers of size N and you must distribute the values in two lists A and B of size N/2, so that the squared sum of A elements is the nearest possible to the multiplication of the B elements.
Example:
Consider the list 7 11 1 9 10 3 5 13 9 12.
The optimized distribution is:
List A: 5 9 9 12 13
List B: 1 3 7 10 11
which leads to the difference abs( (5+9+9+12+13)^2 - (1*3*7*10*11) ) = 6
Your program should therefore output 6, which is the minimum difference that can be achieved.
What I've tried:
I've tried a greedy approach to solve this problem. I took two variables, sum and mul. I then took elements from the given set one by one, tried adding each to both variables, and calculated the current square of the sum and the product. Then I assigned the element to whichever of the two sets gave the minimum possible difference.
But this approach does not work even on the given example. I can't figure out what approach could be used here.
I'm not asking for exact code for the solution. Any possible approach and the reason why it is working, would be fine.
EDIT:
Source: CodinGame, Community puzzle
Try out this:
import java.util.Arrays;
public class Test {
public static void main(String [] args){
int [] arr = {7, 11, 1, 9, 10, 3, 5, 13, 9, 12};
int [][] res = combinations(5, arr);
int N = Arrays.stream(arr).reduce(1, (a, b) -> a * b);
int min = Integer.MAX_VALUE;
int [] opt = new int [5];
for (int [] i : res){
int k = (int) Math.abs( Math.pow(Arrays.stream(i).sum(), 2) - N/(Arrays.stream(i).reduce(1, (a, b) -> a * b)));
if(k < min){
min = k;
opt = i;
}
}
Arrays.sort(opt);
System.out.println("minimum difference is "+ min + " with the subset containing these elements " + Arrays.toString(opt));
}
// returns all k-sized subsets of a n-sized set
public static int[][] combinations(int k, int[] set) {
int c = (int) binomial(set.length, k);
int[][] res = new int[c][Math.max(0, k)];
int[] ind = k < 0 ? null : new int[k];
for (int i = 0; i < k; ++i) {
ind[i] = i;
}
for (int i = 0; i < c; ++i) {
for (int j = 0; j < k; ++j) {
res[i][j] = set[ind[j]];
}
int x = ind.length - 1;
boolean loop;
do {
loop = false;
ind[x] = ind[x] + 1;
if (ind[x] > set.length - (k - x)) {
--x;
loop = x >= 0;
} else {
for (int x1 = x + 1; x1 < ind.length; ++x1) {
ind[x1] = ind[x1 - 1] + 1;
}
}
} while (loop);
}
return res;
}
// returns n choose k;
// there are n choose k combinations without repetition and without observance of the sequence
//
private static long binomial(int n, int k) {
if (k < 0 || k > n) return 0;
if (k > n - k) {
k = n - k;
}
long c = 1;
for (int i = 1; i < k+1; ++i) {
c = c * (n - (k - i));
c = c / i;
}
return c;
}
}
Code taken from this stackoverflow answer, also take a look at this wikipedia article about Combinations.
I am not sure if there is any exact solution in polynomial time. But you could try a simulated annealing based approach.
My approach would be:
Initialize listA and listB to a random state
With probability p run greedy step, otherwise run a random step
Keep track of the state and corresponding error (with a HashMap)
Greedy step: Find one element you can move between the list that optimizes the error.
Random Step: Pick a random element from either of these two sets and calculate the error. If the error is better, keep it. Otherwise with probability of q keep it.
At either of these two steps make sure that the new state is not already explored (or at least discourage it).
Set p to a small value (<0.1) and q could depend on the error difference.
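The steps above could be sketched roughly as follows. This is a simplified variant, not the exact procedure described: it only does random swaps between the two lists (which keeps both at size N/2), accepts a worse state with a fixed probability q, and does not track visited states. The names SplitSearch, error and search, and the parameters iterations, q and seed, are assumptions made for illustration.

```java
import java.util.Random;

class SplitSearch {
    // error = |(sum of A)^2 - (product of B)|; inA[i] is true when values[i] is in list A.
    // Uses long, so it is only safe while the product of B fits in 64 bits.
    static long error(int[] values, boolean[] inA) {
        long sum = 0;
        long product = 1;
        for (int i = 0; i < values.length; i++) {
            if (inA[i]) sum += values[i];
            else product *= values[i];
        }
        return Math.abs(sum * sum - product);
    }

    // Returns the smallest error seen during the search (not guaranteed to be optimal).
    static long search(int[] values, int iterations, double q, long seed) {
        Random rnd = new Random(seed);
        int n = values.length;
        boolean[] inA = new boolean[n];
        for (int i = 0; i < n / 2; i++) inA[i] = true; // arbitrary start state
        long current = error(values, inA);
        long best = current;
        for (int it = 0; it < iterations; it++) {
            int i = rnd.nextInt(n);
            int j = rnd.nextInt(n);
            if (inA[i] == inA[j]) continue; // only swap across lists
            inA[i] = !inA[i];
            inA[j] = !inA[j];
            long candidate = error(values, inA);
            if (candidate <= current || rnd.nextDouble() < q) {
                current = candidate; // accept the move, possibly a worse one
                best = Math.min(best, current);
            } else {
                inA[i] = !inA[i]; // undo the swap
                inA[j] = !inA[j];
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] values = {7, 11, 1, 9, 10, 3, 5, 13, 9, 12};
        System.out.println(search(values, 100000, 0.05, 42L));
    }
}
```

Because the search is randomized, the result depends on the seed and the number of iterations; for an input as small as the example, the exhaustive enumeration is the reliable way to confirm the true minimum.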

How to randomly combine elements of 2 arrays while making sure to not reuse an element until all have been used at least once?

Essentially I'm writing a program that produces random poems out of an array of nouns and an array of adjectives.
This is accomplished basically using this line
String poem = adjectives[rand.nextInt(3)]+" "+ nouns[rand.nextInt(3)];
Simple enough, but I'm supposed to make sure that it doesn't reuse the same noun or adjective for the next poems until all of them have been used at least once already. I'm not sure how to do that.
Convert the arrays to lists so you can use Collections.shuffle to shuffle them. Once shuffled, you can simply iterate over them. The values will be in random order, and all words will be used exactly once. When you reach the end of a list of words, shuffle it again and start from the beginning.
If a poem consists of 1 adjective + 1 noun as in your example, then the program could go something like this:
List<String> adjectives = new ArrayList<>(Arrays.asList(adjectivesArr));
List<String> nouns = new ArrayList<>(Arrays.asList(nounsArr));
Collections.shuffle(adjectives);
Collections.shuffle(nouns);
int aindex = 0;
int nindex = 0;
for (int i = 0; i < 100; ++i) {
String poem = adjectives.get(aindex++) + " " + nouns.get(nindex++);
System.out.println(poem);
if (aindex == adjectives.size()) {
aindex = 0;
Collections.shuffle(adjectives);
}
if (nindex == nouns.size()) {
nindex = 0;
Collections.shuffle(nouns);
}
}
The program will work with other number of adjectives and nouns per poem too.
If you must use an array, you can implement your own shuffle method, for example using the Fisher-Yates shuffle algorithm:
private void shuffle(String[] strings) {
Random random = new Random();
for (int i = strings.length - 1; i > 0; i--) {
int index = random.nextInt(i + 1);
String temp = strings[i];
strings[i] = strings[index];
strings[index] = temp;
}
}
And then rewrite with arrays in terms of this helper shuffle function:
shuffle(adjectives);
shuffle(nouns);
int aindex = 0;
int nindex = 0;
for (int i = 0; i < 100; ++i) {
String poem = adjectives[aindex++] + " " + nouns[nindex++];
System.out.println(poem);
if (aindex == adjectives.length) {
aindex = 0;
shuffle(adjectives);
}
if (nindex == nouns.length) {
nindex = 0;
shuffle(nouns);
}
}
What you can do is make two more arrays, filled with boolean values, that correspond to the adjective and noun arrays. You can do something like this
boolean[] adjectiveUsed = new boolean[adjectives.length];
boolean[] nounUsed = new boolean[nouns.length];
int adjIndex, nounIndex;
By default all of the elements are initialized to false. You can then do this
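adjIndex = rand.nextInt(3);
nounIndex = rand.nextInt(3);
while (adjectiveUsed[adjIndex])
adjIndex = rand.nextInt(3);
while (nounUsed[nounIndex])
nounIndex = rand.nextInt(3);
// mark both words as used so they are skipped next time
adjectiveUsed[adjIndex] = true;
nounUsed[nounIndex] = true;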
Note, once all of the elements have been used, you must reset the boolean arrays to be filled with false again otherwise the while loops will run forever.
There are lots of good options for this. One is to just have a list of the words in random order that get used one by one and are then refreshed when empty.
private List<String> shuffledNouns = Collections.emptyList();
private String getNoun() {
assert nouns.length > 0;
if (shuffledNouns.isEmpty()) {
shuffledNouns = new ArrayList<>(Arrays.asList(nouns));
Collections.shuffle(shuffledNouns);
}
return shuffledNouns.remove(0);
}
Best way to do this is to create a shuffled queue from each array, and then just start popping off the front of the queues to build your poems. Once the queues are empty you just generate new shuffled queues and start over. Here's a good shuffling algorithm:
https://en.wikipedia.org/wiki/Fisher–Yates_shuffle
How about keeping two lists for the adjectives and nouns? You can use Collections.shuffle() to order them randomly.
import java.util.*;
class PoemGen {
static List<String> nouns = Arrays.asList("ball", "foobar", "dog");
static List<String> adjectives = Arrays.asList("slippery", "undulating", "crunchy");
public static void main(String[] args) {
for (int i = 0; i < 3; i++) {
System.out.println(String.format("\nPoem %d", i));
generatePoem();
}
}
private static void generatePoem() {
Collections.shuffle(nouns);
Collections.shuffle(adjectives);
int nounIndex = nouns.size() - 1;
int adjectiveIndex = adjectives.size() - 1;
while (nounIndex >= 0 && adjectiveIndex >= 0) {
final String poem = adjectives.get(adjectiveIndex--)+" "+ nouns.get(nounIndex--);
System.out.println(poem);
}
}
}
Output:
Poem 0
crunchy dog
slippery ball
undulating foobar
Poem 1
undulating dog
crunchy ball
slippery foobar
Poem 2
slippery ball
crunchy dog
undulating foobar
Assuming you have the same number of nouns and adjectives, shuffle both arrays and then merge the results. You can shuffle the arrays again if you need to (once you get to the end).
shuffleArray(adjectives);
shuffleArray(nouns);
for(int i=0;i<3;i++) {
String poem = adjectives[i] + " " + nouns[i];
}
A simple method to shuffle the arrays:
static final Random rnd = new Random(); // java.util.Random
static void shuffleArray(String[] data) {
for (int i = data.length - 1; i > 0; i--) {
int index = rnd.nextInt(i + 1);
String aux = data[index];
data[index] = data[i];
data[i] = aux;
}
}
This might be overkill for this specific problem but it's an interesting alternative in my opinion:
You can use a linear congruential generator (LCG) to generate the random numbers instead of using rand.nextInt(3). An LCG gives you a pseudo-random sequence of numbers using this simple formula
nextNumber = (a * x + b) % m
Now comes the interesting part (which makes this work for your problem):
The Hull-Dobell-Theorem states that if your parameters a, b and m fit the following set of rules, the generator will generate every number between 0 and m-1 exactly once before repeating.
The conditions are:
m and the offset b are relatively prime
a - 1 is divisible by all prime factors of m
a - 1 is divisible by 4 if m is divisible by 4
This way you could generate your poems with exactly the same line of code as you currently have but instead just generate the array index with the LCG instead of rand.nextInt. This also means that this solution will give you the best performance, since there is no sorting, shuffling or searching involved.
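To illustrate the full-period property, here is a small sketch. The class name LcgCycle and the parameters a = 4, b = 1, m = 9 are chosen for this example only: b = 1 is relatively prime to 9, a - 1 = 3 is divisible by the only prime factor of 9, and 9 is not divisible by 4, so the three conditions hold.

```java
class LcgCycle {
    // Returns the first m values of x -> (a * x + b) % m, starting at seed.
    // When the Hull-Dobell conditions hold, these are 0..m-1 in some order.
    static int[] cycle(int a, int b, int m, int seed) {
        int[] out = new int[m];
        int x = seed % m;
        for (int i = 0; i < m; i++) {
            out[i] = x;
            x = (a * x + b) % m;
        }
        return out;
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int v : cycle(4, 1, 9, 0)) sb.append(v).append(' ');
        System.out.println(sb.toString().trim()); // prints 0 1 5 3 4 8 6 7 2
    }
}
```

Note that for a very small m such as 3 the conditions force a to be congruent to 1 modulo 3, so the sequence just steps by b and looks quite regular. The order also repeats identically on every pass unless the parameters or the seed are changed.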
Thanks for the responses everyone! This helped immeasurably. I am now officially traumatized by the sheer number of ways there are to solve even a simple problem.

Iteration run faster for "even teams" code

I am coding to create two "even teams" based on the players' scores (puntajes).
The algorithm runs through the array of players and compares the score of each one to get a minimum difference and then sorts players into two arrays, one for each team.
Here is my code:
if (listaDeJugadores.size() == 6)
//In this case I'm looking for a 6 player array, to create 3 vs 3 teams, but I'm looking to do until 22 (11 vs 11). Any ideas are welcomed.
{
int dif1 = Math.abs((listaDeJugadores.get(0).getPuntaje() + listaDeJugadores.get(1).getPuntaje() + listaDeJugadores.get(2).getPuntaje())
- (listaDeJugadores.get(3).getPuntaje() + listaDeJugadores.get(4).getPuntaje() + listaDeJugadores.get(5).getPuntaje()));
int jugador1 = 0;
int jugador2 = 1;
int jugador3 = 2;
int jugador4 = 3;
int jugador5 = 4;
int jugador6 = 5;
int a = 0;
int b = 0;
int c = 0;
//The two fors are to search the arrays. The iterador is to find the other three remaining positions to compare.
for (int cont2 = 1; cont2 < listaDeJugadores.size() - 1; cont2++) {
for (int cont3 = cont2 + 1; cont3 < listaDeJugadores.size(); cont3++) {
ArrayList<Integer> arr = new ArrayList<>();
int iterador[] = {0,1,2,3,4,5,6};
int j = 1;
for (int i=0;i<iterador.length;i++)
{
//I look for the missing players to compare from the 6 possible
if (cont2==iterador[i]|cont3==iterador[i])
{
j++;
}
else
{
c=b;
b=a;
a=j;
i--;
j++;
}
}
int dif = Math.abs((listaDeJugadores.get(0).getPuntaje() + listaDeJugadores.get(cont2).getPuntaje() + listaDeJugadores.get(cont3).getPuntaje())
- (listaDeJugadores.get(a).getPuntaje() + listaDeJugadores.get(b).getPuntaje() + listaDeJugadores.get(c).getPuntaje()));
if (dif < dif1) {
dif = dif1;
jugador1 = 0;
jugador2 = cont2;
jugador3 = cont3;
jugador4 = a;
jugador5 = b;
jugador6 = c;
}
}
}
//I add the best available sorted teams to EquipoBlanco or EquipoNegro.
listaEquipoBlanco.add(listaDeJugadores.get(jugador1));
listaEquipoBlanco.add(listaDeJugadores.get(jugador2));
listaEquipoBlanco.add(listaDeJugadores.get(jugador3));
listaEquipoNegro.add(listaDeJugadores.get(jugador4));
listaEquipoNegro.add(listaDeJugadores.get(jugador5));
listaEquipoNegro.add(listaDeJugadores.get(jugador6));
team1.setText("Equipo Blanco: " + (listaEquipoBlanco.get(0).getPuntaje() + listaEquipoBlanco.get(1).getPuntaje() + listaEquipoBlanco.get(2).getPuntaje()));
team2.setText("Equipo Negro: " + (listaEquipoNegro.get(0).getPuntaje() + listaEquipoNegro.get(1).getPuntaje() + listaEquipoNegro.get(2).getPuntaje()));
I think the code is ok, but when I try to run it, it won't open because it has really bad performance. I'm thinking I might have iterated to infinity or something similar, but also when I look at it and see fors inside of fors inside of fors I know something is wrong.
How can I make it run faster and have better performance?
From a quick look, that inner for loop looks suspicious. I might be wrong without trying it (bad me), but it has an i--; in there, and i is the loop index, so if that happens every time or often enough you will never exit that loop.
That happens when this isn't true: cont2==iterador[i]|cont3==iterador[i] (a bitwise or; it should probably be the logical or ||, by the way). I'm not sure that is guaranteed to be true at some point. It could even go back and forth. cont2 and cont3 don't change, but i can change a little bit.
There is no protection against i going below zero either, so it could crash and burn (exception).
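One way to avoid the hand-written nested loops altogether is to enumerate every split with a bitmask. This is a different technique from the code in the question, shown only as a sketch: it assumes the scores have been copied into an int[] (for example from getPuntaje()), and the names EvenTeams, difference and bestSplit are made up here.

```java
class EvenTeams {
    // Absolute difference between the two team totals.
    // Bit i of mask set means player i plays on the first team.
    static int difference(int[] scores, int mask) {
        int first = 0;
        int total = 0;
        for (int i = 0; i < scores.length; i++) {
            total += scores[i];
            if ((mask & (1 << i)) != 0) first += scores[i];
        }
        return Math.abs(total - 2 * first);
    }

    // Tries every way to put half of the players on the first team
    // and returns the mask with the smallest score difference.
    static int bestSplit(int[] scores) {
        int n = scores.length; // assumed even and at most about 22
        int bestMask = 0;
        int bestDiff = Integer.MAX_VALUE;
        // Only odd masks: player 0 is fixed on the first team,
        // which skips the mirror image of every split.
        for (int mask = 1; mask < (1 << n); mask += 2) {
            if (Integer.bitCount(mask) != n / 2) continue;
            int diff = difference(scores, mask);
            if (diff < bestDiff) {
                bestDiff = diff;
                bestMask = mask;
            }
        }
        return bestMask;
    }

    public static void main(String[] args) {
        int[] scores = {1, 2, 3, 4, 5, 7};
        int mask = bestSplit(scores);
        System.out.println(difference(scores, mask)); // prints 0
    }
}
```

For 22 players this checks about two million masks, which should finish quickly on ordinary hardware. The same loop works unchanged for 6 players or any even count in between.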

Improving the algorithm for removal of element

Problem
Given a string s and m queries. For each query delete the K-th occurrence of a character x.
For example:
abcdbcaab
5
2 a
1 c
1 d
3 b
2 a
Ans abbc
My approach
I am using BIT tree for update operation.
Code:
for (int i = 0; i < ss.length(); i++) {
char cc = ss.charAt(i);
freq[cc-97] += 1;
if (max < freq[cc-97]) max = freq[cc-97];
dp[cc-97][freq[cc-97]] = i; // Counting the Frequency
}
BIT = new int[27][ss.length()+1];
int[] ans = new int[ss.length()];
int q = in.nextInt();
for (int i = 0; i < q; i++) {
int rmv = in.nextInt();
char c = in.next().charAt(0);
int rr = rmv + value(rmv, BIT[c-97]); // Calculating the original Index Value
ans[dp[c-97][rr]] = Integer.MAX_VALUE;
update(rmv, 1, BIT[c-97], max); // Updating it
}
for (int i = 0; i < ss.length(); i++) {
if (ans[i] != Integer.MAX_VALUE) System.out.print(ss.charAt(i));
}
Time Complexity is O(M log N) where N is length of string ss.
Question
My solution gives me Time Limit Exceeded Error. How can I improve it?
public static void update(int i , int value , int[] arr , int xx){
while(i <= xx){
arr[i ]+= value;
i += (i&-i);
}
}
public static int value(int i , int[] arr){
int ans = 0;
while(i > 0){
ans += arr[i];
i -= (i &- i);
}
return ans ;
}
There are key operations not shown, and odds are that one of them (quite likely the update method) has a different cost than you think. Furthermore your stated complexity is guaranteed to be wrong because at some point you have to scan the string which is at minimum O(N).
But anyways the obviously right strategy here is to go through the queries, separate them by character, and then go through the queries in reverse order to figure out the initial positions of the characters to be suppressed. Then run through the string once, emitting characters only when it fits. This solution, if implemented well, should be doable in O(N + M log(M)).
The challenge is how to represent the deletions efficiently. I'm thinking of some sort of tree of relative offsets so that if you find that the first deletion was 3 a you can efficiently insert it into your tree and move every later deletion after that one. This is where the log(M) bit will be.
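As a point of comparison, here is a sketch of a different, commonly used technique (not the reverse-order offset tree described above): keep one Fenwick tree per character over that character's occurrences, and find the K-th surviving occurrence by binary descent. It assumes lowercase input and that every query is valid; the names KthOccurrenceRemoval and apply are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class KthOccurrenceRemoval {
    static String apply(String s, int[] ks, char[] cs) {
        int n = s.length();
        // positions.get(c) lists the indices in s where character c occurs
        List<List<Integer>> positions = new ArrayList<>();
        for (int c = 0; c < 26; c++) positions.add(new ArrayList<>());
        for (int i = 0; i < n; i++) positions.get(s.charAt(i) - 'a').add(i);
        // one Fenwick tree per character, counting surviving occurrences
        int[][] tree = new int[26][];
        for (int c = 0; c < 26; c++) {
            int size = positions.get(c).size();
            tree[c] = new int[size + 1];
            for (int i = 1; i <= size; i++) { // linear-time build with all ones
                tree[c][i] += 1;
                int parent = i + (i & -i);
                if (parent <= size) tree[c][parent] += tree[c][i];
            }
        }
        boolean[] removed = new boolean[n];
        for (int q = 0; q < ks.length; q++) {
            int c = cs[q] - 'a';
            int[] t = tree[c];
            int size = t.length - 1;
            // binary descent: largest pos whose prefix count is still below k
            int pos = 0;
            int remaining = ks[q];
            for (int step = Integer.highestOneBit(Math.max(size, 1)); step > 0; step >>= 1) {
                int next = pos + step;
                if (next <= size && t[next] < remaining) {
                    pos = next;
                    remaining -= t[next];
                }
            }
            int occ = pos + 1; // 1-based rank among the original occurrences
            removed[positions.get(c).get(occ - 1)] = true;
            for (int i = occ; i <= size; i += i & -i) t[i] -= 1; // mark as deleted
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) if (!removed[i]) sb.append(s.charAt(i));
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] ks = {2, 1, 1, 3, 2};
        char[] cs = {'a', 'c', 'd', 'b', 'a'};
        System.out.println(apply("abcdbcaab", ks, cs)); // prints abbc
    }
}
```

Each query costs O(log N) and the string is scanned once at the start and once at the end, so the total is O(N + M log N). Building the output with a StringBuilder instead of printing character by character may also matter for the time limit.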

Edit Distance solution for Large Strings

I'm trying to solve the edit distance problem. The code I've been using is below.
public static int minDistance(String word1, String word2) {
int len1 = word1.length();
int len2 = word2.length();
// len1+1, len2+1, because finally return dp[len1][len2]
int[][] dp = new int[len1 + 1][len2 + 1];
for (int i = 0; i <= len1; i++) {
dp[i][0] = i;
}
for (int j = 0; j <= len2; j++) {
dp[0][j] = j;
}
//iterate though, and check last char
for (int i = 0; i < len1; i++) {
char c1 = word1.charAt(i);
for (int j = 0; j < len2; j++) {
char c2 = word2.charAt(j);
//if last two chars equal
if (c1 == c2) {
//update dp value for +1 length
dp[i + 1][j + 1] = dp[i][j];
} else {
int replace = dp[i][j] + 1 ;
int insert = dp[i][j + 1] + 1 ;
int delete = dp[i + 1][j] + 1 ;
int min = replace > insert ? insert : replace;
min = delete > min ? min : delete;
dp[i + 1][j + 1] = min;
}
}
}
return dp[len1][len2];
}
It's a DP approach. The problem is that, since it uses a 2D array, we can't solve the problem this way for large strings, e.g. string length > 100000.
So is there any way to modify this algorithm to overcome that difficulty?
NOTE:
The above code solves the edit distance problem correctly for small strings (length below about 1000).
As you can see, the code uses a Java 2D array, dp[][], so we can't allocate it for a large number of rows and columns.
Ex: if I need to compare 2 strings whose lengths are more than 100000,
int[][] dp = new int[len1 + 1][len2 + 1];
the above will be
int[][] dp = new int[100000][100000];
So it will give a stackOverflow error.
So the above program is only good for short strings.
What I'm asking is: is there any way to solve this problem for large strings (length > 100000) efficiently in Java?
First of all, there's no problem in allocating a 100k x 100k int array in Java, you just have to do it in the Heap, not the Stack (and on a machine with around 80GB of memory :))
Secondly, as a (very direct) hint:
Note that in your loop, you are only ever using 2 rows at a time - row i and row i+1. In fact, you calculate row i+1 from row i. Once you get i+1 you don't need to store row i anymore.
This neat trick allows you to store only 2 rows at the same time, bringing down the space complexity from n^2 to n. Since you stated that this is not homework (even though you're a CS undergrad by your profile...), I'll trust you to come up with the code yourself.
Come to think of it I recall having this exact problem when I was doing a class in my CS degree...
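For reference, the two-row idea could look like the sketch below. It follows the recurrence from the question's code; the class name EditDistanceTwoRows is made up, and prev and curr stand for row i and row i+1.

```java
class EditDistanceTwoRows {
    static int minDistance(String word1, String word2) {
        int len1 = word1.length();
        int len2 = word2.length();
        int[] prev = new int[len2 + 1]; // row i of the original dp table
        int[] curr = new int[len2 + 1]; // row i + 1, being filled
        for (int j = 0; j <= len2; j++) prev[j] = j;
        for (int i = 0; i < len1; i++) {
            curr[0] = i + 1; // deleting the first i + 1 characters of word1
            char c1 = word1.charAt(i);
            for (int j = 0; j < len2; j++) {
                if (c1 == word2.charAt(j)) {
                    curr[j + 1] = prev[j];
                } else {
                    int replace = prev[j] + 1;
                    int insert = prev[j + 1] + 1;
                    int delete = curr[j] + 1;
                    curr[j + 1] = Math.min(replace, Math.min(insert, delete));
                }
            }
            // the row just filled becomes the previous row
            int[] t = prev;
            prev = curr;
            curr = t;
        }
        return prev[len2];
    }

    public static void main(String[] args) {
        System.out.println(minDistance("kitten", "sitting")); // prints 3
    }
}
```

This only reduces memory. The running time is still proportional to len1 * len2, so two strings of length 100000 still need on the order of 10^10 cell updates.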
