The problem is: given an input file with four billion unique integers, provide an algorithm to generate an integer which is not contained in the file, assuming you only have 10 MB of memory.
I searched for some solutions and posted the code below. One of them is to store the integers in bit-vector blocks (each block representing a specific range of integers within the 4 billion range, each bit in a block representing one integer), and to keep a counter per block that counts the number of integers falling in that block. Then, if a block's count is less than the block's capacity, at least one integer in that range is missing, and scanning that block's bit vector reveals which ones.
My question about this solution is: why is it that "the nearer to the middle that we pick, the less memory will be used at any given time"? Here is more context:
The array in the first pass can fit in 10 megabytes, or roughly 2^23 bytes, of memory. Since each element in the array is an int, and an int is 4 bytes, we can hold an array of at most about 2^21 elements. So, we can deduce the following:

countsSize = 2^32 / rangeSize <= 2^21, so rangeSize >= 2^11
bitVectorSize = rangeSize / 8 bytes <= 2^23 bytes, so rangeSize <= 2^26

Therefore, we can conclude the following:

2^11 <= rangeSize <= 2^26

These conditions give us a good amount of "wiggle room," but the nearer to the middle that we pick, the less memory will be used at any given time.
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.util.Scanner;

public class QuestionB {
    public static int bitsize = 1048576; // 2^20 bits (2^17 bytes)
    public static int blockNum = 4096; // 2^12
    public static byte[] bitfield = new byte[bitsize / 8];
    public static int[] blocks = new int[blockNum];

    public static void findOpenNumber() throws FileNotFoundException {
        int starting = -1;

        // First pass: count how many integers fall into each block.
        Scanner in = new Scanner(new FileReader("Chapter 10/Question10_3/input_file_q10_3.txt"));
        while (in.hasNextInt()) {
            int n = in.nextInt();
            blocks[n / (bitfield.length * 8)]++;
        }

        /* Find a block with fewer than 2^20 values: at least one number
         * in that block's range must be missing. */
        for (int i = 0; i < blocks.length; i++) {
            if (blocks[i] < bitfield.length * 8) {
                starting = i * bitfield.length * 8;
                break;
            }
        }

        // Second pass: build a bit vector for just that block.
        in = new Scanner(new FileReader("Chapter 10/Question10_3/input_file_q10_3.txt"));
        while (in.hasNextInt()) {
            int n = in.nextInt();
            /* If the number is inside the block that's missing
             * numbers, we record it. */
            if (n >= starting && n < starting + bitfield.length * 8) {
                bitfield[(n - starting) / 8] |= 1 << ((n - starting) % 8);
            }
        }

        // Scan the bit vector; the first 0 bit marks a missing number.
        for (int i = 0; i < bitfield.length; i++) {
            for (int j = 0; j < 8; j++) {
                /* Retrieves the individual bits of each byte. When a 0 bit
                 * is found, prints the corresponding value. */
                if ((bitfield[i] & (1 << j)) == 0) {
                    System.out.println(i * 8 + j + starting);
                    return;
                }
            }
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        findOpenNumber();
    }
}
If you form M blocks, each of size 2^32/M, the total memory required is M + 2^27/M words of 32 bits: M words for the counters, plus a bit vector of 2^32/M bits = 2^27/M words. This function reaches its minimum at M = √(2^27) = 2^13.5, which on a logarithmic scale is halfway between 1 block and 2^27 blocks. The minimum is 2 · 2^13.5 = 2^14.5 words, about 92 KB.
This is very clear on a bilogarithmic (log-log) plot.
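To make this concrete, here is a quick sketch (my own addition, just evaluating the formula above) that tabulates M + 2^27/M for a few block counts; the total dips in the middle of the range and climbs toward both ends:

public class MemoryCurve {
    public static void main(String[] args) {
        // counters take M words; the bit vector takes 2^32 / (32 * M) = 2^27 / M words
        for (int shift = 0; shift <= 27; shift += 3) {
            long m = 1L << shift;               // number of blocks
            long words = m + (1L << 27) / m;    // total 32-bit words
            System.out.printf("M = 2^%-2d -> %,d words (%,d KB)%n",
                    shift, words, words * 4 / 1024);
        }
    }
}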
I like this question. I'll give it additional thought, but I think if disk space and time are not an issue, you can break the numbers into 100k blocks and sort each block in its own file. Any block that doesn't have 100k entries has a gap. It's not elegant at all, but it gets the ball rolling.
I am trying to prepare for a contest, but my program speed is always dreadfully slow because I use O(n). First of all, I don't even know how to make it O(log n); I've never even heard about this paradigm. Where can I learn about this?
For example,
If you had an integer array of zeroes and ones, such as [ 0, 0, 0, 1, 0, 1 ], and you wanted to replace every 0 with 1 only if one of its neighbors has the value 1, what is the most efficient way to go about it if this must occur t times? (The program must repeat the update t times.)
EDIT:
Here's my inefficient solution:
import java.util.Scanner;

public class Main {
    static Scanner input = new Scanner(System.in);

    public static void main(String[] args) {
        int n;
        long t;
        n = input.nextInt();
        t = input.nextLong();
        input.nextLine();
        int[] units = new int[n + 2];
        String inputted = input.nextLine();
        input.close();
        for (int i = 1; i <= n; i++) {
            units[i] = Integer.parseInt("" + inputted.charAt(i - 1));
        }
        int[] original;
        for (int j = 0; j <= t - 1; j++) {
            units[0] = units[n];     // the array is circular, so wrap around
            units[n + 1] = units[1];
            original = units.clone();
            for (int i = 1; i <= n; i++) {
                // a cell becomes 1 iff exactly one of its neighbors is 1
                if ((original[i - 1] == 0 && original[i + 1] == 1)
                        || (original[i - 1] == 1 && original[i + 1] == 0)) {
                    units[i] = 1;
                } else {
                    units[i] = 0;
                }
            }
        }
        for (int i = 1; i <= n; i++) {
            System.out.print(units[i]);
        }
    }
}
This is an elementary cellular automaton. Such a dynamical system has properties that you can use to your advantage. In your case, for example, you can set to 1 every cell at distance at most t from any initial 1 (the cone-of-light property). Then you may do something like:
get a 1 in the original sequence, say it is located at position p.
set to 1 every position from p-t to p+t.
You may then use to your advantage in the next step the fact that positions p-t to p+t are already set... This can let you compute the final step t without computing the intermediary steps (a good acceleration factor, isn't it?).
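Another route (my own sketch, not part of the answer above): this particular rule, "exactly one neighbour is 1", is the XOR of the two neighbours, i.e. elementary rule 90, which is linear over GF(2). Squaring the one-step operator shows that after 2^k steps each cell equals the XOR of the two cells 2^k away, so t steps can be composed from the set bits of t in O(n log t):

// Assumes a circular array and the XOR ("exactly one neighbour is 1") rule.
static int[] evolve(int[] cells, long t) {
    int n = cells.length;
    int[] cur = cells.clone();
    long jump = 1 % n;                        // 2^k mod n, starting with k = 0
    while (t > 0) {
        if ((t & 1) == 1) {                   // apply a jump of 2^k steps at once:
            int[] next = new int[n];          // cell i becomes cell(i - 2^k) XOR cell(i + 2^k)
            int j = (int) jump;
            for (int i = 0; i < n; i++) {
                next[i] = cur[((i - j) % n + n) % n] ^ cur[(i + j) % n];
            }
            cur = next;
        }
        jump = (jump * 2) % n;                // 2^(k+1) mod n
        t >>= 1;
    }
    return cur;
}

On the sample 01011 with t = 3 this returns 10100 after two jumps (of 2^0 and 2^1 steps), matching a step-by-step simulation.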
You can also use tricks such as HashLife.
As I was saying in the comments, I'm fairly sure you can do away with the array and clone operations. You can modify a StringBuilder in place, so there is no need to convert back and forth between int[] and String.
For example (note: this is on the order of an O(n) operation per generation, for all T <= N):
public static void main(String[] args) {
    System.out.println(conway1d("0000001", 7, 1));
    System.out.println(conway1d("01011", 5, 3));
}

private static String conway1d(CharSequence input, int N, long T) {
    System.out.println("Generation 0: " + input);
    StringBuilder sb = new StringBuilder(input); // will be updated for each generation
    StringBuilder copy = new StringBuilder();    // stores a copy to reference the current generation
    for (int gen = 1; gen <= T; gen++) {
        // Copy over the current generation string
        copy.setLength(0);
        copy.append(input);
        for (int i = 0; i < N; i++) {
            conwayUpdate(sb, copy, i, N);
        }
        input = sb.toString(); // next generation string
        System.out.printf("Generation %d: %s\n", gen, input);
    }
    return input.toString();
}

private static void conwayUpdate(StringBuilder nextGen, final StringBuilder currentGen, int charPos, int N) {
    int prev = (N + (charPos - 1)) % N;
    int next = (charPos + 1) % N;
    // **Exactly one** adjacent '1'
    boolean adjacent = currentGen.charAt(prev) == '1' ^ currentGen.charAt(next) == '1';
    nextGen.setCharAt(charPos, adjacent ? '1' : '0'); // set cell as alive or dead
}
For the two samples you posted in the comments, this code generates the following output:
Generation 0: 0000001
Generation 1: 1000010
1000010
Generation 0: 01011
Generation 1: 00011
Generation 2: 10111
Generation 3: 10100
10100
Big-O notation is a simplification for understanding the complexity of an algorithm. Two O(n) algorithms can have very different execution times. Why? Let's unroll your example:
You have two nested loops. The outer loop runs t times.
The inner loop runs n times.
Each iteration of the inner loop takes a constant time k.
So, in essence, your algorithm is O(k * t * n). If t is of the same order of magnitude as n, you can consider the complexity to be O(k * n^2).
There are two approaches to optimizing this algorithm:
Reduce the constant time k. For example, do not clone the whole array on each loop; cloning is very time-consuming (it needs a full pass over the array).
The second optimization in this case is to use dynamic programming (https://en.wikipedia.org/wiki/Dynamic_programming), which can cache information between iterations and speed up the execution; this can lower k or even reduce the complexity from O(n^2) to O(n log n).
I'm currently writing a random number generator that generates a pseudorandom int and then returns an int representing the least significant bit in the int value.
import java.math.BigInteger;
import java.util.Random;

public class RNG {
    //algorithm input: first input is the seed
    private int x;
    //primes for calculating the next random number
    private int p;
    private int q;

    //constructor
    public RNG()
    {
        p = getPrime();
        q = getPrime();
        //make sure p and q are not equal
        while (p == q)
        {
            q = getPrime();
        }
        //the seed must not be divisible by p or q;
        //the easiest way is to make the seed another prime number.
        //This prime does not need to be congruent to 3 mod 4, so
        //getPrime() is not used.
        x = BigInteger.probablePrime(8, new Random()).intValue();
    }

    //return a pseudorandom prime number using java.util.Random such that
    //the number is congruent to 3 mod 4
    private int getPrime()
    {
        Random rand = new Random();
        int i;
        while (true)
        {
            //generates a random BigInteger. The probability that
            //this BigInteger is composite is less than 2^-100.
            //bitLength is 8, so the maximum prime value is 251
            i = BigInteger.probablePrime(8, rand).intValue();
            if ((i % 4) == 3)
            {
                break;
            }
        }
        return i;
    }

    public int next()
    {
        //calculate the next algorithm input; the squaring is done in long
        //to avoid int overflow (x can approach p * q, which is up to 251 * 251)
        int next = (int) (((long) x * x) % (p * q));
        //set algorithm input for the next next() call
        x = next;
        //return least significant bit
        return next & 1;
    }
}
The aim of the program is to produce return values that a test class can then write to a file; because each output is the least significant bit of the randomly generated number, the output should be cryptographically secure.
However, when trying to come up with the implementation of the test class, I'm having trouble figuring out what to do with the output. What I need the test class to do is write 512 bits to an output file. The FileOutputStream class, however, is only capable of writing bytes to the output file.
What I'm trying to do is figure out if there's a way to group the individual 1 or 0 outputs from my random number generator into byte values, but so far I've not found anything that's supported in Java.
You can construct bytes (or ints, or longs) using bit shifting with the << operator (or the shift-and-assign operator <<=):
byte b = 0;
for (int i = 0; i < 8; i++) {
    b <<= 1;
    b |= rng.next();
}
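Building on that, a minimal test-class sketch for the 512-bit file (the class name RNGTest and the file name random512.bin are placeholders of mine; RNG is the class from the question):

import java.io.FileOutputStream;
import java.io.IOException;

public class RNGTest {
    public static void main(String[] args) throws IOException {
        RNG rng = new RNG();
        byte[] buffer = new byte[64];           // 512 bits = 64 bytes
        for (int i = 0; i < buffer.length; i++) {
            byte b = 0;
            for (int j = 0; j < 8; j++) {       // pack eight 0/1 outputs into one byte
                b <<= 1;
                b |= rng.next();
            }
            buffer[i] = b;
        }
        try (FileOutputStream out = new FileOutputStream("random512.bin")) {
            out.write(buffer);                  // FileOutputStream writes whole bytes
        }
    }
}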
In order to explore some solutions, I need to generate all possibilities. I'm doing it by using bit masking, like this:
for (long i = 0; i < 1L << NB; i++) {
    System.out.println(Long.toBinaryString(i));
    if (checkSolution(i)) {
        this.add(i); // add i to solutions
    }
}
this.getBest(); // get the solution with the lowest number of 1s
This allows me to explore (if NB = 3):
000
001
010
011
100
101
110
111
My problem is that the best solution is the one with the lowest number of 1s.
So, in order to stop the search as soon as I find a solution, I would like to have a different order and produce something like this:
000
001
010
100
011
101
110
111
That would make the search a lot faster, since I could stop as soon as I get the first solution. But I don't know how I can change my loop to get this output...
PS: NB is undefined...
The idea is to turn your loop into two nested loops; the outer loop sets the number of 1's, and the inner loop iterates through every combination of binary numbers with N 1's. Thus, your loop becomes:
for (long i = 1; i < (1L << NB); i = (i << 1) | 1) {
    long j = i;
    do {
        System.out.println(Long.toBinaryString(j));
        if (checkSolution(j)) {
            this.add(j); // add j to solutions
        }
        j = next_of_n(j);
    } while (j < (1L << NB));
}
next_of_n() is defined as:
long next_of_n(long j) {
    long smallest, ripple, new_smallest, ones;
    if (j == 0)
        return j;
    smallest = (j & -j);                          // isolate the rightmost 1-bit
    ripple = j + smallest;                        // carry through the rightmost group of 1s
    new_smallest = (ripple & -ripple);            // rightmost 1-bit of the result
    ones = ((new_smallest / smallest) >> 1) - 1;  // the 1s that slide back to the far right
    return (ripple | ones);
}
The algorithm behind next_of_n() is described in C: A Reference Manual, 5th edition, section 7.6, where it appears in an example of a set implementation using bitwise operations. It may be a little hard to understand the code at first, but here's what the book says about it:
This code exploits many unusual properties of unsigned arithmetic. As
an illustration:
if x == 001011001111000, then
smallest == 000000000001000
ripple == 001011010000000
new_smallest == 000000010000000
ones == 000000000000111
the returned value == 001011010000111
The overall idea is that you find the rightmost contiguous group of
1-bits. Of that group, you slide the leftmost 1-bit to the left one
place, and slide all the others back to the extreme right. (This code
was adapted from HAKMEM.)
I can provide a deeper explanation if you still don't get it. Note that the algorithm assumes two's complement, and that all arithmetic should ideally take place on unsigned integers, mainly because of the right-shift operation. I'm not a huge Java guy; I tested this in C with unsigned long and it worked pretty well. I hope the same applies to Java, although there's no such thing as an unsigned long in Java. As long as you use reasonable values for NB, there should be no problem.
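A quick way to sanity-check a Java translation (my own sketch): for a small NB, the outer loop over the seeds 1, 11, 111, ... combined with next_of_n() should visit every nonzero NB-bit value exactly once, i.e. 2^NB - 1 values in total:

static void sanityCheck(int NB) {
    int visited = 0;
    for (long i = 1; i < (1L << NB); i = (i << 1) | 1) {
        long j = i;
        do {
            visited++;               // one value per pattern, grouped by popcount
            j = next_of_n(j);
        } while (j < (1L << NB));
    }
    // every nonzero NB-bit pattern should appear exactly once
    System.out.println(visited == (1 << NB) - 1 ? "OK" : "BROKEN: " + visited);
}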
This is an iterator that iterates bit patterns of the same cardinality.
import java.math.BigInteger;
import java.util.Iterator;

/**
 * Iterates all bit patterns containing the specified number of bits.
 *
 * See "Compute the lexicographically next bit permutation"
 * http://graphics.stanford.edu/~seander/bithacks.html#NextBitPermutation
 *
 * @author OldCurmudgeon
 */
public class BitPattern implements Iterable<BigInteger> {
    // Useful stuff.
    private static final BigInteger ONE = BigInteger.ONE;
    private static final BigInteger TWO = ONE.add(ONE);
    // How many bits to work with.
    private final int bits;
    // Value to stop at. 2^max_bits.
    private final BigInteger stop;
    // Should we invert the output?
    private final boolean not;

    // All patterns of that many bits up to the specified number of bits - inverting if required.
    public BitPattern(int bits, int max, boolean not) {
        this.bits = bits;
        this.stop = TWO.pow(max);
        this.not = not;
    }

    // All patterns of that many bits up to the specified number of bits.
    public BitPattern(int bits, int max) {
        this(bits, max, false);
    }

    @Override
    public Iterator<BigInteger> iterator() {
        return new BitPatternIterator();
    }

    /*
     * From the link:
     *
     * Suppose we have a pattern of N bits set to 1 in an integer and
     * we want the next permutation of N 1 bits in a lexicographical sense.
     *
     * For example, if N is 3 and the bit pattern is 00010011, the next patterns would be
     * 00010101, 00010110, 00011001,
     * 00011010, 00011100, 00100011,
     * and so forth.
     *
     * The following is a fast way to compute the next permutation.
     */
    private class BitPatternIterator implements Iterator<BigInteger> {
        // Next to deliver - initially 2^n - 1.
        BigInteger next = TWO.pow(bits).subtract(ONE);
        // The last one we delivered.
        BigInteger last;

        @Override
        public boolean hasNext() {
            if (next == null) {
                // Next one!
                // t gets v's least significant 0 bits set to 1
                // unsigned int t = v | (v - 1);
                BigInteger t = last.or(last.subtract(BigInteger.ONE));
                // Silly optimisation.
                BigInteger notT = t.not();
                // Next set to 1 the most significant bit to change,
                // set to 0 the least significant ones, and add the necessary 1 bits.
                // w = (t + 1) | (((~t & -~t) - 1) >> (__builtin_ctz(v) + 1));
                // The __builtin_ctz(v) GNU C compiler intrinsic for x86 CPUs returns the number of trailing zeros.
                next = t.add(ONE).or(notT.and(notT.negate()).subtract(ONE).shiftRight(last.getLowestSetBit() + 1));
                if (next.compareTo(stop) >= 0) {
                    // Don't go there.
                    next = null;
                }
            }
            return next != null;
        }

        @Override
        public BigInteger next() {
            last = hasNext() ? next : null;
            next = null;
            return not ? last.not() : last;
        }

        @Override
        public void remove() {
            throw new UnsupportedOperationException("Not supported.");
        }

        @Override
        public String toString() {
            return next != null ? next.toString(2) : last != null ? last.toString(2) : "";
        }
    }

    public static void main(String[] args) {
        System.out.println("BitPattern(3, 10)");
        for (BigInteger i : new BitPattern(3, 10)) {
            System.out.println(i.toString(2));
        }
    }
}
First you loop over your number of ones, say n. You start with 2^n - 1, which is the first integer to contain exactly n ones, and test it. To get the next one, you use the algorithm from Hamming weight based indexing (it's C code, but it should not be too hard to translate to Java).
Here's some code I put together some time ago to do this. Call the combinadic method with the number of digits you want, the number of set bits (the weight) you want, and which number in the sequence you want.
// n = digits, k = weight, m = position. (Uses java.math.BigInteger.)
public static BigInteger combinadic(int n, int k, BigInteger m) {
    BigInteger out = BigInteger.ZERO;
    for (; n > 0; n--) {
        BigInteger y = nChooseK(n - 1, k);
        if (m.compareTo(y) >= 0) {
            m = m.subtract(y);
            out = out.setBit(n - 1);
            k -= 1;
        }
    }
    return out;
}

// Algorithm borrowed (and tweaked) from: http://stackoverflow.com/a/15302448/823393
public static BigInteger nChooseK(int n, int k) {
    if (k > n) {
        return BigInteger.ZERO;
    }
    if (k <= 0 || k == n) {
        return BigInteger.ONE;
    }
    // ( n * ( nChooseK(n-1,k-1) ) ) / k
    return BigInteger.valueOf(n).multiply(nChooseK(n - 1, k - 1)).divide(BigInteger.valueOf(k));
}

public void test() {
    System.out.println("Hello");
    BigInteger m = BigInteger.ZERO;
    for (int i = 1; i < 10; i++) {
        BigInteger c = combinadic(5, 2, m);
        System.out.println("c[" + m + "] = " + c.toString(2));
        m = m.add(BigInteger.ONE);
    }
}
Not sure how it matches up in efficiency with the other posts.
In a computer contest, I was given a problem where I had to manipulate input data. The input has been split() into an array where data[0] is the number of repetitions. There can be up to 10^18 repetitions. My program returned Exception in thread "main" java.lang.OutOfMemoryError: Java heap space and I failed the contest.
Here's a piece of my code that's eating up memory and CPU:
long product[][] = new long[data[0]][2];
product[0][0] = data[1];
product[0][1] = data[2];
for (int a = 1; a < data[0]; a++) {
    product[a][0] = ((data[5] * product[a - 1][0] + data[6]) % data[3]) + 1; // P_i = ((A * P_{i-1} + B) mod M) + 1 (for i = 2..N)
    product[a][1] = ((data[7] * product[a - 1][1] + data[8]) % data[4]) + 1; // W_i = ((C * W_{i-1} + D) mod K) + 1 (for i = 2..N)
}
Here's some of the input data:
980046644627629799 9 123456 18 10000000 831918484 451864686 840000324 650000765
972766173386786486 123 1 10000000 10000000 590000001 680000000 610000001 970000002
299896237124947938 681206 164538 2280874 981991 416793690 904023823 813682336 774801135
My program only works up to about 7 or 8 digits, and even then it takes minutes to run. With 18 digits, it crashed almost as soon as I clicked "Run" in Eclipse.
I'm curious as to how is it possible to manipulate that much data on a normal computer. Please let me know if my question is unclear or you need more information. Thanks!
You can't have, and don't need, an array of such a huge length. You just need to track the most recent two values; e.g., just have product1 and product2.
Also, consider testing whether either number is NaN after each iteration. If so, throw an Exception and report the iteration number, because once you get a NaN they will all be NaN. Except you are using long, so scratch that. "Nevermind". :-)
long product[][] = new long[data[0]][2];
This is the only line in the code you pasted that allocates memory. You allocate an array whose length is data[0]! As data[0] grows, so does the array. What is the formula you're trying to apply here?
The first input value you provide:
980046644627629799
is already too large to even declare an array for. Try creating a single-dimension array with that as its length and see what happens....
Are you sure you don't just want a 1 x 2 matrix that you accumulate over? Explain your intended algorithm clearly and we can help you with a more optimal solution.
Let's put the numbers into perspective.
Memory: one long takes 8 bytes, and the array holds two longs per step: 2 × 10^18 longs take 16,000,000 terabytes. Way too much.
Time: 10,000,000 operations ≈ 1 second, so 10^18 steps ≈ 10^11 seconds ≈ 30 centuries. Also way too much.
You can solve the memory problem by realising that you only need the most recent values at any time, and that the entire array is redundant:
long currentP = data[1];
long currentW = data[2];
for (int a = 1; a < data[0]; a++)
{
    currentP = ((data[5] * currentP + data[6]) % data[3]) + 1;
    currentW = ((data[7] * currentW + data[8]) % data[4]) + 1;
}
The time problem is a bit trickier to solve. Since a modulus is used, the numbers must eventually enter a cycle. Once you find the cycle, you can predict what the value will be after n iterations without having to do each iteration manually.
The simplest method for finding cycles is to record whether each value has been visited, and iterate until you encounter a value you've seen before. With this approach, the amount of memory required is proportional to M and K (data[3] and data[4]). If they are too large, a more space-efficient cycle detection algorithm such as Floyd's tortoise-and-hare must be used; a sketch of that follows.
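For reference, a minimal sketch of Floyd's tortoise-and-hare on this recurrence (my own addition; the worked example below sticks with the table-based approach). It finds the cycle offset mu and the cycle length lambda in O(1) memory:

import java.util.function.LongUnaryOperator;

// s = start value; f(x) = (A*x + B) % M + 1 is the recurrence from the question.
// A * x stays within long range for these inputs (A < 2^30, x <= M <= 10^7).
static long[] findCycle(long s, long A, long B, long M) {
    LongUnaryOperator f = x -> (A * x + B) % M + 1;
    long tortoise = f.applyAsLong(s);
    long hare = f.applyAsLong(f.applyAsLong(s));
    while (tortoise != hare) {               // the pointers meet somewhere inside the cycle
        tortoise = f.applyAsLong(tortoise);
        hare = f.applyAsLong(f.applyAsLong(hare));
    }
    long mu = 0;                             // distance from s to the start of the cycle
    tortoise = s;
    while (tortoise != hare) {
        tortoise = f.applyAsLong(tortoise);
        hare = f.applyAsLong(hare);
        mu++;
    }
    long lambda = 1;                         // length of the cycle
    hare = f.applyAsLong(tortoise);
    while (tortoise != hare) {
        hare = f.applyAsLong(hare);
        lambda++;
    }
    return new long[] { mu, lambda };
}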
Here is an example which finds the value for P:
public static void main(String[] args)
{
    // value = (A * prevValue + B) % M + 1
    final long NOT_SEEN = -1; // the code used for values not visited before
    long[] data = { 980046644627629799L, 9, 123456, 18, 10000000, 831918484, 451864686, 840000324, 650000765 };
    long N = data[0]; // the number of iterations
    long S = data[1]; // the initial value of the sequence
    long M = data[3]; // the modulus divisor
    long A = data[5]; // multiply by this
    long B = data[6]; // add this

    int max = (int) Math.max(M, S); // all the numbers (except the first) must be less than or equal to M
    long[] seenTime = new long[max + 1]; // whether or not a value was seen, and how many iterations it took

    // initialize the values of 'seenTime' to 'not seen'
    for (int i = 0; i < seenTime.length; i++)
    {
        seenTime[i] = NOT_SEEN;
    }

    // find the cycle
    long count = 0;
    long cycleValue = S; // the current value in the series
    while (seenTime[(int) cycleValue] == NOT_SEEN)
    {
        seenTime[(int) cycleValue] = count;
        cycleValue = (A * cycleValue + B) % M + 1;
        count++;
    }
    long cycleLength = count - seenTime[(int) cycleValue];
    long cycleOffset = seenTime[(int) cycleValue];

    long result;
    if (N < cycleOffset)
    {
        // Special case: the requested iteration occurs before the cycle starts.
        // Straightforward simulation.
        long value = S;
        for (long i = 0; i < N; i++)
        {
            value = (A * value + B) % M + 1;
        }
        result = value;
    }
    else
    {
        // Normal case: the requested iteration occurs inside the cycle.
        // Simulate just the relevant part of one cycle.
        long positionInCycle = (N - cycleOffset) % cycleLength;
        long value = cycleValue;
        for (long i = 0; i < positionInCycle; i++)
        {
            value = (A * value + B) % M + 1;
        }
        result = value;
    }
    System.out.println(result);
}
I am only giving you the solution because it looks like the contest is over. The important lesson to learn from this is that you should always check the bounds to see whether your solution is practical before you start coding it up.
I am working on a projecteuler problem, and I have run into an OutOfMemoryError.
And I don't understand why, because my code was working beautifully (to my novice eyes at least :P).
Everything works just fine until the loop reaches 113383.
If someone could help me debug this it would be greatly appreciated because I don't understand at all why it is failing.
My Code
import java.util.HashMap;
import java.util.Map;
import java.util.Stack;

/*
 * The following iterative sequence is defined for the set of positive integers:
 *   n -> n/2 (n is even)
 *   n -> 3n + 1 (n is odd)
 * Using the rule above and starting with 13, we generate the following sequence:
 *   13 40 20 10 5 16 8 4 2 1
 * It can be seen that this sequence (starting at 13 and finishing at 1) contains
 * 10 terms. Although it has not been proved yet (Collatz Problem), it is thought
 * that all starting numbers finish at 1. Which starting number, under one million,
 * produces the longest chain?
 * NOTE: Once the chain starts the terms are allowed to go above one million.
 */
public class Problem14 {
    static Map<Integer, Integer> map = new HashMap<Integer, Integer>();
    final static int NUMBER = 1000000;

    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        map.put(1, 1);
        for (int loop = 2; loop < NUMBER; loop++) {
            compute(loop);
            //System.out.println(map);
            System.out.println(loop);
        }
        long end = System.currentTimeMillis();
        System.out.println("Time: " + ((end - start) / 1000.) + " seconds");
    }

    private static void compute(int currentNumber) {
        Stack<Integer> temp = new Stack<Integer>();
        /**
         * Checks to see if the current value is already part of the map.
         * If it isn't, add it to a temporary stack. If the current value
         * exceeds 1 million, don't check the map or add it to the stack.
         */
        while (!map.containsKey(currentNumber)) {
            temp.add(currentNumber);
            currentNumber = realCompute(currentNumber);
            while (currentNumber > NUMBER) {
                currentNumber = realCompute(currentNumber);
            }
        }
        System.out.println(temp);
        /**
         * Adds the members of the temporary stack to the map based on
         * when they were placed in the stack. The key is the top of the
         * stack; the value starts 1 greater than the chain length found
         * in the map and is incremented for each popped element.
         */
        int initial = map.get(currentNumber) + 1;
        int size = temp.size();
        for (int loop = 1; loop <= size; loop++) {
            map.put(temp.pop(), initial);
            initial++;
        }
    }

    private static int realCompute(int currentNumber) {
        if (currentNumber % 2 == 0) {
            currentNumber /= 2;
        } else {
            currentNumber = (currentNumber * 3) + 1;
        }
        return currentNumber;
    }
}
Seems to me it's just what the error says: you're running out of memory. Increase the heap size with -Xmx (e.g. java -Xmx512m).
If you want to reduce the memory footprint, take a look at GNU Trove for a more compact (and faster) implementation of primitive maps (instead of HashMap<Integer, Integer>).
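For what it's worth, a minimal sketch of the Trove variant (assumption on my part: the Trove 3 package layout; older releases put TIntIntHashMap directly in gnu.trove):

import gnu.trove.map.hash.TIntIntHashMap;

public class TroveDemo {
    public static void main(String[] args) {
        // primitive int keys/values: no boxed Integer per entry, so a much
        // smaller heap footprint than HashMap<Integer, Integer>
        TIntIntHashMap map = new TIntIntHashMap(1000000);
        map.put(1, 1);
        System.out.println(map.containsKey(1)); // true
        System.out.println(map.get(1));         // 1 (missing keys return 0 by default)
    }
}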
I had exactly the same problem as you...
The problem actually lies with the Stack<Integer> temp; changing it (and the computation) to Long fixes it.
It is a simple overflow: starting at 113383, the intermediate Collatz terms exceed Integer.MAX_VALUE, so the int wraps around to a negative value, the chain never reaches 1, and the stack and map grow until the heap is exhausted.
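A minimal sketch of that fix (my own illustration of the suggestion above; the Map, Stack, and compute() would switch from Integer to Long accordingly):

private static long realCompute(long currentNumber) {
    if (currentNumber % 2 == 0) {
        currentNumber /= 2;
    } else {
        // 3n + 1 exceeds Integer.MAX_VALUE for some chains that start under
        // one million (e.g. at 113383), so the arithmetic must be done in long
        currentNumber = currentNumber * 3 + 1;
    }
    return currentNumber;
}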
Which OutOfMemoryError is it? Heap, PermGen...? You probably need to increase the heap size or the PermGen size.
You should define the HashMap with an initial capacity: map = new HashMap<Integer, Integer>(NUMBER).
Internally, the map stores its entries in an Entry[] table that grows automatically as elements are added. Supplying the initial capacity up front has two benefits: 1) no repeated array copies as the table resizes; 2) it runs faster.