Rabin-Karp Hashcode is too big


How should I deal with a large hash value in the rolling-hash Rabin-Karp algorithm? I use modular arithmetic to avoid negative numbers, but there is a problem when the hash exceeds my modulus (M = 83559671). Both the base used to compute the hash and the (very large) modulus are prime, but it doesn't work with long strings. Can anyone see the problem?
Here is my code:
public static void main(String [] args){
int P = 13; // base
long M = 83559671;
long iHash = 0;
String word = "abcbadccaaaabbbb";
int WINDOW = 9;
for(int i = 0; i < WINDOW; i++){
iHash = int_mod(int_mod(iHash*P, M) + word.charAt(i), M);
}
for(int i = WINDOW; i < word.length(); i++){
iHash = int_mod(iHash - word.charAt(i-WINDOW) * get_pow(P, WINDOW-1, M), M);
iHash = int_mod(iHash * P, M);
iHash = int_mod(iHash + word.charAt(i), M);
}
}
public static long get_pow(int p, int t, long M){
long a = 1;
for(int i = 0 ; i < t; i++){
a = int_mod(a * p, M);
}
return a;
}
public static long int_mod(long a, long b){
return (a % b+ b) % b;
}
The problem is that for any string longer than 8 characters the hash exceeds the modulus 83559671, which leads to a wrong answer when I compare hashes. Shorter strings work properly.

You don't need to do the modulus at all: Java's int arithmetic wraps around on overflow, which acts as an implicit modulus of 2^32. Here's a demo:
public class Foo {
private static int hash(String s) {
int hash = 0;
for (int i = 0; i < s.length(); i++) {
hash *= 31;
hash += s.charAt(i);
}
return hash;
}
public static void main(String[] args) {
String s1 = "abcdefghij";
String s2 = s1.substring(1) + "k";
int pow = 1;
for (int i = 0; i < s1.length(); i++) {
pow *= 31;
}
System.out.printf("hash(%s) = %d%n", s1, hash(s1));
System.out.printf("hash(%s) = %d%n31 * hash(%s) - (31^%d * %s) + %s = %s%n",
s2,
hash(s2),
s1,
s1.length(),
s1.charAt(0),
s2.charAt(s2.length() - 1),
31 * hash(s1) - (pow * s1.charAt(0)) + s2.charAt(s2.length() - 1));
}
}
This (correctly) prints out:
hash(abcdefghij) = -634317659
hash(bcdefghijk) = 21611845
31 * hash(abcdefghij) - (31^10 * a) + k = 21611845

Why don't you treat your string as a polynomial? Suppose you have a string S of length n, and consider the function F(x) = S[0]*x^(n-1) + S[1]*x^(n-2) + ... + S[i]*x^(n-i-1) + ... + S[n-2]*x + S[n-1]. If you compute F(P), where P is the base from your code snippet, you get exactly the Rabin-Karp hash of S. Since F(x) is a polynomial, we can use Horner's rule to compute F(P). The resulting value might be very big, hence we use modular arithmetic:
static final long M = 83559671;
static final int Base = 13;
static long hash(String s, int from, int to) {
long iHash = 0;
for(int i = from; i < to; i++) {
iHash *= Base;
iHash += s.charAt(i);
iHash %= M;
}
return iHash;
}
You can use this function to obtain the hash of the pattern you are searching for, and the hash of the initial window in the text. Then you can shift the window and recalculate the hash:
static void find(String pattern, String text) {
if(text.length() < pattern.length()) return;
int len = pattern.length();
long ph = hash(pattern, 0, len);
long h = hash(text, 0, len);
long basePower = mpow(Base, len);
if(h == ph) System.out.println("match at 0");
for(int i = len; i < text.length(); i++) {
h *= Base;
h += text.charAt(i);
h -= basePower * text.charAt(i - len);
h = mod(h);
if(h == ph) System.out.println("match at " + (i - len + 1));
}
}
static long mod(long a) {
a %= M;
if(a < 0) {
a += M;
}
return a;
}
static long mpow(long x, int k) {
long result = 1;
for(; k > 0; k >>= 1) {
if(k % 2 == 1) {
result = mod(result * x);
}
x = mod(x * x);
}
return result;
}
public static void main(String[] args) {
find("abracadabra", "abracadabracadabra");
}
For more information on this approach, I recommend referring to CLRS.
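One point the code above leaves open is that equal hashes only make a match likely, since two different windows can share a hash modulo M. A minimal sketch of the same rolling scheme with a character comparison added to confirm each candidate; the class and method names are illustrative, and the base and modulus are the ones from the question:

```java
import java.util.ArrayList;
import java.util.List;

class RabinKarpSketch {
    static final long M = 83559671L; // modulus from the question
    static final long BASE = 13L;    // base from the question

    /** Returns every index at which pattern occurs in text. */
    static List<Integer> findAll(String pattern, String text) {
        List<Integer> hits = new ArrayList<>();
        int len = pattern.length();
        if (len == 0 || text.length() < len) return hits;

        long ph = 0, h = 0, top = 1; // top = BASE^(len-1) mod M
        for (int i = 0; i < len; i++) {
            ph = (ph * BASE + pattern.charAt(i)) % M;
            h = (h * BASE + text.charAt(i)) % M;
            if (i > 0) top = (top * BASE) % M;
        }
        for (int start = 0; ; start++) {
            // Equal hashes are only a candidate; compare characters to confirm.
            if (h == ph && text.regionMatches(start, pattern, 0, len)) hits.add(start);
            int next = start + len;
            if (next >= text.length()) break;
            // Drop the leading character (kept non-negative by adding M), then append the next one.
            h = (h - text.charAt(start) * top % M + M) % M;
            h = (h * BASE + text.charAt(next)) % M;
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abracadabra", "abracadabracadabra"));
    }
}
```

Every intermediate product stays far below the range of long, because each factor is reduced modulo M before it is multiplied.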

Related

Interview Programming question?(Long bit representation)?

Given a long number n, we need to return the long number obtained by concatenating the binary representations of the numbers from 1 to n.
For example, if n = 3 the answer is 27: 1 in binary is 1, 2 is 10 and 3 is 11, so the concatenation is 11011, which is 27.
This is the approach I used:
class Solution {
static String[] arr;
public static long binaryArray(long A) {
String res = "";
for (long i = 1; i <= A; i++) {
res += toBinary(i);
}
long rest = toLong(res);
return rest % 1000000007;
}
static long toLong(String s) {
int a = s.length();
int pow = 0;
long res = 0;
for (int i = a - 1; i >= 0; i--) {
char aa = s.charAt(i);
long f = Character.getNumericValue(aa);
long power = (long) Math.pow(2, pow);
res += power * f;
pow++;
}
return res;
}
static String toBinary(long a) {
if (a == 0) {
return "0";
}
String binary = "";
binary = Long.toBinaryString(a);
return binary;
}
public static void main(String args[]) {
long n = 89900;
arr = new String[(int) n + 1];
arr[0] = "0";
long startTime = System.nanoTime();
long b = binaryArray(n);
long endTime = System.nanoTime();
long totalTime = endTime - startTime;
long convert = TimeUnit.SECONDS.convert(totalTime, TimeUnit.NANOSECONDS);
System.out.println(convert);
System.out.println(b);
}
}
But it does not finish within the required time.
Is there a quicker way?

public static long f(int n) {
long n2 = ((long)n) << n;
return n2 | n;
}
There is a lot of irrelevant text in the specification: binary representation, concatenation and so on. What it says is:
the bits of n should be "concatenated" n bits to the left, i.e. a bit shift.
So this interview question was intentionally misleading, and the actual solution is simple. It tries, successfully, to get the interviewee to start with bit tests and the like.
Now that I am somewhat experienced, the trick is always to look at the whole picture and
think in notions like sets, or Integer's bit functions.
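As @kaya3 commented, the shortcut above does not hold in general (for n = 2 it returns 10, while the concatenation 110 is 6). The following is more correct: append each number from 1 to n in turn, shifting by its bit length.
public static long f(int n) {
long n2 = 0;
for (int i = 1; i <= n; ++i) {
int bits = 32 - Integer.numberOfLeadingZeros(i); // bit length of i
n2 <<= bits; // make room for i's bits
n2 |= i; // append them
}
return n2; // overflows long once the concatenation needs more than 63 bits
}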
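The question's code reduces the result modulo 1000000007, which suggests the concatenation is expected to outgrow a long. A minimal sketch that builds the value incrementally under that modulus, so no string is ever constructed; the class and method names are illustrative, and it assumes n stays below 2^33 so the shifted intermediate still fits in a long:

```java
class BinaryConcat {
    static final long MOD = 1_000_000_007L; // modulus used in the question

    /** Value of the binary strings of 1..n concatenated, reduced modulo MOD. */
    static long concatenatedBinary(long n) {
        long result = 0;
        for (long i = 1; i <= n; i++) {
            int bits = 64 - Long.numberOfLeadingZeros(i); // bit length of i
            // result < MOD < 2^30, so the shift cannot overflow while bits <= 33.
            result = ((result << bits) + i) % MOD;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(concatenatedBinary(3)); // 1, 10, 11 -> 11011
    }
}
```

This runs in time linear in n and constant memory, whereas the string-based version builds a string whose length grows with n log n.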

Bloom Filters: Getting higher error rates than expected

I created a bloom filter using murmur3, blake2b, and Kirsch-Mitzenmacher-optimization, as described in the second answer to this question: Which hash functions to use in a Bloom filter
However, when I was testing it, the bloom filter constantly had a much higher error rate than I was expecting.
Here is the code I used to generate the bloom filters:
public class BloomFilter {
private BitSet filter;
private int size;
private int hfNum;
private int prime;
private double fp = 232000; //One false positive every fp items
public BloomFilter(int count) {
size = (int)Math.ceil(Math.ceil(((double)-count) * Math.log(1/fp))/(Math.pow(Math.log(2),2)));
hfNum = (int)Math.ceil(((this.size / count) * Math.log(2)));
//size = (int)Math.ceil((hfNum * count) / Math.log(2.0));
filter = new BitSet(size);
System.out.println("Initialized filter with " + size + " positions and " + hfNum + " hash functions.");
}
public BloomFilter extraSecure(int count) {
return new BloomFilter(count, true);
}
private BloomFilter(int count, boolean x) {
size = (int)Math.ceil((((double)-count) * Math.log(1/fp))/(Math.pow(Math.log(2),2)));
hfNum = (int)Math.ceil(((this.size / count) * Math.log(2)));
prime = findPrime();
size = prime * hfNum;
filter = new BitSet(prime * hfNum);
System.out.println("Initialized filter with " + size + " positions and " + hfNum + " hash functions.");
}
public void add(String in) {
filter.set(getMurmur(in), true);
filter.set(getBlake(in), true);
if(this.hfNum > 2) {
for(int i = 3; i <= (hfNum); i++) {
filter.set(getHash(in, i));
}
}
}
public boolean check(String in) {
if(!filter.get(getMurmur(in)) || !filter.get(getBlake(in))) {
return false;
}
for(int i = 3; i <= hfNum; i++) {
if(!filter.get(getHash(in, i))) {
return false;
}
}
return true;
}
private int getMurmur(String in) {
int temp = murmur(in) % (size);
if(temp < 0) {
temp = temp * -1;
}
return temp;
}
private int getBlake(String in) {
int temp = new BigInteger(blake256(in), 16).intValue() % (size);
if(temp < 0) {
temp = temp * -1;
}
return temp;
}
private int getHash(String in, int i) {
int temp = ((getMurmur(in)) + (i * getBlake(in))) % size;
return temp;
}
private int findPrime() {
int temp;
int test = size;
while((test * hfNum) > size ) {
temp = test - 1;
while(!isPrime(temp)) {
temp--;
}
test = temp;
}
if((test * hfNum) < this.size) {
test++;
while(!isPrime(test)) {
test++;
}
}
return test;
}
private static boolean isPrime(int num) {
if (num < 2) return false;
if (num == 2) return true;
if (num % 2 == 0) return false;
for (int i = 3; i * i <= num; i += 2)
if (num % i == 0) return false;
return true;
}
@Override
public String toString() {
final StringBuilder buffer = new StringBuilder(size);
IntStream.range(0, size).mapToObj(i -> filter.get(i) ? '1' : '0').forEach(buffer::append);
return buffer.toString();
}
}
Here is the code I'm using to test it:
public static void main(String[] args) throws Exception {
int z = 0;
int times = 10;
while(z < times) {
z++;
System.out.print("\r");
System.out.print(z);
BloomFilter test = new BloomFilter(4000);
SecureRandom random = SecureRandom.getInstance("SHA1PRNG");
for(int i = 0; i < 4000; i++) {
test.add(blake256(Integer.toString(random.nextInt())));
}
int temp = 0;
int count = 1;
while(!test.check(blake512(Integer.toString(temp)))) {
temp = random.nextInt();
count++;
}
if(z == (times)) {
Files.write(Paths.get("counts.txt"), (Integer.toString(count)).getBytes(), StandardOpenOption.APPEND);
}else {
Files.write(Paths.get("counts.txt"), (Integer.toString(count) + ",").getBytes(), StandardOpenOption.APPEND);
}
if(z == 1) {
Files.write(Paths.get("counts.txt"), (Integer.toString(count) + ",").getBytes());
}
}
}
I expect to get a value relatively close to the fp variable in the bloom filter class, but instead I frequently get half that. Anyone know what I'm doing wrong, or if this is normal?
EDIT: To show what I mean by high error rates, when I run the code on a filter initialized with count 4000 and fp 232000, this was the output in terms of how many numbers the filter had to run through before it found a false positive:
158852,354114,48563,76875,156033,82506,61294,2529,82008,32624
This was generated using the extraSecure() method for initialization, and repeated 10 times to generate these 10 numbers; all but one of them took less than 232000 generated values to find a false positive. The average of the 10 is about 105540, and that's common no matter how many times I repeat this test.
Looking at the values it found, the fact that it found a false positive after only generating 2529 numbers is a huge issue for me, considering I'm adding 4000 data points.
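The constructor's sizing appears to follow the standard formulas m = -n ln p / (ln 2)^2 for the number of bits and k = (m / n) ln 2 for the number of hash functions. A minimal sketch for checking those numbers in isolation, assuming p = 1 / fp; the class and method names are illustrative and not part of the code above:

```java
class BloomSizing {
    /** Bits needed for n items at false-positive probability p. */
    static long optimalBits(int n, double p) {
        double ln2 = Math.log(2);
        return (long) Math.ceil(-n * Math.log(p) / (ln2 * ln2));
    }

    /** Number of hash functions for m bits and n items. */
    static int optimalHashes(long m, int n) {
        // Divide as double; integer division would truncate m / n first.
        return Math.max(1, (int) Math.round((double) m / n * Math.log(2)));
    }

    public static void main(String[] args) {
        int n = 4000;
        double p = 1.0 / 232000;
        long m = optimalBits(n, p);
        System.out.println(m + " bits, " + optimalHashes(m, n) + " hash functions");
    }
}
```

Note that `this.size / count` in the constructor is an int division, so the ratio is truncated before it is multiplied by ln 2.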

I'm afraid I don't know where the bug is, but you can simplify a lot. You don't actually need prime size, you don't need SecureRandom, BigInteger, and modulo. All you need is a good 64 bit hash (seeded if possible, for example murmur):
long bits = (long) (entryCount * bitsPerKey);
int arraySize = (int) ((bits + 63) / 64);
long[] data = new long[arraySize];
int k = getBestK(bitsPerKey);
void add(long key) {
long hash = hash64(key, seed);
int a = (int) (hash >>> 32);
int b = (int) hash;
for (int i = 0; i < k; i++) {
data[reduce(a, arraySize)] |= 1L << a;
a += b;
}
}
boolean mayContain(long key) {
long hash = hash64(key, seed);
int a = (int) (hash >>> 32);
int b = (int) hash;
for (int i = 0; i < k; i++) {
if ((data[reduce(a, arraySize)] & 1L << a) == 0) {
return false;
}
a += b;
}
return true;
}
static int reduce(int hash, int n) {
// http://lemire.me/blog/2016/06/27/a-fast-alternative-to-the-modulo-reduction/
return (int) (((hash & 0xffffffffL) * n) >>> 32);
}
static int getBestK(double bitsPerKey) {
return Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
}
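The fragments above can be assembled into something runnable. A minimal sketch, assuming the SplitMix64 mixing step as a stand-in for hash64 (the answer suggests a seeded hash such as murmur; the stand-in is only there to keep the example self-contained) and an illustrative class name:

```java
class BloomSketch {
    private final long[] data;
    private final int k;

    BloomSketch(int entryCount, double bitsPerKey) {
        long bits = (long) (entryCount * bitsPerKey);
        data = new long[(int) ((bits + 63) / 64)];
        k = Math.max(1, (int) Math.round(bitsPerKey * Math.log(2)));
    }

    // Assumption: SplitMix64 mixing step used in place of hash64(key, seed).
    static long hash64(long key) {
        long x = key + 0x9e3779b97f4a7c15L;
        x = (x ^ (x >>> 30)) * 0xbf58476d1ce4e5b9L;
        x = (x ^ (x >>> 27)) * 0x94d049bb133111ebL;
        return x ^ (x >>> 31);
    }

    void add(long key) {
        long hash = hash64(key);
        int a = (int) (hash >>> 32);
        int b = (int) hash;
        for (int i = 0; i < k; i++) {
            // The word index comes from reduce(); the bit index is the low 6 bits of a.
            data[reduce(a, data.length)] |= 1L << a;
            a += b;
        }
    }

    boolean mayContain(long key) {
        long hash = hash64(key);
        int a = (int) (hash >>> 32);
        int b = (int) hash;
        for (int i = 0; i < k; i++) {
            if ((data[reduce(a, data.length)] & (1L << a)) == 0) return false;
            a += b;
        }
        return true;
    }

    // Maps a 32-bit hash to [0, n) without a modulo.
    static int reduce(int hash, int n) {
        return (int) (((hash & 0xffffffffL) * n) >>> 32);
    }

    public static void main(String[] args) {
        BloomSketch filter = new BloomSketch(4000, 10);
        for (long key = 0; key < 4000; key++) filter.add(key);
        int falsePositives = 0;
        for (long key = 4000; key < 104000; key++) {
            if (filter.mayContain(key)) falsePositives++;
        }
        System.out.println("false positives in 100000 probes: " + falsePositives);
    }
}
```

Counting false positives over a large fixed number of probes gives a much steadier estimate of the rate than counting the probes until the first false positive, which varies widely from run to run.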

Turns out the issue was that the answer on the other page wasn't completely correct, and neither was the comment below it.
The comment said:
in the paper hash_i = hash1 + i x hash2 % p, where p is a prime, hash1 and hash2 is within range of [0, p-1], and the bitset consists k * p bits.
However, looking at the paper reveals that while all the hashes are mod p, each hash function is assigned a subset of the total bitset, which I understood to mean hash1 mod p would determine a value for indices 0 through p, hash2 mod p would determine a value for indices p through 2*p, and so on and so forth until the k value chosen for the bitset is reached.
I'm not 100% sure if adding this will fix my code, but it's worth a try. I'll update this if it works.
UPDATE: Didn't help. I'm looking into what else may be causing this problem.

Generating vampire numbers in free range

I am writing a program which generates vampire numbers (https://en.wikipedia.org/wiki/Vampire_number).
The main function takes a numberOfDigits argument, which must be even. If numberOfDigits is 4, we search for vampire numbers in the range 1000 to 9999 (four digits). If numberOfDigits is 6, we search from 100000 to 999999 (six digits).
With the file below, searching for 10-digit vampire numbers fails with a Java heap space error. Note that I use the default memory settings. For numberOfDigits == 4, 6 or 8 the code works correctly (I compared the output with https://oeis.org/A014575/b014575.txt and https://oeis.org/A014575). So I want to ask:
What can I do to optimize this code? I have thought about using a String of digits instead of long/BigInteger. I want to avoid the heap problem. Saving big numbers to a file would be too slow, am I right?
A friend wrote a C++ class for operating on big numbers (bigNum.cpp, http://pastebin.com/0HHdE848). Maybe with help from the community I could implement that in my a.java? More importantly, would it be useful for my problem?
Edit: My goal is to generate vampire numbers for any even digit count, like 4, 6, 8. a.java can do that, and could go further if I can get around the heap space problem. That is what my questions are about.
a.java (permutation code from johk95, https://stackoverflow.com/a/20906510 )
import java.util.ArrayList;
import java.util.Arrays;
/**
*
* @author re
*/
public class a {
/**
*
* @param numberOfDigits {int}
* @return ArrayList of Integer
*/
public ArrayList<Integer> vdf(int numberOfDigits) {
if ((numberOfDigits % 2) == 1) {
//or throw Exception of unrecognised format/variable?
System.out.println("cant operate on odd argument");
return new ArrayList<>();
}
long maxRange = 9;
for (int i = 1; i < numberOfDigits; i++) {
maxRange *= 10;
maxRange += 9;
}//numberOfDigits==4 then maxRange==9999, nOD==5 then maxRange==99999,..
long minRange = 1;
for (int i = 1; i < numberOfDigits; i++) {
minRange *= 10;
}//nOD==4 then minRange==1000, nOD==5 then minRange==10000, ..
ArrayList<Integer> ret = new ArrayList<>();
for (long i = minRange; i < maxRange; i++) {
long a = i;
long[] b = new long[numberOfDigits];
for (int j = numberOfDigits-1; j >= 0 ; j--) {
long c = a % 10;
a = a / 10;
b[j] = c;
}
int x = 0;
int y = 0;
ArrayList<long[]> list = permutations(b);
b = null; //dont need now
for(long[] s : list) {
for (int j = 0; j < numberOfDigits/2; j++) {
x += s[(numberOfDigits/2)-j-1] * Math.pow(10, j);
y += s[numberOfDigits-j-1] * Math.pow(10, j);
}
StringBuilder builder = new StringBuilder();
for (long t : s) {
builder.append(t);
}
String v = builder.toString();
if ((v.charAt((v.length()/2)-1) != '0'||
v.charAt(v.length()-1) != '0') &&
x * y == i) {
ret.add(x);
ret.add(y);
System.out.println(x*y+" "+x+" "+y);
break;
}
x = y = 0;
}
}
System.out.printf("%d vampire numbers found\n", ret.size()/2);
return ret;
}
/**
*
* @return vdf(4)
*/
public ArrayList<Integer> vdf() {
return vdf(4);//without trailing zeros
}
/* permutation code copied from
* johk95
* https://stackoverflow.com/a/20906510
*/
private static ArrayList<long[]> permutations(long[] lol) {
ArrayList<long[]> ret = new ArrayList<>();
permutation(lol, 0, ret);
return ret;
}
private static void permutation(long[] arr, int pos, ArrayList<long[]> list){
if(arr.length - pos == 1)
list.add(arr.clone());
else
for(int i = pos; i < arr.length; i++){
swap(arr, pos, i);
permutation(arr, pos+1, list);
swap(arr, pos, i);
}
}
private static void swap(long[] arr, int pos1, int pos2){
long h = arr[pos1];
arr[pos1] = arr[pos2];
arr[pos2] = h;
}
public static void main(String[] args) {
a a = new a();
try{
a.vdf(10); //TRY IT WITH 4, 6 or 8. <<<<
}catch (java.lang.OutOfMemoryError e){
System.err.println(e.getMessage());
}
}
}
EDIT: http://ideone.com/3rHhep - working code above with numberOfDigits == 4.

package testing;
import java.util.Arrays;
public class Testing
{
final static int START = 11, END = 1000;
public static void main(String[] args)
{
char[] kChar, checkChar;
String kStr, checkStr;
int k;
for(int i=START; i<END; i++) {
for(int i1=i; i1<100; i1++) {
k = i * i1;
kStr = Integer.toString(k);
checkStr = Integer.toString(i) + Integer.toString(i1);
//if(kStr.length() != 4) break;
kChar = kStr.toCharArray();
checkChar = checkStr.toCharArray();
Arrays.sort(kChar);
Arrays.sort(checkChar);
if(Arrays.equals(kChar, checkChar)) {
System.out.println(i + " * " + i1 + " = " + k);
}
}
}
}
}
This will generate vampire numbers; just modify the start and end integers.
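Iterating over fang pairs instead of over digit permutations keeps memory use constant, which sidesteps the heap problem from the question. A minimal sketch of that idea generalised to any even digit count, using long arithmetic and a digit-count comparison instead of sorted strings; the class and method names are illustrative:

```java
import java.util.TreeSet;

class VampireSketch {
    /** Returns all vampire numbers with the given (even) number of digits, in ascending order. */
    static TreeSet<Long> vampires(int digits) {
        if (digits < 2 || digits % 2 != 0) {
            throw new IllegalArgumentException("digit count must be even");
        }
        long lo = pow10(digits / 2 - 1);  // smallest fang
        long hi = pow10(digits / 2) - 1;  // largest fang
        long minProduct = pow10(digits - 1);
        TreeSet<Long> found = new TreeSet<>(); // a number may have several fang pairs
        for (long x = lo; x <= hi; x++) {
            for (long y = x; y <= hi; y++) {
                long p = x * y;
                if (p < minProduct) continue;               // product too short
                if (x % 10 == 0 && y % 10 == 0) continue;   // both fangs end in zero
                if (sameDigits(p, x, y)) found.add(p);
            }
        }
        return found;
    }

    /** True if the product uses exactly the digits of both fangs combined. */
    static boolean sameDigits(long product, long x, long y) {
        int[] balance = new int[10];
        for (; product > 0; product /= 10) balance[(int) (product % 10)]++;
        for (; x > 0; x /= 10) balance[(int) (x % 10)]--;
        for (; y > 0; y /= 10) balance[(int) (y % 10)]--;
        for (int b : balance) {
            if (b != 0) return false;
        }
        return true;
    }

    static long pow10(int exponent) {
        long result = 1;
        for (int i = 0; i < exponent; i++) result *= 10;
        return result;
    }

    public static void main(String[] args) {
        System.out.println(vampires(4));
    }
}
```

The running time still grows with the square of the number of fang candidates, so 10 digits means billions of pairs; the gain here is in memory, not in speed.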

Sum of Powers of two Integers using only For-Loops

The task is to compute the sum of powers (m^0 + m^1 + m^2 + m^3 + ... + m^n) using only for loops, that is, without any other kind of loop and without Math.pow().
Is it even possible? So far I have only managed to compute m^n, but not the rest.
public static void main(String[] args){
Scanner scn = new Scanner(System.in);
int total = 1;
System.out.print("Enter value of m: ");
int m = scn.nextInt();
System.out.print("Enter value of n: ");
int n = scn.nextInt();
for (int i = 1; i <= n; i++){
total *= m;
}
System.out.print(total);
}
Let's say m = 8 and n = 4.
i takes the values 1, 2, 3, 4, which is what I need, but I am unable to compute m^i.
It would be nice if someone could guide me on how it could be done; I can't seem to make progress, as I have limited knowledge of Java.
Thanks in advance!

You might want to rewrite it like this :
m^0 + m^1 + m^2 + m^3.... + m^n = 1 + m * (1 + m * (1 + m * (.... ) ) )
And you do it in a single for loop.
This should do the job (see explanations in comments):
public long count(long m, int pow) {
long result = 1; // innermost term: m^0
for (int i = 0; i < pow; i++) {
result = 1 + m * result; // wrap one more "1 + m * (...)" around the sum so far
}
return result;
}

You can nest loops. Use one to compute the powers and another to sum them.

You can do below:
int mul = 1;
total = 1;
for(int i=1;i<=n;i++) {
mul *= m;
total += mul;
}
System.out.println(total);

You can use a single loop, which is O(N), instead of nested loops, which are O(N^2):
long total = 1, power = m;
for (int i = 1; i <= n; i++){
total += power;
power *= m;
}
System.out.print(total);

You can also use the formula for geometric series:
Sum[i = k..k+n](a^i) = (a^k - a^(k+n+1)) / (1 - a)
= a^k * (1 - a^(n+1)) / (1 - a)
With this, the implementation can be done in a single for loop (or two simple for loops): either with O(n) simple looping, or with O(log n) exponentiation by squaring.
However, the drawback is that the data type must be able to hold at least (1 - a^(n+1)), while summing up normally only requires the result to fit in the data type.
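The closed form can be checked against the plain summation. A minimal sketch with k = 0 and a = m, using only for loops as the question requires; the class and method names are illustrative, and it assumes m^(n+1) fits in a long:

```java
class PowerSum {
    /** Adds the powers one by one: 1 + m + m^2 + ... + m^n. */
    static long byLoop(long m, int n) {
        long total = 1, power = 1;
        for (int i = 1; i <= n; i++) {
            power *= m;      // power is now m^i
            total += power;
        }
        return total;
    }

    /** Nested form 1 + m * (1 + m * (...)), evaluated from the inside out. */
    static long byHorner(long m, int n) {
        long result = 1;
        for (int i = 0; i < n; i++) {
            result = 1 + m * result;
        }
        return result;
    }

    /** Geometric series: (m^(n+1) - 1) / (m - 1), with m == 1 handled separately. */
    static long byFormula(long m, int n) {
        if (m == 1) return n + 1;
        long p = 1;
        for (int i = 0; i <= n; i++) {
            p *= m;          // after the loop p is m^(n+1)
        }
        return (p - 1) / (m - 1);
    }

    public static void main(String[] args) {
        System.out.println(byLoop(8, 4) + " " + byHorner(8, 4) + " " + byFormula(8, 4));
    }
}
```

The formula version overflows one power earlier than the other two, which is the drawback noted above.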

This is the solution :
long total = 0;
for(int i=0;i<n;i++){
long temp=1;
for(int j=0;j<=i;j++){
temp *= m;
}
total += temp;
}
System.out.println(total+1);

You can easily calculate powers using your own pow function, something like:
private static long power(int a, int b) {
if (b < 0) {
throw new UnsupportedOperationException("Negative powers not supported.");
}
if (b == 0) {
return 1;
}
if (b == 1) {
return a;
}
return a * power(a, b - 1);
}
Then simply loop over all the values and add them up:
long out = 0;
for (int i = 0; i <= n; ++i) {
out += power(m, i);
}
System.out.println(out);
I would add that this is a classic dynamic programming problem as m^n is m * m^(n-1). I would therefore add caching of previously calculated powers so that you don't have to recalculate.
private static Map<Integer, Long> powers;
public static void main(String args[]) {
int m = 4;
int n = 4;
powers = new HashMap<>();
long out = 0;
for (int i = 0; i <= n; ++i) {
out += power(m, i);
}
System.out.println(out);
System.out.println(powers);
}
private static long power(int a, int b) {
if (b < 0) {
throw new UnsupportedOperationException("Negative powers not supported.");
}
if (b == 0) {
return 1;
}
if (b == 1) {
return a;
}
Long power = powers.get(b);
if (power == null) {
power = a * power(a, b - 1);
powers.put(b, power);
}
return power;
}
This caches calculated values so that you only calculate the next multiple each time.

Fastest algorithm to check if a number is pandigital?

A pandigital number is a number that contains each of the digits from 1 to its length exactly once.
For example 123, 4312 and 967412385.
I have solved many Project Euler problems, but the Pandigital problems always exceed the one minute rule.
This is my pandigital function:
private boolean isPandigital(int n){
Set<Character> set= new TreeSet<Character>();
String string = n+"";
for (char c:string.toCharArray()){
if (c=='0') return false;
set.add(c);
}
return set.size()==string.length();
}
Create your own function and test it with this loop:
int pans=0;
for (int i=123456789;i<=123987654;i++){
if (isPandigital(i)){
pans++;
}
}
Using this loop, you should get 720 pandigital numbers. My average time was 500 milliseconds.
I'm using Java, but the question is open to any language.
UPDATE
@andras's answer has the best time so far, but @Sani Huttunen's answer inspired me to add a new algorithm, which gets almost the same time as @andras's.

C#, 17ms, if you really want a check.
class Program
{
static bool IsPandigital(int n)
{
int digits = 0; int count = 0; int tmp;
for (; n > 0; n /= 10, ++count)
{
if ((tmp = digits) == (digits |= 1 << (n - ((n / 10) * 10) - 1)))
return false;
}
return digits == (1 << count) - 1;
}
static void Main()
{
int pans = 0;
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 123456789; i <= 123987654; i++)
{
if (IsPandigital(i))
{
pans++;
}
}
sw.Stop();
Console.WriteLine("{0}pcs, {1}ms", pans, sw.ElapsedMilliseconds);
Console.ReadKey();
}
}
For a check that is consistent with the Wikipedia definition in base 10:
const int min = 1023456789;
const int expected = 1023;
static bool IsPandigital(int n)
{
if (n >= min)
{
int digits = 0;
for (; n > 0; n /= 10)
{
digits |= 1 << (n - ((n / 10) * 10));
}
return digits == expected;
}
return false;
}
To enumerate numbers in the range you have given, generating permutations would suffice.
The following is not an answer to your question in the strict sense, since it does not implement a check. It uses a generic permutation implementation not optimized for this special case - it still generates the required 720 permutations in 13ms (line breaks might be messed up):
static partial class Permutation
{
/// <summary>
/// Generates permutations.
/// </summary>
/// <typeparam name="T">Type of items to permute.</typeparam>
/// <param name="items">Array of items. Will not be modified.</param>
/// <param name="comparer">Optional comparer to use.
/// If a <paramref name="comparer"/> is supplied,
/// permutations will be ordered according to the
/// <paramref name="comparer"/>
/// </param>
/// <returns>Permutations of input items.</returns>
public static IEnumerable<IEnumerable<T>> Permute<T>(T[] items, IComparer<T> comparer)
{
int length = items.Length;
IntPair[] transform = new IntPair[length];
if (comparer == null)
{
//No comparer. Start with an identity transform.
for (int i = 0; i < length; i++)
{
transform[i] = new IntPair(i, i);
};
}
else
{
//Figure out where we are in the sequence of all permutations
int[] initialorder = new int[length];
for (int i = 0; i < length; i++)
{
initialorder[i] = i;
}
Array.Sort(initialorder, delegate(int x, int y)
{
return comparer.Compare(items[x], items[y]);
});
for (int i = 0; i < length; i++)
{
transform[i] = new IntPair(initialorder[i], i);
}
//Handle duplicates
for (int i = 1; i < length; i++)
{
if (comparer.Compare(
items[transform[i - 1].Second],
items[transform[i].Second]) == 0)
{
transform[i].First = transform[i - 1].First;
}
}
}
yield return ApplyTransform(items, transform);
while (true)
{
//Ref: E. W. Dijkstra, A Discipline of Programming, Prentice-Hall, 1997
//Find the largest partition from the back that is in decreasing (non-increasing) order
int decreasingpart = length - 2;
for (;decreasingpart >= 0 &&
transform[decreasingpart].First >= transform[decreasingpart + 1].First;
--decreasingpart) ;
//The whole sequence is in decreasing order, finished
if (decreasingpart < 0) yield break;
//Find the smallest element in the decreasing partition that is
//greater than (or equal to) the item in front of the decreasing partition
int greater = length - 1;
for (;greater > decreasingpart &&
transform[decreasingpart].First >= transform[greater].First;
greater--) ;
//Swap the two
Swap(ref transform[decreasingpart], ref transform[greater]);
//Reverse the decreasing partition
Array.Reverse(transform, decreasingpart + 1, length - decreasingpart - 1);
yield return ApplyTransform(items, transform);
}
}
#region Overloads
public static IEnumerable<IEnumerable<T>> Permute<T>(T[] items)
{
return Permute(items, null);
}
public static IEnumerable<IEnumerable<T>> Permute<T>(IEnumerable<T> items, IComparer<T> comparer)
{
List<T> list = new List<T>(items);
return Permute(list.ToArray(), comparer);
}
public static IEnumerable<IEnumerable<T>> Permute<T>(IEnumerable<T> items)
{
return Permute(items, null);
}
#endregion Overloads
#region Utility
public static IEnumerable<T> ApplyTransform<T>(
T[] items,
IntPair[] transform)
{
for (int i = 0; i < transform.Length; i++)
{
yield return items[transform[i].Second];
}
}
public static void Swap<T>(ref T x, ref T y)
{
T tmp = x;
x = y;
y = tmp;
}
public struct IntPair
{
public IntPair(int first, int second)
{
this.First = first;
this.Second = second;
}
public int First;
public int Second;
}
#endregion
}
class Program
{
static void Main()
{
int pans = 0;
int[] digits = new int[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
Stopwatch sw = new Stopwatch();
sw.Start();
foreach (var p in Permutation.Permute(digits))
{
pans++;
if (pans == 720) break;
}
sw.Stop();
Console.WriteLine("{0}pcs, {1}ms", pans, sw.ElapsedMilliseconds);
Console.ReadKey();
}
}

This is my solution:
static char[][] pandigits = new char[][]{
"1".toCharArray(),
"12".toCharArray(),
"123".toCharArray(),
"1234".toCharArray(),
"12345".toCharArray(),
"123456".toCharArray(),
"1234567".toCharArray(),
"12345678".toCharArray(),
"123456789".toCharArray(),
};
private static boolean isPandigital(int i)
{
char[] c = String.valueOf(i).toCharArray();
Arrays.sort(c);
return Arrays.equals(c, pandigits[c.length-1]);
}
Runs the loop in 0.3 seconds on my (rather slow) system.

Two things you can improve:
You don't need to use a set: you can use a boolean array with 10 elements
Instead of converting to a string, use division and the modulo operation (%) to extract the digits.
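Both suggestions can be combined in a few lines. A minimal sketch in Java, the language of the question, with an illustrative class name; unlike the original method it also rejects numbers such as 19, whose digits are distinct but do not form 1..length:

```java
class Pandigital {
    /** True if n uses each of the digits 1..length exactly once, where length is its digit count. */
    static boolean isPandigital(int n) {
        if (n <= 0) return false;
        boolean[] seen = new boolean[10];
        int length = 0;
        for (; n > 0; n /= 10) {
            int digit = n % 10;                        // extract the last digit without a string
            if (digit == 0 || seen[digit]) return false; // zero or repeated digit
            seen[digit] = true;
            length++;
        }
        // All digits are distinct and non-zero; none may exceed the length.
        for (int digit = length + 1; digit <= 9; digit++) {
            if (seen[digit]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int pans = 0;
        for (int i = 123456789; i <= 123987654; i++) {
            if (isPandigital(i)) pans++;
        }
        System.out.println(pans);
    }
}
```

Returning as soon as a zero or a repeated digit shows up means most candidates are rejected after only a few divisions.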

Using a bit vector to keep track of which digits have been found appears to be the fastest raw method. There are two ways to improve it:
Check if the number is divisible by 9. This is a necessary condition for a 9-digit pandigital number (its digits sum to 45), so we can exclude about 89% of numbers up front.
Use multiplication and shifts instead of divisions, in case your compiler doesn't do that for you.
This gives the following, which runs the test benchmark in about 3ms on my machine. It correctly identifies the 362880 9-digit pan-digital numbers between 100000000 and 999999999.
bool IsPandigital(int n)
{
if (n != 9 * (int)((0x1c71c71dL * n) >> 32))
return false;
int flags = 0;
while (n > 0) {
int q = (int)((0x1999999aL * n) >> 32);
flags |= 1 << (n - q * 10);
n = q;
}
return flags == 0x3fe;
}

My solution involves Sums and Products.
This is in C# and runs in about 180ms on my laptop:
static int[] sums = new int[] {1, 3, 6, 10, 15, 21, 28, 36, 45};
static int[] products = new int[] {1, 2, 6, 24, 120, 720, 5040, 40320, 362880};
static void Main(string[] args)
{
var pans = 0;
for (var i = 123456789; i <= 123987654; i++)
{
var num = i.ToString();
if (Sum(num) == sums[num.Length - 1] && Product(num) == products[num.Length - 1])
pans++;
}
Console.WriteLine(pans);
}
protected static int Sum(string num)
{
int sum = 0;
foreach (char c in num)
sum += (int) (c - '0');
return sum;
}
protected static int Product(string num)
{
int prod = 1;
foreach (char c in num)
prod *= (int)(c - '0');
return prod;
}

Why find when you can make them?
from itertools import *
def generate_pandigital(length):
return (''.join for each in list(permutations('123456789',length)))
def test():
for i in range(10):
print i
generate_pandigital(i)
if __name__=='__main__':
test()

J does this nicely:
isPandigital =: 3 : 0
*./ (' ' -.~ ": 1 + i. # s) e. s =. ": y
)
isPandigital"0 (123456789 + i. 1 + 123987654 - 123456789)
But slowly. I will revise. For now, clocking at 4.8 seconds.
EDIT:
If it's just between the two set numbers, 123456789 and 123987654, then this expression:
*./"1 (1+i.9) e."1 (9#10) #: (123456789 + i. 1 + 123987654 - 123456789)
Runs in 0.23 seconds. It's about as fast, brute-force style, as it gets in J.

TheMachineCharmer is right. At least for some of the problems, it's better to iterate over all the pandigitals, checking each one to see if it fits the criteria of the problem. However, I think their code is not quite right.
I'm not sure which is better SO etiquette in this case: Posting a new answer or editing theirs. In any case, here is the modified Python code which I believe to be correct, although it doesn't generate 0-to-n pandigitals.
from itertools import *
def generate_pandigital(length):
'Generate all 1-to-length pandigitals'
return (''.join(each) for each in list(permutations('123456789'[:length])))
def test():
for i in range(10):
print 'Generating all %d-digit pandigitals' % i
for (n,p) in enumerate(generate_pandigital(i)):
print n,p
if __name__=='__main__':
test()
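The same generate-instead-of-test idea can be written in Java, the language of the question. A minimal sketch that builds the 1-to-length pandigitals digit by digit in ascending order; the class and method names are illustrative:

```java
import java.util.ArrayList;
import java.util.List;

class PandigitalGenerator {
    /** Returns all 1-to-length pandigital numbers in ascending order (1 <= length <= 9). */
    static List<Integer> generate(int length) {
        if (length < 1 || length > 9) {
            throw new IllegalArgumentException("length must be between 1 and 9");
        }
        List<Integer> out = new ArrayList<>();
        build(0, 0, length, out);
        return out;
    }

    // prefix holds the digits chosen so far; usedMask has bit d set when digit d is taken.
    private static void build(int prefix, int usedMask, int length, List<Integer> out) {
        if (Integer.bitCount(usedMask) == length) {
            out.add(prefix);
            return;
        }
        for (int digit = 1; digit <= length; digit++) {
            if ((usedMask & (1 << digit)) == 0) {
                build(prefix * 10 + digit, usedMask | (1 << digit), length, out);
            }
        }
    }

    public static void main(String[] args) {
        int pans = 0;
        for (int p : generate(9)) {
            if (p >= 123456789 && p <= 123987654) pans++;
        }
        System.out.println(pans);
    }
}
```

For length 9 this visits 362880 numbers instead of scanning 900 million candidates.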

You could add:
if (set.add(c)==false) return false;
This would short circuit a lot of your computations, since it'll return false as soon as a duplicate was found, since add() returns false in this case.

bool IsPandigital (unsigned long n) {
if (n <= 987654321) {
hash_map<int, int> m;
unsigned long count = (unsigned long)(log((double)n)/log(10.0))+1;
while (n) {
++m[n%10];
n /= 10;
}
while (m[count]==1 && --count);
return !count;
}
return false;
}
bool IsPandigital2 (unsigned long d) {
// Avoid integer overflow below if this function is passed a very long number
if (d <= 987654321) {
unsigned long sum = 0;
unsigned long prod = 1;
unsigned long n = d;
unsigned long max = (log((double)n)/log(10.0))+1;
unsigned long max_sum = max*(max+1)/2;
unsigned long max_prod = 1;
while (n) {
sum += n % 10;
prod *= (n%10);
max_prod *= max;
--max;
n /= 10;
}
return (sum == max_sum) && (prod == max_prod);
}
return false;
}

I have a solution for generating Pandigital numbers using StringBuffers in Java. On my laptop, my code takes a total of 5ms to run. Of this only 1ms is required for generating the permutations using StringBuffers; the remaining 4ms are required for converting this StringBuffer to an int[].
@medopal: Can you check the time this code takes on your system?
public class GenPandigits
{
/**
* The prefix that must be appended to every number, like 123.
*/
int prefix;
/**
* Length in characters of the prefix.
*/
int plen;
/**
* The digit from which to start the permutations
*/
String beg;
/**
* The length of the required Pandigital numbers.
*/
int len;
/**
* @param prefix If there is no prefix then this must be null
* @param beg If there is no prefix then this must be "1"
* @param len Length of the required numbers (excluding the prefix)
*/
public GenPandigits(String prefix, String beg, int len)
{
if (prefix == null)
{
this.prefix = 0;
this.plen = 0;
}
else
{
this.prefix = Integer.parseInt(prefix);
this.plen = prefix.length();
}
this.beg = beg;
this.len = len;
}
public StringBuffer genPermsBet()
{
StringBuffer b = new StringBuffer(beg);
for(int k=2;k<=len;k++)
{
StringBuffer rs = new StringBuffer();
int l = b.length();
int s = l/(k-1);
String is = String.valueOf(k+plen);
for(int j=0;j<k;j++)
{
rs.append(b);
for(int i=0;i<s;i++)
{
rs.insert((l+s)*j+i*k+j, is);
}
}
b = rs;
}
return b;
}
public int[] getPandigits(String buffer)
{
int[] pd = new int[buffer.length()/len];
int c= prefix;
for(int i=0;i<len;i++)
c =c *10;
for(int i=0;i<pd.length;i++)
pd[i] = Integer.parseInt(buffer.substring(i*len, (i+1)*len))+c;
return pd;
}
public static void main(String[] args)
{
GenPandigits gp = new GenPandigits("123", "4", 6);
//GenPandigits gp = new GenPandigits(null, "1", 6);
long beg = System.currentTimeMillis();
StringBuffer pansstr = gp.genPermsBet();
long end = System.currentTimeMillis();
System.out.println("Time = " + (end - beg));
int pd[] = gp.getPandigits(pansstr.toString());
long end1 = System.currentTimeMillis();
System.out.println("Time = " + (end1 - end));
}
}
This code can also be used to generate all pandigital numbers (excluding zero). Just change the object creation call to
GenPandigits gp = new GenPandigits(null, "1", 9);
This means that there is no prefix, and the permutations must start from "1" and continue till the length of the numbers is 9.
Following are the time measurements for different lengths.
@andras: Can you try running your code to generate the nine-digit pandigital numbers? How long does it take?

This C# implementation is about 8% faster than @andras' over the range 123456789 to 123987654, but the difference is hard to see on my test box: his runs in 14 ms and this one in 13 ms.
static bool IsPandigital(int n)
{
int count = 0;
int digits = 0;
int digit;
int bit;
do
{
digit = n % 10;
if (digit == 0)
{
return false;
}
bit = 1 << digit;
if (digits == (digits |= bit))
{
return false;
}
count++;
n /= 10;
} while (n > 0);
return (1<<count)-1 == digits>>1;
}
Averaging the results of 100 runs gives us a figure with a decimal place.
public void Test()
{
int pans = 0;
var sw = new Stopwatch();
sw.Start();
for (int count = 0; count < 100; count++)
{
pans = 0;
for (int i = 123456789; i <= 123987654; i++)
{
if (IsPandigital(i))
{
pans++;
}
}
}
sw.Stop();
Console.WriteLine("{0}pcs, {1}ms", pans, sw.ElapsedMilliseconds / 100m);
}
@andras' implementation averages 14.4 ms and this implementation averages 13.2 ms.
EDIT:
It seems that mod (%) is expensive in C#. If we replace the mod operator with a hand-coded version, this implementation averages 11 ms over 100 runs.
private static bool IsPandigital(int n)
{
int count = 0;
int digits = 0;
int digit;
int bit;
do
{
digit = n - ((n / 10) * 10);
if (digit == 0)
{
return false;
}
bit = 1 << digit;
if (digits == (digits |= bit))
{
return false;
}
count++;
n /= 10;
} while (n > 0);
return (1 << count) - 1 == digits >> 1;
}
EDIT: Integrated n/=10 into the digit calculation for a small speed improvement.
private static bool IsPandigital(int n)
{
int count = 0;
int digits = 0;
int digit;
int bit;
do
{
digit = n - ((n /= 10) * 10);
if (digit == 0)
{
return false;
}
bit = 1 << digit;
if (digits == (digits |= bit))
{
return false;
}
count++;
} while (n > 0);
return (1 << count) - 1 == digits >> 1;
}
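For readers following along in Java, the bitmask idea above could be ported roughly as follows. This is an illustrative translation rather than the answerer's code; the class name BitmaskPandigital is an assumption, and the duplicate test is written with an explicit AND instead of the compound assignment trick.

```java
class BitmaskPandigital {
    // Returns true when n uses each digit 1..k exactly once, where k is
    // the number of digits in n (so 1234 counts as 1-to-4 pandigital).
    static boolean isPandigital(int n) {
        if (n <= 0) {
            return false;
        }
        int count = 0;   // number of digits seen so far
        int digits = 0;  // bit d is set once digit d has been seen
        while (n > 0) {
            int digit = n % 10;
            if (digit == 0) {
                return false; // zero is never allowed
            }
            int bit = 1 << digit;
            if ((digits & bit) != 0) {
                return false; // digit seen before
            }
            digits |= bit;
            count++;
            n /= 10;
        }
        // Bit 0 is unused, so shift it away and require that exactly
        // the lowest `count` bits are set.
        return ((1 << count) - 1) == (digits >> 1);
    }

    public static void main(String[] args) {
        int pans = 0;
        for (int i = 123456789; i <= 123987654; i++) {
            if (isPandigital(i)) {
                pans++;
            }
        }
        // Every hit is "123" followed by a permutation of 4..9.
        System.out.println(pans); // 720
    }
}
```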

#include <cstdio>
#include <ctime>
bool isPandigital(long num)
{
int arr [] = {1,2,3,4,5,6,7,8,9}, G, count = 9;
do
{
G = num%10;
if (G == 0) // zero is not a valid digit; this also avoids indexing arr[-1]
return false;
if (arr[G-1])
--count;
arr[G-1] = 0;
} while (num/=10);
return (!count);
}
int main()
{
clock_t start(clock());
int pans=0;
for (int i = 123456789;i <= 123987654; ++i)
{
if (isPandigital(i))
++pans;
}
double end((double)(clock() - start));
printf("\n\tFound %d Pandigital numbers in %lf seconds\n\n", pans, end/CLOCKS_PER_SEC);
return 0;
}
A simple brute-force implementation; it runs in about 140 ms.

In Java:
You can always just generate them and convert the Strings to Integers, which is faster for larger numbers.
public static List<String> permutation(String str) {
List<String> permutations = new LinkedList<String>();
permutation("", str, permutations);
return permutations;
}
private static void permutation(String prefix, String str, List<String> permutations) {
int n = str.length();
if (n == 0) {
permutations.add(prefix);
} else {
for (int i = 0; i < n; i++) {
permutation(prefix + str.charAt(i),
str.substring(0, i) + str.substring(i + 1, n), permutations);
}
}
}
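As a usage sketch of the generate-and-convert idea, the variant below builds the numbers with a fixed prefix and parses each one to an int. The class name PandigitalGenerator and the method withFixedPrefix are hypothetical, and the recursion is restated so the example stands alone.

```java
import java.util.ArrayList;
import java.util.List;

class PandigitalGenerator {
    // Appends every arrangement of `rest` to `prefix` and stores the
    // parsed result in `out`.
    static void generate(String prefix, String rest, List<Integer> out) {
        if (rest.isEmpty()) {
            out.add(Integer.parseInt(prefix));
            return;
        }
        for (int i = 0; i < rest.length(); i++) {
            generate(prefix + rest.charAt(i),
                     rest.substring(0, i) + rest.substring(i + 1), out);
        }
    }

    // Hypothetical convenience wrapper: all numbers that start with
    // `prefix` and continue with a permutation of `digits`.
    static List<Integer> withFixedPrefix(String prefix, String digits) {
        List<Integer> out = new ArrayList<>();
        generate(prefix, digits, out);
        return out;
    }

    public static void main(String[] args) {
        List<Integer> pans = withFixedPrefix("123", "456789");
        System.out.println(pans.size());                // 720
        System.out.println(pans.get(0));                // 123456789
        System.out.println(pans.get(pans.size() - 1));  // 123987654
    }
}
```

Because the digits are passed in ascending order, the results come out in ascending order as well. Parsing to int assumes the result fits in 32 bits, which holds for 1-to-9 pandigital numbers.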
The code below tests whether a number is pandigital.
For your test, mine ran in about 50 ms.
1-9 pandigital:
public static boolean is1To9PanDigit(int i) {
if (i < 1e8) {
return false;
}
BitSet set = new BitSet();
while (i > 0) {
int mod = i % 10;
if (mod == 0 || set.get(mod)) {
return false;
}
set.set(mod);
i /= 10;
}
return true;
}
or more general, 1 to N,
public static boolean is1ToNPanDigit(int i, int n) {
BitSet set = new BitSet();
while (i > 0) {
int mod = i % 10;
if (mod == 0 || mod > n || set.get(mod)) {
return false;
}
set.set(mod);
i /= 10;
}
return set.cardinality() == n;
}
And just for fun, 0 to 9; zero needs extra logic because a nine-digit number is treated as having a leading zero:
public static boolean is0To9PanDigit(long i) {
if (i < 1e8) { // fewer than nine digits can never cover 0-9
return false;
}
BitSet set = new BitSet();
if (i <= 987654321) { // nine digits: count the implied leading zero
set.set(0);
}
while (i > 0) {
int mod = (int) (i % 10);
if (set.get(mod)) {
return false;
}
set.set(mod);
i /= 10;
}
return true;
}
Also for setting iteration bounds:
public static int maxPanDigit(int n) {
StringBuffer sb = new StringBuffer();
for(int i = n; i > 0; i--) {
sb.append(i);
}
return Integer.parseInt(sb.toString());
}
public static int minPanDigit(int n) {
StringBuffer sb = new StringBuffer();
for(int i = 1; i <= n; i++) {
sb.append(i);
}
return Integer.parseInt(sb.toString());
}
You could easily adapt this code into a generic M-to-N pandigital checker.
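One possible shape for such a checker, offered as a sketch only: the method name isMToNPanDigit and its parameters are assumptions, and it assumes 1 <= m <= n <= 9 so that no leading-zero handling is needed.

```java
import java.util.BitSet;

class RangePandigital {
    // True when i consists of each digit m..n exactly once and no
    // other digits. Assumes 1 <= m <= n <= 9.
    static boolean isMToNPanDigit(int i, int m, int n) {
        BitSet set = new BitSet();
        while (i > 0) {
            int mod = i % 10;
            // Reject digits outside the range and repeated digits.
            if (mod < m || mod > n || set.get(mod)) {
                return false;
            }
            set.set(mod);
            i /= 10;
        }
        // Every digit in m..n must have appeared.
        return set.cardinality() == n - m + 1;
    }

    public static void main(String[] args) {
        System.out.println(isMToNPanDigit(3542, 2, 5)); // true
        System.out.println(isMToNPanDigit(3541, 2, 5)); // false
    }
}
```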

I decided to use something like this:
def is_pandigital(n, zero_full=True, base=10):
    """Return True if the number n is pandigital, False otherwise.

    This function returns True for formal pandigital numbers as well as
    n-pandigital numbers.
    """
    r, l = 0, 0  # r: running digit sum, l: number of digits
    while n:
        # // keeps the division integral on both Python 2 and Python 3
        l, r, n = l + 1, r + n % base, n // base
    t = range(zero_full ^ 1, l + (zero_full ^ 1))
    return r == sum(t) and l == len(t)
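Note that the function above compares only the digit count and the digit sum, so a number with repeated digits can pass: 1144 has four digits summing to 10, the same as 1234. The Java sketch below ports that sum test and sets it beside a stricter bitmask test. Both method names are hypothetical, and the bitmask version assumes base <= 63 so the mask fits in a long.

```java
class DigitSumCheck {
    // Port of the digit-sum test: true when the digit sum equals
    // lo + (lo+1) + ... + (lo+len-1), where lo is 0 or 1.
    static boolean sumTest(long n, boolean zeroFull, int base) {
        int len = 0;
        long sum = 0;
        while (n != 0) {
            len++;
            sum += n % base;
            n /= base;
        }
        int lo = zeroFull ? 0 : 1;
        long expected = 0;
        for (int d = lo; d < lo + len; d++) {
            expected += d;
        }
        return sum == expected;
    }

    // Stricter variant: also rejects repeated digits by tracking each
    // digit in a bitmask.
    static boolean strictTest(long n, boolean zeroFull, int base) {
        int len = 0;
        long mask = 0;
        while (n != 0) {
            int d = (int) (n % base);
            if ((mask & (1L << d)) != 0) {
                return false; // repeated digit
            }
            mask |= 1L << d;
            len++;
            n /= base;
        }
        int lo = zeroFull ? 0 : 1;
        // Exactly the bits lo .. lo+len-1 must be set.
        return mask == (((1L << len) - 1) << lo);
    }

    public static void main(String[] args) {
        System.out.println(sumTest(1144, false, 10));    // true (false positive)
        System.out.println(strictTest(1144, false, 10)); // false
        System.out.println(strictTest(1234, false, 10)); // true
    }
}
```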

Straightforward way:
boolean isPandigital(int num,int length){
for(int i=1;i<=length;i++){
if(!(num+"").contains(i+""))
return false;
}
return true;
}
Or, if you are sure that the number already has the right length:
static boolean isPandigital(int num){
for(int i=1;i<=(num+"").length();i++){
if(!(num+"").contains(i+""))
return false;
}
return true;
}

I refactored Andras' answer for Swift:
extension Int {
func isPandigital() -> Bool {
let requiredBitmask = 0b1111111111;
let minimumPandigitalNumber = 1023456789;
if self >= minimumPandigitalNumber {
var resultBitmask = 0b0;
var digits = self;
while digits != 0 {
let lastDigit = digits % 10;
let binaryCodedDigit = 1 << lastDigit;
resultBitmask |= binaryCodedDigit;
// remove last digit
digits /= 10;
}
return resultBitmask == requiredBitmask;
}
return false;
}
}
1023456789.isPandigital(); // true

Great answers; my 2 cents:
bool IsPandigital(long long number, int n){
int arr[] = { 1, 1, 1, 1, 1, 1, 1, 1, 1, 1 }, amax = 0, amin;
while (number > 0){
int rem = number % 10;
arr[rem]--;
if (arr[rem] < 0)
return false;
number = number / 10;
}
for (int i = 0; i < n; i++){
if (i == 0)
amin = arr[i];
if (arr[i] > amax)
amax = arr[i];
if (arr[i] < amin)
amin = arr[i];
}
if (amax == 0 && amin == 0)
return true;
else
return false;
}
