Reduce Java BigInteger to fixed length smaller number - java

I'm attempting to implement a random number generator system; essentially I'm reading in an SHA1 hash, which then gets converted into a BigInteger value:
String start = "abc";
String hash = utils.SHA1(start); //Generates an SHA1 hash of the string
byte[] bytes = hash.getBytes();
BigInteger big = new BigInteger(bytes);
This code generates a BigInteger with a value of:
811203900027758629330492243480887228261034167773619203962320290854945165232584286910163772258660
What I need to do (and this is where I get confused) is reduce that number to a much shorter number with a fixed number of digits.
Using a combination of modular arithmetic and Java Math API functions, is there a sensible way of reducing this number to a 3-digit number, or to any other length I choose?
At the moment I'm simply converting that huge number into a String and then taking a substring of the length I want. However, I'm not entirely happy with this, as the numbers I get aren't very random; the range is rather limited with 3 digits.
The whole purpose of this is for the newly generated x-digit number to then be converted into a string using a radix of 36, so that it also includes ASCII letters.
Any information or advice would be greatly appreciated.
Thanks!!

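Yes, you can use the modulus, e.g. .mod(BigInteger.valueOf(1000)) or, for base 36, .mod(BigInteger.valueOf(36*36*36)) (BigInteger.mod takes a BigInteger, not an int). Or use plain long arithmetic on the low 64 bits: Math.floorMod(big.longValue(), 1000) or Math.floorMod(big.longValue(), 36*36*36). Prefer floorMod to %, because longValue() can be negative.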
You can use Long.toString(x, 10) or Long.toString(x, 36)
Not sure I can tell much more without giving you the answer.
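A minimal sketch of the modulus approach, assuming the goal is a fixed-length code derived from SHA-1. The question's utils.SHA1 helper isn't shown, so this uses the JDK's MessageDigest directly; note that hash.getBytes() in the question converts the characters of the hex text, not the 20 digest bytes. The class and method names (ShortCode, shortCode) are illustrative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ShortCode {
    // Reduce the SHA-1 digest of `input` to exactly `digits` characters in `radix` (2..36).
    public static String shortCode(String input, int digits, int radix) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-1")
                    .digest(input.getBytes(StandardCharsets.UTF_8));
            // signum 1: read the digest bytes as an unsigned magnitude, so the value is never negative
            BigInteger big = new BigInteger(1, digest);
            BigInteger modulus = BigInteger.valueOf(radix).pow(digits); // e.g. 36^3 = 46656
            String s = big.mod(modulus).toString(radix);
            // Left-pad with '0' so small remainders still have `digits` characters
            StringBuilder sb = new StringBuilder();
            for (int i = s.length(); i < digits; i++) {
                sb.append('0');
            }
            return sb.append(s).toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide SHA-1
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(shortCode("abc", 3, 10)); // three decimal digits
        System.out.println(shortCode("abc", 3, 36)); // three base-36 characters (0-9, a-z)
    }
}
```

Because 2^160 is not a multiple of 36^3, a few remainders are very slightly more likely than others, but at this size the bias is negligible.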

Related

Creating a BigInteger from String and pad "0"s to it

[InputParam1: decimal number in String format (e.g. "30"),
InputParam2: integer giving the number of 0s to prepend (e.g. 6)]
To convert a number from decimal to binary and pad zeros to its front, I need to perform the following steps:
Step 1: BigInteger binary = new BigInteger(InputParam1, 2);
--> This works when I create the BigInteger with base 16 (which I tried after base 2 failed), but not with base 2 as above.
It throws a NumberFormatException.
Step 2: String pad=StringUtils.repeat("0",InputParam2);
(to repeat 0 'InputParam2' number of times)
-->This works fine
Step 3: Need to prepend pad to binary (from Step 1).
For a BigInteger, I can't find anything similar to .append.
So I'm trying to a) convert this BigInteger number to a String, b) prepend pad, and c) convert it back to a BigInteger (this step I can't do without creating a new BigInteger).
Any pointers on Step 1 and Step 3c would be helpful, please.
#1 - You haven't said exactly what value you're passing in InputParam1, but my guess is that you're passing in something other than a string containing only '0' and '1' characters (your example "30" would fail, since '3' is not a binary digit). The radix parameter to the constructor tells the code how to interpret the string you're giving it. This works fine for me:
BigInteger binary=new BigInteger("1000",2);
System.out.println(binary);
How you construct the number has nothing to do with how you're eventually going to want to represent or display that number. The key idea here is that the representation of a number (hex, decimal, binary) has nothing to do with the number itself, just as leading zeros don't affect the value of a number. 1111 binary, f hex, and 15 decimal are all the same number. If you pass these three into BigInteger's constructor with a radix of 2, 16 and 10 respectively, you'll end up with EXACTLY the same thing in each case. The object keeps no notion of which representation you used to set its initial value.
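A short illustration of the point above, using only java.math.BigInteger: the same value parsed from three radixes compares equal, and the radix only matters again when printing. The class name RadixDemo is illustrative.

```java
import java.math.BigInteger;

public class RadixDemo {
    public static void main(String[] args) {
        BigInteger fromBinary  = new BigInteger("1111", 2);
        BigInteger fromHex     = new BigInteger("f", 16);
        BigInteger fromDecimal = new BigInteger("15", 10);
        // All three hold the same value; the radix only told the parser how to read the text
        System.out.println(fromBinary.equals(fromHex) && fromHex.equals(fromDecimal)); // true
        // The radix matters again only when converting back to text
        System.out.println(fromBinary.toString(10)); // 15
        System.out.println(fromDecimal.toString(2)); // 1111
        // Leading zeros in the input do not change the value either
        System.out.println(new BigInteger("0001111", 2).equals(fromBinary)); // true
    }
}
```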
#3 - There is no concept of a number (int, BigInteger, etc.) with zero padding at the front, just as there's no notion of which base/radix you might use to represent a number visually. You can think about it conceptually, but that's just a way of displaying the number. It has nothing to do with the number itself.
I hadn't tried it before, but there seems to be no super simple way to format a binary value in Java with leading 0s, as there is for decimal and hex, since String.format() has no format specifier for binary. Per this StackOverflow post, How to get 0-padded binary representation of an integer in java?, this seems to be about the best way to go, after adapting the accepted answer to work with BigInteger:
String str = String.format("%16s", binary.toString(2)).replace(' ', '0');
So here's all of my sample code, with output:
BigInteger binary=new BigInteger("1000",2);
System.out.println(binary);
String str = String.format("%16s", binary.toString(2)).replace(' ', '0');
System.out.println(str);
Output:
8
0000000000001000
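Putting #1 and #3 together for the original inputs: a sketch that parses InputParam1 as decimal (which is why new BigInteger("30", 2) fails), prints it in base 2, and keeps the padded result as a String, since a BigInteger cannot carry leading zeros. The method name toPaddedBinary is hypothetical, and a StringBuilder loop stands in for StringUtils.repeat to avoid the Commons Lang dependency.

```java
import java.math.BigInteger;

public class PadBinary {
    // decimal: e.g. "30"; zeros: how many '0' characters to put in front
    public static String toPaddedBinary(String decimal, int zeros) {
        BigInteger value = new BigInteger(decimal); // radix 10: the input text is decimal
        String binary = value.toString(2);          // "11110" for 30
        StringBuilder sb = new StringBuilder(zeros + binary.length());
        for (int i = 0; i < zeros; i++) {
            sb.append('0');
        }
        return sb.append(binary).toString();        // stays a String: the zeros are only display
    }

    public static void main(String[] args) {
        System.out.println(toPaddedBinary("30", 6)); // 00000011110
    }
}
```

If a BigInteger is needed again later, new BigInteger(padded, 2) gives back the same value; the padding is simply dropped.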

Seemingly easy FNV1 hashing implementation results in a lot of collisions

I'm playing with hash tables and using a corpus of ~350,000 English words which I'd like to try to evenly distribute. Thus, I try to fit them into an array of length 810,049 (the closest prime larger than two times the input size) and I was baffled to see that a straightforward FNV1 implementation like this:
public int getHash(String s, int mod) {
final BigInteger MOD = new BigInteger(Integer.toString(mod));
final BigInteger FNV_offset_basis = new BigInteger("14695981039346656037");
final BigInteger FNV_prime = new BigInteger("1099511628211");
BigInteger hash = new BigInteger(FNV_offset_basis.toString());
for (int i = 0; i < s.length(); i++) {
int charValue = s.charAt(i);
hash = hash.multiply(FNV_prime).mod(MOD);
hash = hash.xor(BigInteger.valueOf((int) charValue & 0xffff)).mod(MOD);
}
return hash.mod(MOD).intValue();
}
results in 64,000 collisions, which is a lot: roughly 20% of the input. What's wrong with my implementation? Is the approach somehow flawed?
EDIT: to add to that, I've also tried and implemented other hashing algorithms like sdbm and djb2, and they all perform the same, equally poorly. All have these ~65k collisions on this corpus. When I changed the corpus to just 350,000 integers represented as strings, a bit of variance starts to appear (one algorithm has 20,000 collisions and another has 40,000), but the number of collisions is still astoundingly high. Why?
EDIT2: I've just tested it, and Java's built-in .hashCode() results in just as many collisions. Even something ridiculously naive, like taking the product of all the character codes modulo 810,049, performs only 50% worse than all those notorious algorithms (60k collisions vs. 90k with the naive approach).
Since mod is a parameter to your hash function I presume it is the range into which you want the hash normalized, i.e. for your specific use case you are expecting it to be 810,049. I assume this because:
The algorithm calls for the calculations to be done modulo $2^n$, where n is the number of bits in the desired hash.
Given that the offset basis and FNV Prime are constants within the module, and are equal to the parameters for a 64-bit hash, the value of mod should also be fixed at $2^{64}$.
Since it is not, I assume it is the desired final output range.
In other words, given a fixed offset basis and FNV Prime, there is no reason to pass in the mod parameter -- it is dictated by the other two FNV parameters.
If all the above is correct, then the implementation is wrong. You should be doing the calculations mod $2^{64}$ and applying a final remainder operation with 810,049.
Also (but this may not be important), the algorithm calls for XORing the low 8 bits of the hash with each input octet (an ASCII character), whereas you are XORing with 16 bits. I am not sure this makes a difference: for ASCII the high-order byte is zero anyway, so it behaves exactly as if you were XORing only 8 bits.
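A minimal sketch of 64-bit FNV-1 along the lines this answer suggests: Java's long arithmetic wraps around, so the multiply is implicitly mod 2^64, and the table-size remainder is applied once at the end with Long.remainderUnsigned. Hashing the UTF-8 bytes of the string (rather than 16-bit chars) is an assumption; for ASCII input it is the same. For calibration, the hypothetical helper expectedCollisions computes the collision count of an idealized, perfectly uniform hash; for 350,000 keys in 810,049 slots it comes to roughly 65,800, so a count in that range is what a well-distributed hash is expected to give at this load factor. The class and method names are illustrative.

```java
import java.nio.charset.StandardCharsets;

public class Fnv1Demo {
    // 64-bit FNV-1 parameters: offset basis 14695981039346656037, prime 1099511628211
    private static final long OFFSET_BASIS = 0xcbf29ce484222325L;
    private static final long PRIME = 0x100000001b3L;

    // long multiplication overflows silently, which is exactly arithmetic mod 2^64
    public static long fnv1_64(String s) {
        long hash = OFFSET_BASIS;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash *= PRIME;      // FNV-1: multiply first...
            hash ^= (b & 0xff); // ...then XOR in one octet
        }
        return hash;
    }

    // Reduce to a table index only at the end, treating the 64-bit hash as unsigned
    public static int index(String s, int tableSize) {
        return (int) Long.remainderUnsigned(fnv1_64(s), tableSize);
    }

    // Expected collisions when n keys land uniformly at random in m slots:
    // n - m * (1 - (1 - 1/m)^n), i.e. keys minus expected occupied slots
    public static double expectedCollisions(long n, long m) {
        double occupied = m * (1.0 - Math.pow(1.0 - 1.0 / m, n));
        return n - occupied;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(fnv1_64("a"))); // af63bd4c8601b7be
        System.out.println(index("hello", 810_049));
        System.out.printf("%.0f%n", expectedCollisions(350_000, 810_049));
    }
}
```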

Reducing the number of bits in UUID

I have a use case for generating distributed unique sequence numbers in integer format. UUID looked like the best and simplest solution for me.
However, I need integers only, so I will convert that big hexadecimal number (the UUID) to a decimal number. A UUID has 128 bits and hence produces a decimal number of up to 39 digits.
I can't afford a 39-digit number due to some strict database constraints. So I went back to basics and tried converting the number to binary first and then to decimal. The standard process for converting hexadecimal directly to binary is to take each hexadecimal digit and convert it into 4 bits. Hence, for the 32 hex digits of a UUID, we get 128 bits (32*4).
Now, I am thinking of not following the rule of converting each hexadecimal digit to 4 bits. Instead, I will just use enough bits to represent each digit.
For example, take 12B as one hexadecimal number.
By the standard process, the conversion to binary comes out as 0000-0001-0010-1011 (9 significant bits).
By my custom process, it comes out as 1-10-1011 (7 bits).
So, by this method, the number of bits is reduced. With fewer bits, the converted decimal number has fewer digits and can fit within my constraints.
Can you please help validate my theory? Does this approach have some problem? Will it cause collisions? Is the method correct, and can I go ahead with it?
Thanks in advance.
Yes, this will cause collisions.
e.g.
0000-0001-0010-1011 (hex 12B) -> 1101011
0000-0000-0110-1011 (hex 06B) -> 1101011
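The collision can be reproduced with a small sketch of the proposed encoding. compactBits is a hypothetical name; it writes each hex digit with Integer.toBinaryString (that is, without leading zeros) and concatenates the pieces, so the digit boundaries are lost.

```java
public class CompactHexDemo {
    // The question's scheme: each hex digit uses only as many bits as it needs
    public static String compactBits(String hex) {
        StringBuilder sb = new StringBuilder();
        for (char c : hex.toCharArray()) {
            int digit = Character.digit(c, 16);      // 'B' -> 11
            sb.append(Integer.toBinaryString(digit)); // 11 -> "1011", 2 -> "10", 1 -> "1"
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(compactBits("12B")); // 1101011
        System.out.println(compactBits("6B"));  // 1101011: same bits, different input
    }
}
```

Since the mapping can send two different inputs to the same bit string, it is not reversible, and no decoding step can tell 12B and 6B apart.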
Some time ago I spent a couple of days debugging problems with UUID collisions (the UUIDs were trimmed); debugging these things is a nightmare. You won't have a good time.
What you need is to implement your own unique identifier schema; depending on your use case, developing such a schema could be either very easy or very hard. You could, for example, assign each machine a unique number (let's say two bytes) and have each machine assign IDs serially from a 4-byte namespace. Then in 6 bytes you have a nice UUID-like schema (with some constraints).
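A sketch of the machine-number-plus-counter layout described above, assuming a 16-bit machine number and a 32-bit counter packed into a long. Only 48 bits are used, so IDs stay below 2^48 (at most 15 decimal digits). NodeIdGenerator is a hypothetical class; a real deployment would also need to persist the counter across restarts and guarantee that machine numbers are unique.

```java
public class NodeIdGenerator {
    // Hypothetical layout: [16-bit machine number][32-bit per-machine counter]
    private static final int COUNTER_BITS = 32;
    private static final long COUNTER_MASK = (1L << COUNTER_BITS) - 1;

    private final long machineId;
    private long counter; // assumption: restored from durable storage on startup in a real system

    public NodeIdGenerator(int machineId) {
        if (machineId < 0 || machineId > 0xFFFF) {
            throw new IllegalArgumentException("machineId must fit in 16 bits");
        }
        this.machineId = machineId;
    }

    // synchronized so concurrent callers on one machine never get the same counter value
    public synchronized long nextId() {
        if (counter > COUNTER_MASK) {
            throw new IllegalStateException("counter space exhausted");
        }
        return (machineId << COUNTER_BITS) | counter++;
    }

    public static int machineOf(long id) { return (int) (id >>> COUNTER_BITS); }
    public static long counterOf(long id) { return id & COUNTER_MASK; }

    public static void main(String[] args) {
        NodeIdGenerator gen = new NodeIdGenerator(7);
        System.out.println(gen.nextId()); // 30064771072 (7 << 32)
        System.out.println(gen.nextId()); // 30064771073
    }
}
```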

Random int generation from existing int

Consider the following number:
long start = 287729472784L; // too large for an int (max 2147483647), so it must be a long
From that int, I need to create a new int that is only three digits in length, I can use any of the values from 0-9.
However, in order to create the new int, I cannot use any form of already existing random number generators.
I was wondering if it is possible to use a combination of modulo, XOR, AND, and bit-shift operations to somehow reduce the number, such as XORing the last digit with the one before it, but I'm not sure that is even possible.
Basically I need to create a three digit long int from the starting int, ideally reducing the starting int down to three digits in length.
I hope that makes sense and I'd appreciate any input.
Thanks
I'm not sure I understand your need, but if your only wish is to generate a 3-digit number from another number, maybe the modulo operator could help you:
var startNumber = 287729472784L; // long literal: the value does not fit in an int
var modifiedNumber = startNumber % 1000;
If you want a pseudo-random modifiedNumber that changes on each generation, you can use the time in milliseconds:
var startNumber = 287729472784L;
var modifiedNumber = Math.floorMod(startNumber * System.currentTimeMillis(), 1000L); // the product can overflow to a negative long; floorMod keeps the result in 0..999
I hope it helps.
vaL
Hm. I don't understand the problem, but... start % 1000 would yield the least significant 3 digits of start (though: be careful with negative values)?
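A sketch of the negative-value caveat: in Java, % keeps the sign of the dividend, while Math.floorMod (Java 8+) always returns a value in [0, modulus). The helper lastDigits is hypothetical; results below 100 still need %03d formatting if exactly three characters are wanted.

```java
public class LastDigits {
    // Least-significant `digits` decimal digits of value, always in [0, 10^digits)
    public static long lastDigits(long value, int digits) {
        long modulus = 1;
        for (int i = 0; i < digits; i++) {
            modulus *= 10;
        }
        return Math.floorMod(value, modulus);
    }

    public static void main(String[] args) {
        System.out.println(287729472784L % 1000);          // 784
        System.out.println(-287729472784L % 1000);         // -784: % keeps the sign of the dividend
        System.out.println(lastDigits(-287729472784L, 3)); // 216
        System.out.println(String.format("%03d", lastDigits(1005L, 3))); // 005
    }
}
```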
The best answer really depends on the use of that final number. Since SHA-1 values are reasonably "random" to start with, using % 1000 should suffice: you'll get a good spread over the range of all possible SHA-1 inputs, if all you're looking for is a hash into a table.
However, if you're looking for a transform where the 3 digit number has little or no relationship (meaning, not just a modulo ...) to the input, you'll need some way to bang all the bits into the result. If that's the case, I'd suggest a transform such as CRC16. Feed the SHA1 value into your favorite CRC16 routine, then return the modulo 1000 value of that, keeping in mind that some results will show up more often than others.
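A sketch of the CRC approach. The answer doesn't name a CRC-16 variant; this assumes CRC-16/CCITT-FALSE (polynomial 0x1021, initial value 0xFFFF, no reflection, no final XOR), whose standard check value for "123456789" is 0x29B1. Java's standard library has java.util.zip.CRC32 but no CRC-16, so the routine is written out by hand; the class and method names are illustrative.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

public class Crc16Reduce {
    // CRC-16/CCITT-FALSE, bitwise (MSB-first) implementation
    public static int crc16(byte[] data) {
        int crc = 0xFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int bit = 0; bit < 8; bit++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF; // keep only 16 bits
            }
        }
        return crc;
    }

    // Feed the SHA-1 value (here a BigInteger) into the CRC, then reduce to 0..999
    public static int threeDigits(BigInteger sha1Value) {
        return crc16(sha1Value.toByteArray()) % 1000;
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Integer.toHexString(crc16(check))); // 29b1
    }
}
```

Since 65536 = 65 * 1000 + 536, the remainders 0 to 535 each occur for 66 CRC values and 536 to 999 for only 65, which is the unevenness the answer warns about.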

How to assign the largest n bit unsigned integer to a BigInteger in Java

I have a scenario where I'm working with large integers (e.g. 160 bit), and am trying to create the biggest possible unsigned integer that can be represented with an n bit number at run time. The exact value of n isn't known until the program has begun executing and read the value from a configuration file. So for example, n might be 160, or 128, or 192, etcetera...
Initially what I was thinking was something like:
BigInteger.valueOf((long)Math.pow(2, n));
but then I realized the conversion to long sort of defeats the purpose, given that long doesn't have enough bits to store the result in the first place. Any suggestions?
On the largest n-bit unsigned number
Let's first take a look at what this number is, mathematically.
In an unsigned binary representation, the largest n-bit number would have all bits set to 1. Let's take a look at some examples:
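$$
\begin{aligned}
1_{(2)} &= 1 = 2^1 - 1\\
11_{(2)} &= 3 = 2^2 - 1\\
111_{(2)} &= 7 = 2^3 - 1\\
&\;\,\vdots\\
\underbrace{1\cdots1}_{n}\,{}_{(2)} &= 2^n - 1
\end{aligned}
$$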
Note that this is analogous in decimal too. The largest 3 digit number is:
$10^3 - 1 = 1000 - 1 = 999$
Thus, a subproblem of finding the largest n-bit unsigned number is computing $2^n$.
On computing powers of 2
Modern digital computers can compute powers of two efficiently, due to the following pattern:
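$$
\begin{aligned}
2^0 &= 1_{(2)}\\
2^1 &= 10_{(2)}\\
2^2 &= 100_{(2)}\\
2^3 &= 1000_{(2)}\\
&\;\,\vdots\\
2^n &= 1\underbrace{0\cdots0}_{n}\,{}_{(2)}
\end{aligned}
$$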
That is, $2^n$ is simply a number having its bit n set to 1 and every other bit set to 0 (remember that bits are numbered with zero-based indexing).
Solution
Putting the above together, we get this simple solution using BigInteger for our problem:
final int N = 5;
BigInteger twoToN = BigInteger.ZERO.setBit(N);
BigInteger maxNbits = twoToN.subtract(BigInteger.ONE);
System.out.println(maxNbits); // 31
If we were using long instead, then we can write something like this:
// for 64-bit signed long version, N < 64
System.out.println(
(1L << N) - 1
); // 31
There is no "set bit n" operation defined for long, so traditionally bit shifting is used instead. In fact, a BigInteger analog of this shifting technique is also possible:
System.out.println(
BigInteger.ONE.shiftLeft(N).subtract(BigInteger.ONE)
); // 31
See also
Wikipedia/Binary numeral system
Bit Twiddling Hacks
Additional BigInteger tips
BigInteger does have a pow method to raise a number to any non-negative int power. If you're working in modular arithmetic, there are also modPow and modInverse.
You can individually setBit, flipBit or just testBit. You can get the overall bitCount, perform bitwise and with another BigInteger, and shiftLeft/shiftRight, etc.
As a bonus, you can also compute the gcd or check whether the number isProbablePrime.
ALWAYS remember that BigInteger, like String, is immutable. You can't invoke a method on an instance, and expect that instance to be modified. Instead, always assign the result returned by the method to your variables.
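A minimal illustration of the immutability pitfall with BigInteger.setBit (the class name is illustrative):

```java
import java.math.BigInteger;

public class ImmutableDemo {
    public static void main(String[] args) {
        BigInteger x = BigInteger.ZERO;
        x.setBit(4);           // result discarded: x itself is unchanged
        System.out.println(x); // 0
        x = x.setBit(4);       // reassign to keep the new value
        System.out.println(x); // 16
    }
}
```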
Just to clarify: you want the largest n-bit number (i.e., the one with all n bits set). If so, the following will do that for you:
BigInteger largestNBitInteger = BigInteger.ZERO.setBit(n).subtract(BigInteger.ONE);
This is mathematically equivalent to 2^n - 1. Your question asks how to compute 2^n, which is actually the smallest (n+1)-bit number. You can of course do that with:
BigInteger smallestNPlusOneBitInteger = BigInteger.ZERO.setBit(n);
I think there is a pow method directly in BigInteger. You can use it for your purpose.
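Following the pow suggestion, a sketch that computes 2^n - 1 with pow and checks it against the setBit and shiftLeft versions from the first answer, with n read at run time (here from the command line, standing in for the configuration file). MaxNBits is a hypothetical name; BigInteger.valueOf(2) is used instead of BigInteger.TWO so it also compiles on Java 8.

```java
import java.math.BigInteger;

public class MaxNBits {
    // Largest unsigned n-bit value, 2^n - 1, computed three equivalent ways
    public static BigInteger viaPow(int n)    { return BigInteger.valueOf(2).pow(n).subtract(BigInteger.ONE); }
    public static BigInteger viaSetBit(int n) { return BigInteger.ZERO.setBit(n).subtract(BigInteger.ONE); }
    public static BigInteger viaShift(int n)  { return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE); }

    public static void main(String[] args) {
        int n = Integer.parseInt(args.length > 0 ? args[0] : "160"); // n known only at run time
        BigInteger max = viaPow(n);
        System.out.println(max.bitLength()); // 160 for the default n
        System.out.println(max.equals(viaSetBit(n)) && max.equals(viaShift(n))); // true
    }
}
```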
The quickest way I can think of doing this is by using the constructor for BigInteger that takes a byte[].
BigInteger(byte[] val) constructs the BigInteger object from an array of bytes. You are, however, dealing with bits, so creating a byte[] such as {0x7F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF} for the 39-bit integer 2^39 - 1 might be a little tedious.
You could also use the constructor BigInteger(String val, int radix), which may make it more readily apparent what's going on in your code, if you don't mind the performance hit of parsing a String. Then you could generate a string like val = "111111111111111111111111111111111111111" and call BigInteger myInt = new BigInteger(val, 2); resulting in the same 39-bit integer.
The first option will require some thinking about how to represent your number: that particular constructor expects a two's-complement, big-endian representation of the number. The second will likely be marginally slower, but much clearer.
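A sketch comparing the two constructors for n one-bits. Instead of the single-argument BigInteger(byte[]) constructor discussed above, it uses BigInteger(int signum, byte[] magnitude), which reads the bytes as an unsigned big-endian magnitude and so sidesteps the two's-complement sign bit. onesFromBytes and onesFromString are hypothetical names.

```java
import java.math.BigInteger;
import java.util.Arrays;

public class FromBytesOrString {
    // n one-bits via (signum, magnitude): bytes are unsigned, big-endian
    public static BigInteger onesFromBytes(int n) {
        byte[] mag = new byte[(n + 7) / 8];
        Arrays.fill(mag, (byte) 0xFF);
        int unusedBits = mag.length * 8 - n; // high bits of the first byte that must stay clear
        if (mag.length > 0) {
            mag[0] = (byte) (0xFF >>> unusedBits); // e.g. 0x7F when n = 39
        }
        return new BigInteger(1, mag);
    }

    // n one-bits via (String, radix); requires n >= 1, since "" is not a valid number
    public static BigInteger onesFromString(int n) {
        char[] ones = new char[n];
        Arrays.fill(ones, '1');
        return new BigInteger(new String(ones), 2);
    }

    public static void main(String[] args) {
        System.out.println(onesFromBytes(39));                           // 549755813887
        System.out.println(onesFromBytes(39).equals(onesFromString(39))); // true
    }
}
```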
EDIT: Corrected numbers. I thought you meant represent 2^n, and didn't correctly read the largest value n bits could store.
