Is RandomStringUtils.randomAlphanumeric(30) a valid GUID strategy? - java

I need a random string generator that produces an alphanumeric string of 30 characters or fewer to use as a unique key in a distributed system. It cannot contain any special characters.
Will RandomStringUtils#randomAlphanumeric work for this?
The underlying implementation uses java.util.Random.
The set of unique keys will probably be less than 100 billion, and the system needs to be able to handle up to 1000 records per second.
How can I prove that this strategy has a low enough probability of collision to work as a primary key generator?

java.util.Random implements a linear congruential generator (LCG) with a period of 2^48, so RandomStringUtils can be no better than that implementation. Generating 100 billion 30-character strings consumes about 3 * 10^12 random values, roughly 1% of 2^48.
Note that java.util.Random is not cryptographically secure: given some GUIDs it is possible to infer the next one. I'd use an implementation backed by a cryptographically secure random number generator instead (e.g. java.security.SecureRandom).
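A minimal sketch of that suggestion, assuming a plain java.security.SecureRandom and a 62-character alphabet. The class and method names (SecureAlphanumeric, next) are made up for illustration and are not part of Commons Lang:

```java
import java.security.SecureRandom;

class SecureAlphanumeric {
    // 26 upper-case + 26 lower-case + 10 digits = 62 symbols, no special characters
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a string of the given length drawn uniformly from ALPHABET. */
    static String next(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(next(30));
    }
}
```

SecureRandom is slower than java.util.Random, but 1000 keys per second of 30 characters each is a small load for it on typical hardware.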

Why don't you want to use the java.util.UUID class? UUID.randomUUID() returns a random 128-bit UUID whose string form is 36 characters long (32 hex digits plus 4 hyphens).
Example implementation:
import java.util.UUID;
public class GenerateUUID {
public static final void main(String... aArgs){
//generate random UUIDs
UUID idOne = UUID.randomUUID();
UUID idTwo = UUID.randomUUID();
log("UUID One: " + idOne);
log("UUID Two: " + idTwo);
}
private static void log(Object aObject){
System.out.println(String.valueOf(aObject));
}
}

Random is not unique. Using random number generation to get "unique" values runs into the Birthday Problem (https://en.wikipedia.org/wiki/Birthday_attack). With a 48-bit generator, a collision becomes likely after roughly 2^24 values rather than 2^48, which you'll end up hitting quicker than you think. Use UUIDs; they're designed to be universally unique.
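The birthday estimate can be made concrete with the usual approximation p ≈ n^2 / (2N) for n keys drawn from a space of N values. The sketch below works in log2 to avoid overflow. The class name BirthdayBound is hypothetical, and the 62^30 figure assumes every character is truly random, which java.util.Random with its 48-bit state cannot deliver:

```java
class BirthdayBound {
    /**
     * log2 of the approximate collision probability n^2 / (2N).
     * A result above 0 means the approximation has broken down
     * and collisions are practically certain.
     */
    static double log2CollisionProbability(double keys, double log2SpaceSize) {
        double log2Keys = Math.log(keys) / Math.log(2);
        return 2 * log2Keys - 1 - log2SpaceSize;
    }

    public static void main(String[] args) {
        double keys = 1e11; // "less than 100 billion" from the question
        // Bits of entropy in 30 uniformly random alphanumeric characters
        double alphanumeric30 = 30 * Math.log(62) / Math.log(2);
        System.out.println("48-bit space: 2^" + log2CollisionProbability(keys, 48));
        System.out.println("62^30 space:  2^" + log2CollisionProbability(keys, alphanumeric30));
    }
}
```

For 100 billion keys the exponent comes out at about +24 for a 48-bit space (collisions expected) and about -106 for a full 62^30 space, so the size of the alphabet matters far less than how much entropy the generator really has.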
32 characters:
UUID.randomUUID().toString().replace("-","")
22 characters:
UUID uuid = UUID.randomUUID();
String uuidStr = Base64.encodeBase64URLSafeString(ByteBuffer.wrap(new byte[16])
.putLong(uuid.getMostSignificantBits())
.putLong(uuid.getLeastSignificantBits())
.array()).replace("=", "");
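Note that URL-safe Base64 can emit '-' and '_', which the question rules out. Here is a sketch of an alphanumeric-only alternative, assuming lowercase base 36 is acceptable. UuidBase36 is a hypothetical helper name. 128 bits need at most 25 base-36 digits, so the result stays under the 30-character limit:

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.util.UUID;

class UuidBase36 {
    private static final int WIDTH = 25; // 36^25 > 2^128, so 25 digits always suffice

    /** Encodes the 128 bits of a UUID as a fixed-width, lowercase base-36 string. */
    static String encode(UUID uuid) {
        byte[] bytes = ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
        // Signum 1 treats the bytes as an unsigned number, so no minus sign appears
        String digits = new BigInteger(1, bytes).toString(36);
        StringBuilder sb = new StringBuilder(WIDTH);
        for (int i = digits.length(); i < WIDTH; i++) {
            sb.append('0'); // left-pad so every key has the same length
        }
        return sb.append(digits).toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(UUID.randomUUID()));
    }
}
```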

Related

How to generate Unique 36 digit (ONLY NUMBER) from Java UUID

UUID.randomUUID() works well when we need a universally unique 36-character alphanumeric value, but I can't find a standard function that generates a unique 36-digit number.
String lUUID = String.format("%040d", new BigInteger(UUID.randomUUID().toString().replace("-", ""), 16));
We can use the code above to generate the number, but it produces 40 digits and gives no guarantee of uniqueness.
You definitely can't guarantee uniqueness any more than UUID.randomUUID() can. Without synchronizing with, like, a server somewhere, you can't guarantee that; you can just make it improbable.
But to make it improbable as possible, just use the random constructor:
SecureRandom random = new SecureRandom();
BigInteger bigint;
do {
bigint = new BigInteger(120, random); // 2^120 > 10^36
} while (bigint.compareTo(BigInteger.TEN.pow(36)) >= 0);
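To turn that value into exactly 36 digits, the result can be zero-padded. Here is a self-contained sketch of the same rejection-sampling loop. The class name Digits36 is an assumption for illustration:

```java
import java.math.BigInteger;
import java.security.SecureRandom;

class Digits36 {
    private static final BigInteger LIMIT = BigInteger.TEN.pow(36);
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Returns a uniformly random number below 10^36 as a 36-digit, zero-padded string. */
    static String next() {
        BigInteger candidate;
        do {
            // 120 random bits cover [0, 2^120); values at or above 10^36 are rejected
            candidate = new BigInteger(120, RANDOM);
        } while (candidate.compareTo(LIMIT) >= 0);
        return String.format("%036d", candidate);
    }

    public static void main(String[] args) {
        System.out.println(next());
    }
}
```

Rejecting and retrying keeps the distribution uniform, whereas reducing the value modulo 10^36 would favor the smaller numbers.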

Using secure random to generate a long number

I have seeded my secure random object with a long number. Now I want to extract another long number. But there is only a function called nextBytes(byte[] b) which gives a random byte[].
Is there any way to get a long number?
SecureRandom ranGen1 = new SecureRandom();
ranGen1.setSeed(1000);
SecureRandom ranGen2 = new SecureRandom();
ranGen2.setSeed(1000);
byte[] b1= new byte[3];
byte[] b2=new byte[3];
ranGen1.nextBytes(b1);
ranGen2.nextBytes(b2);
int a1=b1[0];
int a2=b1[1];
int a3=b1[2];
int c1=b2[0];
int c2=b2[1];
int c3=b2[2];
System.out.println(a1+", "+a2+", "+a3); // generated by ranGen1
System.out.println(c1+", "+c2+", "+c3); // generated by ranGen2
System.out.println(ranGen1.nextLong()); // generated by ranGen1
System.out.println(ranGen2.nextLong()); // generated by ranGen2
result:
4, -67, 69
4, -67, 69
-3292989024239613972 //this is using nextLong()
-3292989024239613972
The Output for Peter Lawrey's code:(Using secure random)
-7580880967916090810 -7580880967916090810
7364820596437092015 7364820596437092015
6152225453014145174 6152225453014145174
6933818190189005053 6933818190189005053
-2602185131584800869 -2602185131584800869
-4964993377763884762 -4964993377763884762
-3544990590938409243 -3544990590938409243
8725474288412822874 8725474288412822874
-8206089057857703584 -8206089057857703584
-7903450126640733697 -7903450126640733697
They are exactly the same. How could you get different numbers?
This is the output that I get after using Peter Lawrey's second update (I am using the Windows operating system and he seems to be using some other operating system, which has created the confusion):
SHA1PRNG appears to produce the same values with the same seed
The default PRNG on this system is SHA1PRNG
Revised again, this is the correct answer! (and I should follow my own advice and read the documentation more carefully)
Is this what you're using? If so, it extends Random so it has an inherited nextLong() method. As it overrides next() all the typical Random methods will be using the SecureRandom PRNG method.
(see in the comments why my second answer is incorrect.. or rather unnecessary)
I would suggest creating a long by just composing it out of the next 8 bytes or of two ints (returned by next). There's no problem with doing that, and I can't see any reason why you wouldn't be able to reach all the long values (each of the two 32-bit halves can take any value from 0 to 2^32 - 1 with equal probability) or why one value would be more probable than another (which would mean the generator is not pseudo-random).
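Composing a long from 8 random bytes might look like the sketch below. LongFromBytes and its two methods are hypothetical names. The conversion uses ByteBuffer's default big-endian byte order:

```java
import java.nio.ByteBuffer;
import java.security.SecureRandom;

class LongFromBytes {
    /** Interprets exactly 8 bytes as a big-endian signed long. */
    static long toLong(byte[] eightBytes) {
        if (eightBytes.length != Long.BYTES) {
            throw new IllegalArgumentException("expected 8 bytes, got " + eightBytes.length);
        }
        return ByteBuffer.wrap(eightBytes).getLong();
    }

    /** Draws 8 bytes from the generator and packs them into a long. */
    static long nextLong(SecureRandom random) {
        byte[] bytes = new byte[Long.BYTES];
        random.nextBytes(bytes);
        return toLong(bytes);
    }

    public static void main(String[] args) {
        System.out.println(nextLong(new SecureRandom()));
    }
}
```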
The Random documentation states that limitation for nextLong() because the class uses a seed of only 48 bits, so the two 32-bit halves it combines cannot cover all 2^64 long values. That is a property of its linear congruential algorithm (which also has a much shorter cycle, i.e. it starts repeating numbers sooner, than modern PRNGs), not of SecureRandom. It may still be worth exploring on Crypto Stack Exchange out of curiosity.
SecureRandom extends Random, and Random has a nextLong() method: http://docs.oracle.com/javase/6/docs/api/java/util/Random.html#nextLong%28%29
BigInteger randomNumber = new BigInteger(numBits, random);
Note: With Random, a given seed will always produce the same results. With SecureRandom it will not. The seed just adds to the randomness.
Have you ever used SecureRandom? The whole point of a seed is to produce the same sequence of numbers. This is also the case with SecureRandom: two SecureRandom instances seeded with the same value produce the same sequence of random numbers.
public static void main(String... args) throws NoSuchProviderException, NoSuchAlgorithmException {
testRNG("NativePRNG");
testRNG("SHA1PRNG");
System.out.println("The default PRNG on this system is " + new SecureRandom().getAlgorithm());
}
private static void testRNG(String prng) throws NoSuchAlgorithmException, NoSuchProviderException {
SecureRandom sr1 = SecureRandom.getInstance(prng, "SUN");
SecureRandom sr2 = SecureRandom.getInstance(prng, "SUN");
sr1.setSeed(1);
sr2.setSeed(1);
for (int i = 0; i < 10; i++) {
if (sr1.nextLong() != sr2.nextLong()) {
System.out.println(prng + " does not produce the same values with the same seed");
return;
}
}
System.out.println(prng + " appears to produce the same values with the same seed");
}
prints
NativePRNG does not produce the same values with the same seed
SHA1PRNG appears to produce the same values with the same seed
The default PRNG on this system is NativePRNG
go and try it first
Good advice, but just trying it doesn't always give you the whole answer in this case.

Java convert hash to random string

I'm trying to develop a reduction function for use within a rainbow table generator.
The basic principle behind a reduction function is that it takes in a hash, performs some calculations, and returns a string of a certain length.
At the moment I'm using SHA1 hashes, and I need to return a string with a length of three. I need the string to be made up on any three random characters from:
abcdefghijklmnopqrstuvwxyz0123456789
The major problem I'm facing is that any reduction function I write, always returns strings that have already been generated. And a good reduction function will only return duplicate strings rarely.
Could anyone suggest any ideas on a way of accomplishing this? Or any suggestions at all on hash to string manipulation would be great.
Thanks in advance
Josh
So it sounds like you've got 20 digits of base 256 (the 20 bytes of a SHA1 hash) that you need to map into three digits of base 36. I would simply make a BigInteger from the hash bytes, take it modulo 36^3, and return the string in base 36.
public static final BigInteger N36POW3 = new BigInteger("" + 36 * 36 * 36);
public static String threeDigitBase36(byte[] bs) {
return new BigInteger(bs).mod(N36POW3).toString(36);
}
// ...
threeDigitBase36(sha1("foo")); // => "96b"
threeDigitBase36(sha1("bar")); // => "y4t"
threeDigitBase36(sha1("bas")); // => "p55"
threeDigitBase36(sha1("zip")); // => "ej8"
Of course there will be collisions, as there always are when you map a space into a smaller one, but the entropy should be better than that of anything sillier than the above solution.
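One detail the snippet leaves open is that BigInteger.toString(36) does not pad, so small values give fewer than three characters. Here is a self-contained sketch that pads with '0' and reads the hash as an unsigned number. The class Reduction and its sha1 helper are assumptions for illustration, and because of the unsigned interpretation its outputs need not match the ones listed above:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Reduction {
    private static final BigInteger N36POW3 = BigInteger.valueOf(36L * 36 * 36);

    /** SHA-1 of the UTF-8 bytes of the input (20 bytes). */
    static byte[] sha1(String s) {
        try {
            return MessageDigest.getInstance("SHA-1").digest(s.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Maps a hash to exactly three characters from 0-9a-z. */
    static String reduce(byte[] hash) {
        String s = new BigInteger(1, hash).mod(N36POW3).toString(36);
        while (s.length() < 3) {
            s = "0" + s; // left-pad values below 36^2
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(reduce(sha1("foo")));
    }
}
```

Rainbow-table reduction functions usually also mix in the chain position (for example by adding the column index before taking the modulus) so that each column uses a different mapping, which may help with the duplicate-string problem described in the question.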
Applying the KISS principle:
An SHA is just a String
The JDK hashcode for String is "random enough"
Integer can render in any base
This single line of code does it:
public static String shortHash(String sha) {
return Integer.toString(sha.hashCode() & 0x7FFFFFFF, 36).substring(0, 3);
}
Note: The & 0x7FFFFFFF is to zero the sign bit (hash codes can be negative numbers, which would otherwise render with a leading minus sign).
Edit - Guaranteeing hash length
My original solution was naive - it didn't deal with the case when the int hash is less than 100 (base 36) - meaning it would print less than 3 chars. This code fixes that, while still keeping the value "random". It also avoids the substring() call, so performance should be better.
static int min = Integer.parseInt("100", 36);
static int range = Integer.parseInt("zzz", 36) - min + 1; // + 1 so that "zzz" itself is reachable
public static String shortHash(String sha) {
return Integer.toString(min + (sha.hashCode() & 0x7FFFFFFF) % range, 36);
}
This code guarantees the final hash has 3 characters by forcing it to be between 100 and zzz - the lowest and highest 3-char hash in base 36, while still making it "random".

How to generate a unique identifier of a fixed length in Java?

I am trying to generate a unique identifier of a fixed length such as the IDs that are generated by Megaupload for the uploaded files.
For example:
ALGYTAB5
BCLD23A6
In this example using from A-Z and 0-9 and with a fixed length of 8 the total different combinations are 2,821,109,907,456.
What if one of the generated id is already taken. Those ids are going to be stored in a database and it shouldn't be used more than once.
How can I achieve that in Java?
Thank you.
Hmm... You could imitate a smaller GUID the following way. Let the first 4 characters of your string encode the current time (seconds elapsed since the Unix epoch) and the last 4 be a random combination. In this case the only way two IDs could coincide is if they were built in the same second, and the chances of that would be very low because of the other 4 random characters.
Pseudocode:
get current time (4-byte integer)
id[0] = 1st byte of current time (encoded to be a digit or a letter)
id[1] = 2nd
id[2] = 3rd
id[3] = 4th
id[4] = random character
id[5] = random character
id[6] = random character
id[7] = random character
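One way to read this pseudocode in Java is sketched below. It is a variation, not the answer's exact scheme: since a byte does not fit into one of 36 symbols, the sketch encodes the seconds modulo 36^4 as four base-36 characters, which means the time prefix wraps around roughly every 19 days. TimePrefixedId is a hypothetical name:

```java
import java.security.SecureRandom;

class TimePrefixedId {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final long PREFIX_SPACE = 36L * 36 * 36 * 36; // 1,679,616 seconds
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds an 8-character ID: 4 characters from the time, 4 random characters. */
    static String generate(long epochSeconds) {
        char[] id = new char[8];
        long t = Math.floorMod(epochSeconds, PREFIX_SPACE);
        for (int i = 3; i >= 0; i--) { // least significant base-36 digit goes last
            id[i] = ALPHABET.charAt((int) (t % 36));
            t /= 36;
        }
        for (int i = 4; i < 8; i++) {
            id[i] = ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length()));
        }
        return new String(id);
    }

    public static void main(String[] args) {
        System.out.println(generate(System.currentTimeMillis() / 1000));
    }
}
```

Two IDs created in the same second still collide with probability 1 in 36^4, so a uniqueness check in the database remains necessary.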
I have tried @Armen's solution; however, I would like to offer another one:
UUID idOne = UUID.randomUUID();
UUID idTwo = UUID.randomUUID();
UUID idThree = UUID.randomUUID();
UUID idFour = UUID.randomUUID();
String time = idOne.toString().replace("-", "");
String time2 = idTwo.toString().replace("-", "");
String time3 = idThree.toString().replace("-", "");
String time4 = idFour.toString().replace("-", "");
StringBuffer data = new StringBuffer();
data.append(time);
data.append(time2);
data.append(time3);
data.append(time4);
SecureRandom random = new SecureRandom();
int beginIndex = random.nextInt(100); //Begin index + length of your string < data length
int endIndex = beginIndex + 10; //Length of string which you want
String yourID = data.substring(beginIndex, endIndex);
Hope this helps!
We're using the database to check whether they already exist. If the number of IDs is low compared to the possible number you should be relatively safe.
You might also have a look at the UUID class (although its UUIDs are 16 bytes long).
Sounds like a job for a hash function. You're not 100% guaranteed that a hash function will return a unique identifier, but it works most of the time. Hash collisions must be dealt with separately, but there are many standard techniques for you to look into.
Specifically how you deal with collisions depends on what you're using this unique identifier for. If it's a simple one-way identifier where you give your program the ID and it returns the data, then you can simply use the next available ID in the case of a collision.
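A generate-check-retry loop is one common way to handle such collisions. In the sketch below an in-memory Set stands in for the database's unique constraint. The class UniqueIdAllocator and its parameters are assumptions for illustration:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class UniqueIdAllocator {
    private final Set<String> taken = new HashSet<>(); // stand-in for a UNIQUE column
    private final Supplier<String> generator;
    private final int maxAttempts;

    UniqueIdAllocator(Supplier<String> generator, int maxAttempts) {
        this.generator = generator;
        this.maxAttempts = maxAttempts;
    }

    /** Returns an ID that has not been handed out before, retrying on collision. */
    String allocate() {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String candidate = generator.get();
            if (taken.add(candidate)) { // add() returns false if the ID already exists
                return candidate;
            }
        }
        throw new IllegalStateException("no free ID after " + maxAttempts + " attempts");
    }
}
```

With a real database, the check and the insert should be a single atomic step (for example an INSERT that fails on a unique-key violation); otherwise two writers can both see the ID as free.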

Generating a unique integer ID from a String

I need to generate a unique integer id for a string.
Reason:
I have a database application that can run on different databases. These databases contain parameters with parameter types that are generated from external XML data.
The current situation is that I use the ordinal number of the Enum. But when a parameter is inserted or removed, the ordinals get mixed up:
(FOOD = 0 , TOYS = 1) <--> (FOOD = 0, NONFOOD = 1, TOYS = 2)
The number of parameter types is between 200 and 2000, so I am a bit wary of using hashCode() on a string.
P.S.: I am using Java.
Thanks a lot
I would use a mapping table in the database to map these Strings to an auto increment value. These mapping should then be cached in the application.
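The application-side cache of such a mapping could be sketched as follows. The in-memory counter stands in for the database's auto-increment column, and StringIdRegistry is a hypothetical name. In a real setup the ids would be loaded from and written to the mapping table:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class StringIdRegistry {
    private final Map<String, Integer> ids = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger(); // stand-in for auto increment

    /** Returns the id already assigned to the name, or assigns the next free one. */
    int idFor(String name) {
        return ids.computeIfAbsent(name, key -> nextId.getAndIncrement());
    }
}
```

Because ids are assigned on first sight and never reused, inserting NONFOOD later does not shift the ids of FOOD or TOYS.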
Use a cryptographic hash. MD5 would probably be sufficient and relatively fast. It will be unique enough for your set of input.
How can I generate an MD5 hash?
The only problem is that the hash is 128 bits, so a standard 64-bit integer won't hold it.
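If a 64-bit value is acceptable, one option is to keep only the first 8 bytes of the digest. This is a sketch under that assumption, with Md5Id as a made-up name. Truncation lowers collision resistance, although for 200 to 2000 strings in a 64-bit space an accidental collision is extremely unlikely:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class Md5Id {
    /** Returns the first 8 bytes of the MD5 digest of the string as a big-endian long. */
    static long toLong(String value) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(value.getBytes(StandardCharsets.UTF_8));
            return ByteBuffer.wrap(digest).getLong(); // reads bytes 0..7 of the 16
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toLong("FOOD"));
    }
}
```

Unlike a mapping table, this needs no shared state, but the resulting ids can be negative and are not small consecutive integers.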
If you need to be absolutely certain that the ids are unique (no collisions), your strings are up to 32 chars, and your number must have no more than 10 digits (approx. 32 bits), you obviously cannot do it with a one-way function id = F(string).
The natural way is to keep some mapping of the string to unique numbers (typically a sequence), either in the DB or in the application.
If you know the type of string values (length, letter patterns), you can count the total number of strings in this set and if it fits within 32 bits, the count function is your integer value.
Otherwise, the string itself is your integer value (integer in math terms, not Java).
By Enum you mean a Java Enum? Then you could give each enum value a unique int by your self instead of using its ordinal number:
public enum MyEnum {
FOOD(0),
TOYS(1);
private final int id;
private MyEnum(int id)
{
this.id = id;
}
public int getId()
{
return id;
}
}
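Reading such explicit ids back from the database needs a reverse lookup. Here is a sketch with a hypothetical ParameterType enum, where the getId and fromId names are assumptions. It also shows that a constant added later gets a fresh id without disturbing the existing ones:

```java
enum ParameterType {
    FOOD(0),
    TOYS(1),
    NONFOOD(2); // added later; FOOD and TOYS keep their ids regardless of declaration order

    private final int id;

    ParameterType(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    /** Finds the constant stored under the given id, or fails loudly. */
    static ParameterType fromId(int id) {
        for (ParameterType type : values()) {
            if (type.id == id) {
                return type;
            }
        }
        throw new IllegalArgumentException("Unknown parameter type id: " + id);
    }
}
```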
I came across this post that's sensible: How to convert string to unique identifier in Java
In it the author describes his implementation:
public static long longHash(String string) {
long h = 98764321261L;
int l = string.length();
char[] chars = string.toCharArray();
for (int i = 0; i < l; i++) {
h = 31*h + chars[i];
}
return h;
}
