I am trying to generate a unique identifier of a fixed length such as the IDs that are generated by Megaupload for the uploaded files.
For example:
ALGYTAB5
BCLD23A6
In this example, using A-Z and 0-9 with a fixed length of 8, the total number of different combinations is 36^8 = 2,821,109,907,456.
What if one of the generated IDs is already taken? The IDs are going to be stored in a database, and each one must not be used more than once.
How can I achieve that in Java?
Thank you.
Hmm... You could imitate a smaller GUID in the following way. Let the first 4 characters of your string be the encoded current time (seconds elapsed since the Unix epoch), and the last 4 just a random combination. In this case the only way two IDs could coincide is if they were built in the same second, and the chance of that is very low because of the other 4 random characters.
Pseudocode:
get current time (4-byte integer)
id[0] = 1st byte of current time (encoded to be a digit or a letter)
id[1] = 2nd
id[2] = 3rd
id[3] = 4th
id[4] = random character
id[5] = random character
id[6] = random character
id[7] = random character
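The pseudocode above could be sketched in Java roughly as follows. This is only an illustration under some assumptions: the class name `TimeRandomId` and the alphabet ordering are made up here, and because 4 characters from a 36-symbol alphabet can hold only 36^4 = 1,679,616 distinct values, the time part wraps around roughly every 19 days instead of carrying the full 4-byte timestamp.

```java
import java.security.SecureRandom;

final class TimeRandomId {
    // Assumed alphabet: A-Z followed by 0-9, as in the question
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final int BASE = ALPHABET.length();          // 36
    private static final long TIME_SPACE = 36L * 36 * 36 * 36;  // 36^4 = 1,679,616
    private static final SecureRandom RANDOM = new SecureRandom();

    /** Builds an 8-character ID: 4 characters derived from the clock, 4 random. */
    static String newId(long epochSeconds) {
        char[] id = new char[8];
        // Only 36^4 time values fit into 4 characters, so the time part wraps around
        long t = Math.floorMod(epochSeconds, TIME_SPACE);
        for (int i = 3; i >= 0; i--) {
            id[i] = ALPHABET.charAt((int) (t % BASE));
            t /= BASE;
        }
        // The remaining 4 characters are drawn at random
        for (int i = 4; i < 8; i++) {
            id[i] = ALPHABET.charAt(RANDOM.nextInt(BASE));
        }
        return new String(id);
    }

    public static void main(String[] args) {
        System.out.println(newId(System.currentTimeMillis() / 1000));
    }
}
```

Two IDs created in the same second still collide with probability 1 in 36^4, so a uniqueness check in the database is still needed.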
I have tried @Armen's solution; however, I would like to offer another one:
UUID idOne = UUID.randomUUID();
UUID idTwo = UUID.randomUUID();
UUID idThree = UUID.randomUUID();
UUID idFour = UUID.randomUUID();
String time = idOne.toString().replace("-", "");
String time2 = idTwo.toString().replace("-", "");
String time3 = idThree.toString().replace("-", "");
String time4 = idFour.toString().replace("-", "");
StringBuffer data = new StringBuffer();
data.append(time);
data.append(time2);
data.append(time3);
data.append(time4);
SecureRandom random = new SecureRandom();
int beginIndex = random.nextInt(100); //Begin index + length of your string < data length
int endIndex = beginIndex + 10; //Length of string which you want
String yourID = data.substring(beginIndex, endIndex);
Hope this helps!
We're using the database to check whether they already exist. If the number of IDs is low compared to the possible number you should be relatively safe.
You might also have a look at the UUID class (although it produces 16-byte UUIDs).
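A minimal sketch of the "generate, then check whether it exists" approach might look like this. The `HashSet` is only a hypothetical stand-in for a database column with a UNIQUE constraint, and the class and method names are invented for the example.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

final class UniqueIdAllocator {
    private static final String ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RANDOM = new SecureRandom();

    // Hypothetical stand-in for a database column with a UNIQUE constraint
    private final Set<String> taken = new HashSet<>();

    /** Returns a random string of the given length over A-Z and 0-9. */
    static String randomId(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHABET.charAt(RANDOM.nextInt(ALPHABET.length())));
        }
        return sb.toString();
    }

    /** Draws random IDs until one is not yet taken, then reserves it. */
    String allocate() {
        String id;
        do {
            id = randomId(8);
        } while (!taken.add(id)); // add() returns false if the ID already exists
        return id;
    }
}
```

With a real database, the equivalent would be to attempt the insert and retry with a fresh ID when the unique constraint is violated.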
Sounds like a job for a hash function. You're not 100% guaranteed that a hash function will return a unique identifier, but it works most of the time. Hash collisions must be dealt with separately, but there are many standard techniques for you to look into.
Specifically how you deal with collisions depends on what you're using this unique identifier for. If it's a simple one-way identifier where you give your program the ID and it returns the data, then you can simply use the next available ID in the case of a collision.
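As a rough sketch of the "use the next available ID" idea: the example below derives an ID from `String.hashCode()` and steps forward when the slot is already occupied by different data. The in-memory map, the class name, and the choice of `hashCode()` as the hash function are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

final class HashedIdStore {
    private static final long ID_SPACE = 2_821_109_907_456L; // 36^8 possible IDs
    // Hypothetical in-memory stand-in for the real ID -> data storage
    private final Map<Long, String> byId = new HashMap<>();

    /** Stores the data and returns its ID, probing forward on a collision. */
    long store(String data) {
        long id = Math.floorMod((long) data.hashCode(), ID_SPACE);
        // Collision: the slot is occupied by different data, so try the next ID
        while (byId.containsKey(id) && !byId.get(id).equals(data)) {
            id = (id + 1) % ID_SPACE;
        }
        byId.put(id, data);
        return id;
    }

    String lookup(long id) {
        return byId.get(id);
    }
}
```

For example, "Aa" and "BB" have the same `hashCode()` (2112), so whichever is stored second ends up at 2113.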
UUID.randomUUID() works well when we need a universally unique 36-character alphanumeric value, but I can't find a reliable function that generates a unique 36-digit number.
String lUUID = String.format("%040d", new BigInteger(UUID.randomUUID().toString().replace("-", ""), 16));
We can use the code above to generate a number, but it produces 40 digits and gives no guarantee of uniqueness.
You definitely can't guarantee uniqueness any more than UUID.randomUUID() can. Without synchronizing with, like, a server somewhere, you can't guarantee that; you can just make it improbable.
But to make it as improbable as possible, just use the random constructor:
SecureRandom random = new SecureRandom();
BigInteger bigint;
do {
bigint = new BigInteger(120, random); // 2^120 > 10^36
} while (bigint.compareTo(BigInteger.TEN.pow(36)) >= 0);
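To end up with exactly 36 digits, the result of the loop above can be zero-padded. A small self-contained sketch (the class and method names are made up for this example):

```java
import java.math.BigInteger;
import java.security.SecureRandom;

final class Random36Digits {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final BigInteger LIMIT = BigInteger.TEN.pow(36);

    /** Returns a random decimal string of exactly 36 digits (leading zeros allowed). */
    static String next() {
        BigInteger n;
        do {
            n = new BigInteger(120, RANDOM); // uniform in [0, 2^120)
        } while (n.compareTo(LIMIT) >= 0);   // reject values of 10^36 or more
        return String.format("%036d", n);    // pad with zeros to 36 digits
    }
}
```

Rejecting and redrawing, rather than taking the value modulo 10^36, keeps the distribution uniform.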
I have to implement a hash function, and here is my first draft:
public int hashCode(){
String fixedISBN = getIsbn().toString().replace("-", "");
fixedISBN = fixedISBN.substring(fixedISBN.length()-4, fixedISBN.length());
int ISBN = Integer.parseInt(fixedISBN);
int ASCII = 0;
for (int i = 0; i < getTitle().toString().length(); i++) {
ASCII += getTitle().toString().charAt(i);
}
int hashValue = (ISBN * 37 + ASCII*23);
return hashValue;
}
I am meant to hash books, and to do so I initially thought to use the ISBN of a book, which serves as a wholly unique identifier for every book. Then I looked at the list of ISBNs and saw that using the entire ISBN would not help much, since there isn't a lot of variation among them. As such I use only the last four digits of the ISBN, since those tend to be the ones that vary. I also plan to use the ASCII values of the title's chars for my hashValue, but I believe a problem arises since an ASCII value is at most 127, which means a short title of, say, 8 chars or fewer would produce a sum of at most 8 × 127 = 1016. If the table size is very large, say 10 007, that wouldn't produce a very even spread. Is there any way I could make the ASCII values more suitable for producing hash values for a large table?
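One common way to get more spread out of short strings, offered here only as a sketch and not as the poster's method, is a polynomial hash: multiply the running value by a constant before adding each character, so that character positions matter and the result quickly exceeds a plain sum. The class name `BookHash` and the multiplier 31 are choices made for this example.

```java
final class BookHash {
    /** Polynomial hash over the ISBN digits and the title characters. */
    static int hash(String isbn, String title) {
        int h = 0;
        String digits = isbn.replace("-", "");
        for (int i = 0; i < digits.length(); i++) {
            h = 31 * h + digits.charAt(i); // int overflow simply wraps around
        }
        for (int i = 0; i < title.length(); i++) {
            h = 31 * h + title.charAt(i);
        }
        return h;
    }

    /** Maps a (possibly negative) hash value to a bucket in [0, tableSize). */
    static int bucket(int hash, int tableSize) {
        return Math.floorMod(hash, tableSize);
    }
}
```

Unlike a plain sum of character values, this gives "ab" and "ba" different hash values, and `Math.floorMod` keeps the bucket index non-negative even when the hash overflows.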
I am trying to generate two random 9-digit long values in Java using the code below:
for (int i =0;i<2;i++) {
String axisIdStr = Long.toString((long)(System.nanoTime() * (Math.random() * 1000)));
System.out.println("######## axisIdStr "+axisIdStr);
String axId = axisIdStr.substring((axisIdStr.length() -9), axisIdStr.length()) ;
}
But when I run this on Windows, I get two different numbers, whereas when I run it on a Mac, I get the same number twice. Why is this happening?
Can you suggest a better way to generate the long values?
According to your requirement, you need to generate 9-digit random numbers. As suggested in the comments, you can do that using Random. Below is one way to generate a random number between two limits.
long lowerLimit = 123456712L;
long upperLimit = 234567892L;
Random r = new Random();
long number = lowerLimit+((long)(r.nextDouble()*(upperLimit-lowerLimit)));
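If the goal is specifically a number with exactly 9 digits, the limits can be set to 100,000,000 (inclusive) and 1,000,000,000 (exclusive). A short sketch using `ThreadLocalRandom`, with a made-up class name:

```java
import java.util.concurrent.ThreadLocalRandom;

final class NineDigitRandom {
    /** Returns a uniformly distributed value in [100000000, 999999999]. */
    static long next() {
        // nextLong(origin, bound): origin is inclusive, bound is exclusive
        return ThreadLocalRandom.current().nextLong(100_000_000L, 1_000_000_000L);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 2; i++) {
            System.out.println("axisId " + next());
        }
    }
}
```

This does not depend on the resolution of System.nanoTime(), and it avoids the string slicing used in the question.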
You could create an int array a[] of size 9 and populate it with random digits 0-9. Then sum the array, multiplying each element by the matching power of ten:
a[8]*1 + a[7]*10 + a[6]*100 ...
You need to make sure that a[0] only takes digits 1-9, though...
To get an even more random sequence, you should ideally work with Strings; then you could have a 0 in the first position of your random "string", although it would no longer be a number.
Or maybe generate something pseudo-random and take the last 9 digits of it.
That's the DIY version of what you could accomplish with what's already out there...
Regards
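Both variants from this answer, sketched in Java under the assumption that a plain `java.util.Random` is good enough (class and method names are invented for the example):

```java
import java.util.Random;

final class DigitByDigit {
    /** Numeric variant: the first digit is 1-9, the others 0-9, combined by powers of ten. */
    static long randomNumber(int length, Random random) {
        long value = 1 + random.nextInt(9);          // leading digit, never 0
        for (int i = 1; i < length; i++) {
            value = value * 10 + random.nextInt(10); // shift left and append a digit
        }
        return value;
    }

    /** String variant: every position may be 0-9, so leading zeros are possible. */
    static String randomDigits(int length, Random random) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('0' + random.nextInt(10)));
        }
        return sb.toString();
    }
}
```

Multiplying the running value by 10 at each step is equivalent to the a[8]*1 + a[7]*10 + ... sum, just without the intermediate array.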
How can I convert a non-numeric String to an Integer?
I got for instance:
String unique = "FUBAR";
What's a good way to represent the String as an Integer with no collisions? E.g. "FUBAR" should always be represented by the same number and shan't collide with any other String. For instance, String a = "A"; should be represented as the Integer 1 and so on, but what is a method that does this (preferably for all Unicode strings, but in my case ASCII values could be sufficient)?
This is impossible. Think about it: an Integer has only 32 bits, so by the pigeonhole principle there must exist at least two strings that map to the same Integer value, no matter what conversion technique you use. In fact, infinitely many strings share each value...
If you're just looking for an efficient mapping, then I suggest that you just use the int returned by hashCode(), which for reference is a 32-bit value (it can be negative) and is not guaranteed to be unique.
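A concrete collision makes the pigeonhole argument tangible. The well-known pair "Aa" and "BB" has the same `String.hashCode()`; the helper class below is just an illustration.

```java
final class HashCollisionDemo {
    /** True if the two strings differ but share a hashCode(). */
    static boolean collide(String a, String b) {
        return !a.equals(b) && a.hashCode() == b.hashCode();
    }

    public static void main(String[] args) {
        // 'A' * 31 + 'a' = 65 * 31 + 97 = 2112
        System.out.println("Aa".hashCode()); // prints 2112
        // 'B' * 31 + 'B' = 66 * 31 + 66 = 2112
        System.out.println("BB".hashCode()); // prints 2112
        System.out.println(collide("Aa", "BB")); // prints true
    }
}
```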
You can map Strings to unique IDs using a table. There is no way to do this generically.
final Map<String, Integer> map = new HashMap<>();
public int idFor(String s) {
Integer id = map.get(s);
if (id == null)
map.put(s, id = map.size());
return id;
}
Note: having unique id's doesn't guarantee no collisions in a hash collection.
http://vanillajava.blogspot.co.uk/2013/10/unique-hashcodes-is-not-enough-to-avoid.html
If you know the character set used in your strings, then you can think of the string as a number in a base other than 10. For example, hexadecimal numbers contain letters from A to F.
Therefore, if you know that your strings only contain letters from an 8-bit character set, you can treat the string as a base-256 number. In pseudocode this would be:
number n;
for each letter in string
n = 256 * n + (letter's position in character set)
If your character set contains 65536 characters, then just multiply 'n' by that number at each step. But beware, the 32 bits of an integer will easily overflow. You probably need to use a type that can hold a larger number.
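import java.math.BigDecimal;
import java.nio.charset.StandardCharsets;

private BigDecimal createBigDecimalFromString(String data)
{
    // Treat each UTF-8 byte as one base-256 digit; byte i has weight 256^i.
    BigDecimal value = BigDecimal.ZERO;
    BigDecimal base = new BigDecimal(256);
    // StandardCharsets.UTF_8 avoids the checked UnsupportedEncodingException.
    byte[] tmp = data.getBytes(StandardCharsets.UTF_8);
    for (int i = tmp.length - 1; i >= 0; i--)
    {
        // Mask with 0xFF so that bytes above 127 are not treated as negative.
        value = value.add(base.pow(i).multiply(new BigDecimal(tmp[i] & 0xFF)));
    }
    return value;
}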
Maybe a little bit late, but I'm going to give my 10 cents to simplify it (internally it is similar to the BigDecimal approach suggested by @Romain Hippeau):
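public static BigInteger getNumberId(final String value) {
    // Signum 1 keeps the result non-negative even when the first byte is above 127
    // (StandardCharsets is in java.nio.charset)
    return new BigInteger(1, value.getBytes(StandardCharsets.UTF_8));
}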
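Because the bytes of the string are used directly as the digits of the number, this mapping can also be reversed. A round-trip sketch follows; the class name is made up, it uses the two-argument `BigInteger(int signum, byte[] magnitude)` constructor so the number is never negative, and it assumes a non-empty string that does not start with a NUL character (leading zero bytes would be lost).

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;

final class StringNumberCodec {
    /** Interprets the UTF-8 bytes of the string as one big non-negative number. */
    static BigInteger encode(String s) {
        return new BigInteger(1, s.getBytes(StandardCharsets.UTF_8));
    }

    /** Turns the number back into the string it was built from. */
    static String decode(BigInteger n) {
        byte[] bytes = n.toByteArray();
        // toByteArray() may prepend a 0x00 sign byte; skip it if present
        int start = (bytes.length > 1 && bytes[0] == 0) ? 1 : 0;
        return new String(bytes, start, bytes.length - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        BigInteger id = encode("FUBAR");
        System.out.println(id + " -> " + decode(id));
    }
}
```

The price for having no collisions is that the number grows by 8 bits per byte of input, so it fits into an int only for strings of up to 3 ASCII characters.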
Regardless of the accepted answer, it is possible to represent any String as an integer by computing that String's Gödel number, which is a unique product of prime powers for every possible String. That said, it is quite impractical and slow to implement; for most Strings you would need a BigInteger rather than a normal Integer, and to decode a Gödel number into its corresponding String you need a defined Charset.
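A minimal sketch of one such encoding, assuming the usual construction where the i-th character's code becomes the exponent of the i-th prime (2, 3, 5, ...). The class name is invented, characters are taken as Java `char` values, and the sketch assumes no character has code 0, since that would contribute a factor of 1 and be indistinguishable from an absent character.

```java
import java.math.BigInteger;

final class GodelNumber {
    /** Encodes s as 2^c1 * 3^c2 * 5^c3 * ... where ci is the code of the i-th character. */
    static BigInteger encode(String s) {
        BigInteger result = BigInteger.ONE;
        BigInteger prime = BigInteger.valueOf(2);
        for (int i = 0; i < s.length(); i++) {
            result = result.multiply(prime.pow(s.charAt(i)));
            // Advances 2 -> 3 -> 5 -> 7 ...
            prime = prime.nextProbablePrime();
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(encode("AB")); // 2^65 * 3^66
    }
}
```

Even the two-character string "AB" already needs about 170 bits, which illustrates why this is impractical for anything but very short inputs.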
I need to generate a unique integer id for a string.
Reason:
I have a database application that can run on different databases. These databases contain parameters with parameter types that are generated from external XML data.
Currently I use the ordinal number of the enum, but when a parameter is inserted or removed, the ordinals get mixed up:
(FOOD = 0 , TOYS = 1) <--> (FOOD = 0, NONFOOD = 1, TOYS = 2)
The number of parameter types is between 200 and 2000, so I am a bit wary of using hashCode() on the string.
P.S.: I am using Java.
Thanks a lot
I would use a mapping table in the database to map these Strings to an auto-increment value. This mapping should then be cached in the application.
Use a cryptographic hash. MD5 would probably be sufficient and relatively fast. It will be unique enough for your set of input.
How can I generate an MD5 hash?
The only problem is that the hash is 128 bits, so a standard 64-bit integer won't hold it.
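A short sketch of the MD5 route using the standard `MessageDigest` API. The class and method names are invented, and truncating to 64 bits is only one possible way around the size problem; it makes collisions more likely than with the full 128 bits.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Id {
    /** Returns the full 128-bit MD5 digest of the string as a non-negative number. */
    static BigInteger md5(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return new BigInteger(1, md.digest(s.getBytes(StandardCharsets.UTF_8)));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Keeps only the low 64 bits of the digest so the value fits into a long. */
    static long md5AsLong(String s) {
        return md5(s).longValue();
    }
}
```

For a few thousand parameter names a 64-bit value is very unlikely to collide, but it is still a probability, not a guarantee.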
If you need to be absolutely certain that the IDs are unique (no collisions), your strings are up to 32 chars long, and your number must have no more than 10 digits (approx. 32 bits), you obviously cannot do it with a one-way function id=F(string).
The natural way is to keep some mapping of the string to unique numbers (typically a sequence), either in the DB or in the application.
If you know the type of string values (length, letter patterns), you can count the total number of strings in this set and if it fits within 32 bits, the count function is your integer value.
Otherwise, the string itself is your integer value (integer in math terms, not Java).
By Enum you mean a Java enum? Then you could give each enum value a unique int yourself instead of using its ordinal number:
public enum MyEnum {
    FOOD(0),
    TOYS(1); // the constant list must end with a semicolon

    private final int id;

    private MyEnum(int id)
    {
        this.id = id;
    }

    public int getId()
    {
        return id;
    }
}
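To read such IDs back from the database, a reverse lookup is usually needed as well. The sketch below uses a hypothetical `ParameterType` enum with the values from the question; NONFOOD is inserted in the middle of the declaration but keeps its own fixed ID, so the stored IDs of FOOD and TOYS do not change.

```java
import java.util.HashMap;
import java.util.Map;

enum ParameterType {
    FOOD(0),
    NONFOOD(2), // added later: gets a new ID, existing IDs stay untouched
    TOYS(1);

    // Reverse index from the stored ID to the constant
    private static final Map<Integer, ParameterType> BY_ID = new HashMap<>();

    static {
        for (ParameterType t : values()) {
            BY_ID.put(t.id, t);
        }
    }

    private final int id;

    ParameterType(int id) {
        this.id = id;
    }

    int getId() {
        return id;
    }

    /** Returns the constant with the given ID, or null if there is none. */
    static ParameterType fromId(int id) {
        return BY_ID.get(id);
    }
}
```

Here TOYS.ordinal() becomes 2 after the insertion, while TOYS.getId() stays 1.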
I came across this post that's sensible: How to convert string to unique identifier in Java
In it the author describes his implementation:
public static long longHash(String string) {
long h = 98764321261L;
int l = string.length();
char[] chars = string.toCharArray();
for (int i = 0; i < l; i++) {
h = 31*h + chars[i];
}
return h;
}