How to create a 64 bit Unique Integer in Java

I need to create a 64 bit unique integer in Java so that collision chances are low. The system is not distributed, so collisions between different computers are not a problem.
Is there any way to create a 64 bit integer in Java which is always unique?
As of now I am using:
long number = System.nanoTime();
Is this the right way to generate 64 bit Unique Integer in Java or is there anything else I can try?
UPDATE:-
How about doing this way? Will this be unique?
UUID number = UUID.randomUUID();
long uniqueNumber = number.timestamp();

If you need the numbers to be unique in one process, robust between restarts, you can use a simple AtomicLong and a timer.
private static final AtomicLong TS = new AtomicLong();
public static long getUniqueTimestamp() {
    long micros = System.currentTimeMillis() * 1000;
    for ( ; ; ) {
        long value = TS.get();
        if (micros <= value)
            micros = value + 1;
        if (TS.compareAndSet(value, micros))
            return micros;
    }
}
This gives you a unique microsecond-resolution "timestamp" derived from the millisecond clock. It can issue up to 1000 ids per millisecond before getting ahead of the actual time. It also works across restarts, because by then the clock will have jumped past the previously issued values (again assuming fewer than one million ids per second on average).

Use a HashSet to ensure the uniqueness of the values you're storing. You can check whether the insert succeeded by looking at what add returns. If the values have to be 'randomised' you can use your own algorithm, or check out SecureRandom.
Long getUniqueNumber(HashSet<Long> uniqueNumberSet) {
    Long unique = generateUniqueNumber();
    if (!uniqueNumberSet.add(unique)) {
        // handle collision
    }
    return unique;
}
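For illustration, here is a minimal sketch of that idea with SecureRandom supplying the candidates. The class name RandomUniqueIds and its method next() are made up for this example, and the set of issued values lives only in memory, so uniqueness holds only within one run of one process.

```java
import java.security.SecureRandom;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: hands out random longs that were not issued before in this JVM run.
final class RandomUniqueIds {
    private final SecureRandom random = new SecureRandom();
    private final Set<Long> issued = new HashSet<>();

    // Draws random longs until one is found that has not been issued yet.
    synchronized long next() {
        long candidate;
        do {
            candidate = random.nextLong();
        } while (!issued.add(candidate)); // add() returns false for a duplicate
        return candidate;
    }

    public static void main(String[] args) {
        RandomUniqueIds ids = new RandomUniqueIds();
        System.out.println(ids.next());
        System.out.println(ids.next());
    }
}
```

The set grows with every id issued, so this only suits cases where the number of ids stays modest.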

As Marc B said, the best approach is a simple long which is initialized with zero and incremented every time you need a new value.
If you need concurrency, or if performance is not an issue at all, then you can use AtomicLong as suggested by Loc Ha; however, if you really need it to be a long and not an int, then I suspect you are going to be generating lots of them, so you should probably avoid the extra overhead of AtomicLong unless you are sure you also need concurrency.
System.nanoTime() is not a good idea, as you have no guarantee that two consecutive calls to it will always yield different values.
EDIT (to cover update in question)
No, the timestamp part of the UUID is not guaranteed to be unique, for precisely the same reasons that System.nanoTime() is not guaranteed to be unique. If the timestamp of the UUID was unique, then there would be no need to have a UUID type, we would just always use that timestamp part. Time is always a bad way to go about guaranteeing uniqueness.
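There is a second problem with the code in the update: per the java.util.UUID documentation, timestamp() is only meaningful for time-based (version 1) UUIDs, and it throws UnsupportedOperationException for anything else. UUID.randomUUID() produces version 4 UUIDs, so the call fails at runtime. A small check (the class name UuidTimestampCheck is made up for this example):

```java
import java.util.UUID;

// Hypothetical demo class: shows that timestamp() is not available on random UUIDs.
final class UuidTimestampCheck {
    // Returns true if timestamp() is unsupported for the given UUID.
    static boolean timestampUnsupported(UUID id) {
        try {
            id.timestamp();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        UUID random = UUID.randomUUID();
        System.out.println("version = " + random.version());
        System.out.println("timestamp() unsupported = " + timestampUnsupported(random));
    }
}
```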

If you want a unique ID, the appropriate way (although it is 128 bits) is:
UUID.randomUUID();
A bit less appropriate (collisions* are possible), with 64 bits:
UUID.randomUUID().getLeastSignificantBits();
UUID.randomUUID().getMostSignificantBits();
To get a truly unique ID (if uniqueness is critical to your operation):
- Use centralised storage of all IDs
- When you need an ID, let this centralised system handle it; a DB with auto-incremented values is usually the easiest way
*collisions => 2 or more equal values

Related

Generating long unique id with Murmur3 from google guava

At the moment I am trying to generate unique identifiers of type long on the client side.
I have a parent/child relationship where the parent already has a UUID as identifier.
I want to consider the Parent-UUID for calculating a Child-Id of type long.
I have this implementation at the moment:
public static void main(String[] args) {
    /** Funnel. */
    final Funnel<UUID> UUID_FUNNEL = new Funnel<UUID>() {
        @Override
        public void funnel(UUID parentUUID, PrimitiveSink into) {
            final UUID tmpId = UUID.randomUUID();
            into
                // consider parent uuid
                .putLong(parentUUID.getMostSignificantBits())
                .putLong(parentUUID.getLeastSignificantBits())
                // consider tmp uuid
                .putLong(tmpId.getMostSignificantBits())
                .putLong(tmpId.getLeastSignificantBits());
        }
    };
    final UUID parentUUID = UUID.randomUUID();
    System.out.println(parentUUID.toString());
    for (int i = 0; i < 1000; i++) {
        final long childId = Hashing.murmur3_128().newHasher()
            .putObject(parentUUID, UUID_FUNNEL)
            .hash().asLong();
        System.out.println(childId);
    }
}
What do you think about this idea?
Any suggestions are welcome.
I have already read this Question:
How to generate unique Long using UUID

This won't really work. Surely no better than a random long.
Without tmpId: You only hash the parentUUID, so all children of the same parent get the same long.
With tmpId: You could use UUID.randomUUID().getLeastSignificantBits() or just random.nextLong() and save yourself all the work (hashing a random value leads to a random result, no matter what you add).
I have multiple clients, not only one.
Then ask a unique server. This includes some overhead which can be easily minimized using a hi-lo algorithm.
At the DB level the child id's must be unique.
Then forget it and let the DB generate the id. Every DB has some AUTOINCREMENT or SEQUENCE meant exactly for this.
In case you need the id in the client before you access the DB, ask the DB (and use the hi-lo algorithm in order to minimize the overhead).
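As a rough sketch of the hi-lo idea: each client fetches a "hi" block number from the server once, then hands out blockSize ids locally before it has to ask again. Everything here is an assumption made for illustration: the class HiLoGenerator, the LongSupplier standing in for the round trip to the server or DB sequence, and the block size of 100.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical hi-lo generator: one remote call per block of ids.
final class HiLoGenerator {
    private final LongSupplier nextHi; // stands in for a call to the server / DB sequence
    private final int blockSize;
    private long hi;                   // current block number
    private int lo;                    // next offset within the current block

    HiLoGenerator(LongSupplier nextHi, int blockSize) {
        this.nextHi = nextHi;
        this.blockSize = blockSize;
        this.lo = blockSize;           // forces a fetch on first use
    }

    synchronized long nextId() {
        if (lo == blockSize) {         // block exhausted: fetch a new one
            hi = nextHi.getAsLong();
            lo = 0;
        }
        return hi * blockSize + lo++;
    }

    public static void main(String[] args) {
        AtomicLong fakeSequence = new AtomicLong(); // in-memory stand-in for the DB sequence
        HiLoGenerator clientA = new HiLoGenerator(fakeSequence::getAndIncrement, 100);
        HiLoGenerator clientB = new HiLoGenerator(fakeSequence::getAndIncrement, 100);
        System.out.println(clientA.nextId()); // first id of block 0
        System.out.println(clientB.nextId()); // first id of block 1
        System.out.println(clientA.nextId()); // second id of block 0
    }
}
```

Ids left over in a block when a client shuts down are simply lost, which is usually acceptable with a 64-bit space.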
Offline working
I just saw your comment:
The clients should not go to a server to get the next id. It must be possible to work offline.
This is a big pain. Any hashing you could do won't be better than a random long.
A set of one million random longs has a collision chance of about 3e-8 (roughly n^2 / 2^65 for n values), which might be acceptable. Note that due to the birthday paradox, the chance grows quadratically with the set size.
You could try to handle the offline created entities without an ID (using some other identifier), but this sounds like a big pain.
You could preallocate some IDs for each client. This sounds wasteful, but preallocating 100 IDs for each of one million clients uses up only a tiny fraction (about 5e-12) of all possible 64-bit IDs.
You could switch to random UUIDs. Because they are 128 bits long, the collision chance is practically zero even for billions of IDs.
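To put numbers on the birthday estimate, the usual approximation for n uniformly random values from a space of size N is p ≈ 1 - exp(-n(n-1)/(2N)). A small sketch that evaluates it (the class name BirthdayBound is made up for this example; 122 is the number of random bits in a version 4 UUID):

```java
// Hypothetical helper: approximate birthday-collision probability.
final class BirthdayBound {
    // Probability of at least one collision among n uniformly random values
    // drawn from a space of 2^bits values, using p = 1 - exp(-n(n-1) / (2 * 2^bits)).
    static double collisionProbability(double n, int bits) {
        double space = Math.pow(2.0, bits);
        double exponent = -n * (n - 1) / (2.0 * space);
        return -Math.expm1(exponent); // 1 - e^x, accurate even when x is tiny
    }

    public static void main(String[] args) {
        System.out.println("1e6 random longs: " + collisionProbability(1e6, 64));
        System.out.println("1e9 random longs: " + collisionProbability(1e9, 64));
        System.out.println("1e9 random UUIDs: " + collisionProbability(1e9, 122));
    }
}
```

For one million random longs this comes out at roughly 2.7e-8, and for one billion at a few percent, which shows the quadratic growth.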

How to get a unique alphanumeric based on a unique integer

My web application has a table in the database with an id column which will always be unique for each row. In addition to this I want to have another column called code that will hold a unique 6-character alphanumeric code made up of the digits 0-9 and the letters A-Z. Letters and digits may repeat within a code, e.g. FFQ77J. I understand that the chance of a fresh 6-character code being unique shrinks over time as more rows are added, but for now I am OK with this.
Requirement (update)
- The code should be at least of length 6
- Each code should be Alphanumeric
So I want to generate this Alphanumeric code.
Question
What is a good way to do this?
Should I generate the code and after the generation, run a query to the database and check if it already exists, and if so then generate a new one? To ensure the uniqueness, does this piece of code need to be synchronized so that only one thread runs it?
Is there something built-in to the database that will let me do this?
For the generation I will be using something like this which I saw in this answer
final Random random = new Random();
final char[] symbols = new char[36];
final char[] buf = new char[6]; // length of the generated code
{
    // fill the alphabet: 0-9 followed by A-Z
    for (int idx = 0; idx < 10; ++idx)
        symbols[idx] = (char) ('0' + idx);
    for (int idx = 10; idx < 36; ++idx)
        symbols[idx] = (char) ('A' + idx - 10);
}
public String nextString()
{
    for (int idx = 0; idx < buf.length; ++idx)
        buf[idx] = symbols[random.nextInt(symbols.length)];
    return new String(buf);
}

Since it's a requirement for the short code to not be guessable, you don't want to tie it to your unique row ID. Otherwise that means your row ID needs to be random, in addition to unique. Starting with a counter at 0 and incrementing makes it pretty obvious when your codes are 000001, 000002, 000003, and so forth.
For your short code, generate a random 32-bit int, drop the sign and convert to base 36. Make a call to your database to ensure it's available.
You haven't explicitly called out scalability, but I think it's important to understand the limitations of your design with respect to scale.
With roughly 2^31 possible 6-character base-36 values (36^6 = 2,176,782,336), you reach a 50% chance of at least one collision at around 55k rows (see Birthday Paradox questions).
From your comment, modify your code:
public String nextString()
{
    // mask off the sign bit so the result never starts with '-'
    return Integer.toString(random.nextInt() & Integer.MAX_VALUE, 36);
}

I would simply do this:
String s = Integer.toString(i, 36).toUpperCase();
Choosing base-36 will use characters 0-9a-z for the digits. To get a string that uses uppercase letters (as per your question) you would need to fold the result to upper case.
If you use an auto increment column for your id, set the next value to at least 60,466,176, which when rendered to base 36 is 100000 - always giving you a 6 digit number.

I would start with 0 for an empty table and do a
SELECT MAX(ID) FROM table
to find the largest id so far. Store it in an AtomicInteger and convert it using toString:
AtomicInteger counter = new AtomicInteger(maxSoFar);
String nextId = Integer.toString(counter.incrementAndGet(), 36);
or, to zero-pad to six characters, add 36^6 = 2176782336L and drop the leading "1":
String nextId = Long.toString(2176782336L + counter.incrementAndGet(), 36).substring(1);
This will give you uniqueness and no duplicates to worry about. (it's not random either)

Simply, you can use Integer.toString(int i, int radix). Since you have base 36(26 letters+10 digits) you set the radix to 36 and i to your integer. For example, to use 16501, do:
String identifier=Integer.toString(16501, 36);
You can uppercase it with .toUpperCase()
Now onto your other questions: yes, you should query the database first to ensure the code doesn't already exist. Depending on the database, the check may need to be synchronized, or it may not, as the database will use its own locking system. In any case, you'd need to tell us which database.
On the question of whether there's a builtin, we'd need to know the DB type as well.

To create a random but unique value within a small range, here are some ideas I know of:
1. Create a new random value and try to insert it.
Let a database constraint catch violations. This column should also likely be indexed. The DML may need to be retried several times until a unique ID is found. This will lead to more collisions as time progresses, as noted (see the birthday problem).
2. Create a "free IDs" table ahead of time and on usage mark the ID as being used (or delete it from the "free IDs" table). This is similar to #1 but shifts when the work is done.
This allows the work of finding "free IDs" to be done at another time, perhaps during a cron job, so that there will not be a constraint violation during the insert, keeping the insert itself the "same speed" throughout the usage of said domain. Make sure to use transactions.
3. Create a 1-to-1/injective "mixer" function such that the output "appears random". The point is that this function must be 1-to-1 to inherently avoid duplicates.
This output number would then be "base 36 encoded" (which is also injective); it would be guaranteed unique as long as the input (say, an auto-increment PK) was unique. This would likely be less random than the other approaches, but should still create a nice-looking non-linear output.
A custom injective function can be created around an 8-bit lookup table fairly trivially: just process a byte at a time and shuffle the map appropriately. I really like this idea, but it can still lead to somewhat predictable output.
To find free IDs, approaches #1 and #2 above can use "probing with IN" to minimize the number of SQL statements used. That is, generate a bunch of random values and query for them using IN (keeping in mind what sizes of IN your database likes) and then see which values were free (as having no results).
To create a unique ID not constrained to such a small space, a GUID or even hashing (e.g. SHA1) might be useful. However, these only "guarantee" uniqueness because they have 128/160-bit spaces, so that the chance of collision (for different input/time-space) is currently accepted as improbable.
I actually really like the idea of using an injective function. Bearing in mind that it is not good "random" output, consider this pseudo-code:
byte_map = [0..255]
map[0] = shuffle(byte_map, seed[0])
..
map[n] = shuffle(byte_map, seed[n])
output[0] = map[0][input[0]]
..
output[n] = map[n][input[n]]
output_str = base36_encode(output[0] .. output[n])
While a very simple setup, numbers like 0x200012 and 0x200054 will still share common output - e.g. 0x1942fe and 0x1942a9 - although the lines will be changed a bit due to the later application of the base-36 encoding. This could probably be further improved to "make it look more random".
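The pseudo-code above could be turned into Java roughly as follows. This is only a sketch under a few assumptions: the input is a 32-bit int (so four byte tables), the tables are shuffled with java.util.Random seeded from a secret value, and the class and method names (ByteMixer, mix, code) are invented for the example. Because each table is a permutation of 0..255, the byte-wise substitution is a bijection on 32-bit values, and so distinct inputs always give distinct codes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical injective "mixer": substitutes each input byte through its own shuffled table.
final class ByteMixer {
    private final int[][] maps = new int[4][256]; // one permutation of 0..255 per input byte

    ByteMixer(long seed) {
        for (int i = 0; i < 4; i++) {
            List<Integer> bytes = new ArrayList<>(256);
            for (int b = 0; b < 256; b++) bytes.add(b);
            Collections.shuffle(bytes, new Random(seed + i)); // deterministic for a given seed
            for (int b = 0; b < 256; b++) maps[i][b] = bytes.get(b);
        }
    }

    // Maps a 32-bit input to a value in 0 .. 2^32-1; distinct inputs give distinct outputs.
    long mix(int input) {
        long out = 0;
        for (int i = 0; i < 4; i++) {
            int inByte = (input >>> (8 * i)) & 0xFF;
            out |= ((long) maps[i][inByte]) << (8 * i);
        }
        return out;
    }

    // Base-36 rendering of the mixed value (also injective).
    String code(int input) {
        return Long.toString(mix(input), 36).toUpperCase();
    }

    public static void main(String[] args) {
        ByteMixer mixer = new ByteMixer(12345L); // the seed acts as the secret
        for (int id = 1; id <= 5; id++) {
            System.out.println(id + " -> " + mixer.code(id));
        }
    }
}
```

Note that a full 32-bit value can need up to 7 base-36 characters, and that inputs sharing their high bytes still share part of the output, as described above.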

For efficient usage, try caching generated code in a HashSet<String> in your application:
HashSet<String> codes = new HashSet<String>();
This way you don't have to make a db call every time to check whether the generated code is unique or not. All you have to do is:
codes.contains(newCode);
And, yes, you should synchronize the method which updates the cache:
public synchronized String getCode()
{
    String newCode;
    do {
        newCode = nextString();
    }
    while (codes.contains(newCode));
    codes.add(newCode);
    return newCode;
}

You mentioned in your comments that the relationship between id and code should not be easily guessable. For this you basically need encryption; there are plenty of encryption programs and modules out there that will perform encryption for you, given a secret key that you initially generate. To employ this approach, I would recommend converting your id into ascii (i.e., representing as base-256, and then interpreting each base-256 digit as a character) and then running the encryption, and then converting the encrypted ascii (base-256) into base 36 so you get your alpha-numeric, and then using 6 randomly chosen locations in the base 36 representation to get your code. You can resolve collisions e.g. by just choosing the nearest unused 6-digit alpha-numeric code when a collision occurs, and noting the re-assigned alpha-numeric code for the id in a (code <-> id) table that you will have to maintain anyway since you cannot decrypt directly if you only store 6 base-36 digits of the encrypted id.

Creating a unique timestamp in Java

I need to create a timestamp (in milliseconds) in Java that is guaranteed to be unique in that particular VM instance, i.e. I need some way to throttle the throughput of System.currentTimeMillis() so that it returns at most one result every ms. Any ideas on how to implement that?

This will give a time as close to the current time as possible without duplicates.
private static final AtomicLong LAST_TIME_MS = new AtomicLong();
public static long uniqueCurrentTimeMS() {
    long now = System.currentTimeMillis();
    while (true) {
        long lastTime = LAST_TIME_MS.get();
        if (lastTime >= now)
            now = lastTime + 1;
        if (LAST_TIME_MS.compareAndSet(lastTime, now))
            return now;
    }
}
One way to avoid the limitation of one id per milli-second is to use a micro-second timestamp. i.e. multiply currentTimeMS by 1000. This will allow 1000 ids per milli-second.
Note: if time goes backwards, eg due to an NTP correction, the time will just progress at 1 milli-second per invocation until time catches up. ;)

You can use System.nanoTime() for better resolution.
Although I tried the code below and it gave different values each time, it is probably not guaranteed to be unique all the time.
public static void main(String[] args) {
    long time1 = System.nanoTime();
    long time2 = System.nanoTime();
    long time3 = System.nanoTime();
    System.out.println(time1);
    System.out.println(time2);
    System.out.println(time3);
}
Another way is to use the AtomicInteger/AtomicLong classes. If the time is not important to you and you just need a unique number, this is probably a better choice.

While searching for a solution I came across ULID
(Universally Unique Lexicographically Sortable Identifier)
https://github.com/huxi/sulky/tree/master/sulky-ulid/
It's not a long, but it is shorter than a UUID.
A ULID:
- Is compatible with UUID/GUIDs
- Allows 1.21e+24 unique ULIDs per millisecond (1,208,925,819,614,629,174,706,176 to be exact)
- Is lexicographically sortable
- Is canonically encoded as a 26 character string, as opposed to the 36 character UUID
- Uses Crockford's base32 for better efficiency and readability (5 bits per character)
- Is case insensitive
- Has no special characters (URL safe)

You could use System.nanoTime(), which is the most precise system timer available, and divide that by a million to get milliseconds. While there are no formal guarantees on how often it's updated, I believe it's reasonable to assume that it updates far more frequently (by orders of magnitude) than once per millisecond. Of course, if you create integer millisecond timestamps at intervals shorter than a millisecond, then they can't all be unique.
Note that the absolute value of nanoTime() is arbitrary. If you want absolute time, calibrate it somehow, i.e. compare it to currentTimeMillis() when starting.
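One possible way to do that calibration, as a sketch only: record currentTimeMillis() and nanoTime() once at class load, then add the elapsed nanoTime to the recorded wall-clock value. The class name CalibratedClock and the choice of microseconds as the unit are assumptions for this example. The result is monotonic within one JVM but, as with any timer, two calls can still return the same value.

```java
// Hypothetical helper: wall-clock-like microseconds driven by System.nanoTime().
final class CalibratedClock {
    // Captured once: wall-clock milliseconds and the nanoTime reading taken right after it.
    private static final long BASE_MILLIS = System.currentTimeMillis();
    private static final long BASE_NANOS = System.nanoTime();

    // Microseconds since the epoch, advanced by the monotonic nanoTime clock.
    static long nowMicros() {
        long elapsedMicros = (System.nanoTime() - BASE_NANOS) / 1_000L;
        return BASE_MILLIS * 1_000L + elapsedMicros;
    }

    public static void main(String[] args) {
        System.out.println(nowMicros());
        System.out.println(nowMicros());
    }
}
```

Because the offset is fixed at startup, later NTP corrections to the system clock are not reflected in the returned values.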

Could you perhaps make use of java.util.UUID and its timestamp() and clockSequence()?
Method Summary
int clockSequence()
The clock sequence value associated with this UUID.
long timestamp()
The timestamp value associated with this UUID.
More details here: http://docs.oracle.com/javase/1.5.0/docs/api/java/util/UUID.html

java for loop executes too fast gives System.currentTimeMillis() duplicate

Java: I have a problem using the System.currentTimeMillis() function.
I am using System.currentTimeMillis() to generate unique values in a for loop. The problem is that the loop executes too fast and System.currentTimeMillis() gives me duplicate values.
How can I generate values that are sure to be unique?
for (int a = 0; a <= 10; a++) {
    System.out.println(System.currentTimeMillis());
}
I also tried the following, but it is not guaranteed to generate a unique number either:
System.currentTimeMillis()+Math.random()

why don't you use System.nanoTime() instead?

Why don't you use a UUID library to generate unique identifiers (it is already there in the JDK: http://download.oracle.com/javase/6/docs/api/java/util/UUID.html)?
Or, for a simpler approach, append a static counter.

I think your approach is wrong, if this is a requirement.
Theoretically, no matter how fine-grained your timer, a machine might execute it in less time than the timer's granularity. It's not correct in a technical sense to depend on this being true.
Or looking at it another way - why do you need these values to be unique (what are you using them for)? If you really want them to be a measure of the time it was executed, then you ought to be happy that two iterations that happened within the same millisecond got the same value.
Have you considered using a static, monotonic counter to assign IDs to each iteration that are unique within each execution (AtomicLong is great for this)? Something like the following is very easy and has no concurrency issues:
public class YourClass {
    private static final AtomicLong COUNTER = new AtomicLong();
    private static long nextId() { return COUNTER.getAndIncrement(); }
    // Rest of the class, which calls nextId() when it needs an identifier
}
If you need the timing info and uniqueness, then that's two separate requirements, so why not have a composite key made up of the time and an arbitrary unique ID?
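A composite key along those lines might look like the sketch below: the timestamp carries the timing information and is allowed to repeat, while a counter supplies the uniqueness. The class name TimedId and its fields are made up for this example, and uniqueness only holds within one JVM run.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical composite key: (creation time, per-JVM sequence number).
final class TimedId {
    private static final AtomicLong COUNTER = new AtomicLong();

    final long timestampMillis; // when the id was created; may repeat
    final long sequence;        // unique within this JVM run

    private TimedId(long timestampMillis, long sequence) {
        this.timestampMillis = timestampMillis;
        this.sequence = sequence;
    }

    static TimedId next() {
        return new TimedId(System.currentTimeMillis(), COUNTER.getAndIncrement());
    }

    @Override
    public String toString() {
        return timestampMillis + "-" + sequence;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 3; i++) {
            System.out.println(TimedId.next());
        }
    }
}
```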

The answer is obvious - get a slower computer! Well, that or use System.nanoTime as described right here on SO - System.currentTimeMillis vs System.nanoTime. But seriously, you shouldn't be using time as unique number generator unless you absolutely have to.
The problems with using the system time are:
- The time returned by your system calls is rounded to a coarser precision than the actual CPU clock. If your ID generation code runs faster than this precision, you will have collisions.
- If your code is distributed and each unit of work is generating IDs, then you run into the possibility of ID collisions as the separate CPUs or CPU cores allocate IDs using their independent clocks.
- In libraries like Java's, which return the system time based on a user-settable property, you run a higher chance of multiple ID collisions any time the date is reset to some period in the past, for whatever reason.
A very good alternative for generating unique identifiers is to use the not-so-ironically named Universally Unique Identifier. There are multiple implementations in various languages; for Java 5 and higher you can use the UUID class.
Edit: To add some useful information about UUID.

Similar to @Andrej's solution, but combining a timer and a counter so your numbers shouldn't repeat if you restart your application.
public enum IdGenerator {
    ;
    private static final AtomicLong COUNTER = new AtomicLong(System.currentTimeMillis()*1000);
    public static long nextId() { return COUNTER.getAndIncrement(); }
}

If you want to still use your method, you could do:
for (int a = 0; a <= 10; a++) {
    Thread.sleep(1); // throws InterruptedException, which the caller must handle or declare
    System.out.println(System.currentTimeMillis());
}
This explicitly makes your loop slower.

try Math.random()*System.currentTimeMillis()
here is a sample outcome
4.1140390961236145E11,
4.405289623285403E11,
6.743938910583776E11,
2.0358542930175632E11,
1.2561886548511025E12,
8.629388909268735E11,
1.158038719369676E12,
2.5899667030405692E11,
7.815373208372445E11,
1.0887553507952611E12,
3.947241572203385E11,
1.6723200316764807E11,
1.3071550541162832E12,
2.079941126415029E11,
1.304485187296599E12,
3.5889095083604164E10,
1.3230275106525027E11,
6.484641777434403E11,
5.109822261418748E11,
1.2291750972884333E12,
8.972865957307518E11,
4.022754883048088E11,
7.997154244301389E11,
1.139245696210086E12,
2.633248409945871E11,
8.699957189419155E11,
9.487098785390422E11,
1.1645067228773708E12,
1.5274939161218903E11,
4.8470112347655725E11,
8.749120668472205E11,
2.435762445513599E11,
5.62884487469596E11,
1.1412787212758718E12,
1.0724213377031631E12,
3.1388106597100226E11,
1.1405727247661633E12,
1.2464739913912961E12,
3.2771161059896655E11,
1.2102869787179648E12,
1.168806596179512E12,
5.871383012375131E11,
1.2765757372075571E12,
5.868323434343102E11,
9.887351363037219E11,
5.392282944314777E11,
1.1926033895638833E12,
6.867917070018711E11,
1.1682059242674294E12,
2.4442056772643954E11,
1.1250254537683052E12,
8.875186600355891E10,
3.46331811747409E11,
1.127077925657995E12,
7.056541627184794E11,
1.308631075052609E12,
7.7875319089675E11,
5.52717019956371E11,
7.727797813063546E11,
6.177219592063667E11,
2.9448141585070874E11,
9.617992263836586E11,
6.762500987418107E11,
1.1954995292124463E12,
1.0741763597148225E12,
1.9915919731861673E11,
9.507720563185525E11,
1.1009594810160002E12,
4.1381256571745465E11,
2.2526550777831213E11,
2.5919816802026202E11,
3.8453225321522577E11,
3.796715779825083E11,
6.512277843921505E10,
1.0483456960599313E12,
1.0725956186588704E11,
5.701504883615902E11,
9.085583903150035E11,
1.2764816439306753E12,
1.033783414053437E12,
1.188379914238302E12,
6.42733442524156E11,
3.911345432964901E11,
7.936334657654698E11,
1.4473479058272617E11,
1.2030471387183499E12,
5.900668555531211E11,
8.078992189613184E11,
1.2004364275316113E12,
1.250275098717202E12,
2.856556784847933E11,
1.9118298791320355E11,
5.4291847597892596E11,
3.9527733898520874E11,
6.384539941791654E11,
1.2812873515441786E11,
6.325269269733575E9,
5.403119000792323E11,
8.023708335126083E11,
3.761680594623883E10,
1.2641772837928888E11,

Check out UUID as well...

My suggestion
long id = System.currentTimeMillis();
for (int i = 0; i < 10; i++) {
    // do your work
    id++;
}

How can I assemble bits into a long to create a unique ID?

I would like to write a utility that will provide me with a relatively unique ID in Java. Something pretty simple, like x bits from timestamp + y bits from random number.
So, how would I implement the following method:
long getUniqueID()
{
    long timestamp = System.currentTimeMillis();
    long random = some random long
    ...
    return id;
}
BONUS
Any suggestions for other easily obtainable information I could use to form my ID?
note: I am aware of GUIDs and I know Java has a UUID class, but I don't want something that is 128 bits long.

Just clip the bits you don't need:
return java.util.UUID.randomUUID().getLeastSignificantBits();

What you are trying to do is create a hash function that combines two long values into a single long value. In this case, the uniformity of the hash function will be of utmost importance since collisions in created unique ID values are unacceptable. However, if you can compare hash values to previously created identifiers, then collisions can be resolved by modifying the hash until no collision occurs.
For example, you could take the time stamp and perform an exclusive-or (using the caret ^ operator in Java) with the random value. If a collision is detected, then add one to the result.
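As an alternative to XOR, the "x bits from timestamp + y bits from random number" layout from the question can be assembled with a shift and a mask. The sketch below is only one possible layout, with all names (PackedId, pack, timestampOf) and the 22-bit random field chosen for illustration. Unlike the XOR variant, it keeps the timestamp recoverable, but it is still not guaranteed unique: two ids created in the same millisecond collide if they draw the same 22 random bits.

```java
import java.security.SecureRandom;

// Hypothetical layout: [ millisecond timestamp | 22 random bits ] packed into one long.
final class PackedId {
    private static final int RANDOM_BITS = 22;
    private static final long RANDOM_MASK = (1L << RANDOM_BITS) - 1;
    private static final SecureRandom RANDOM = new SecureRandom();

    // Places the timestamp in the high bits and the low RANDOM_BITS of randomBits in the low bits.
    static long pack(long timestampMillis, long randomBits) {
        return (timestampMillis << RANDOM_BITS) | (randomBits & RANDOM_MASK);
    }

    // Recovers the timestamp part of an id produced by pack().
    static long timestampOf(long id) {
        return id >>> RANDOM_BITS;
    }

    static long next() {
        return pack(System.currentTimeMillis(), RANDOM.nextLong());
    }

    public static void main(String[] args) {
        long id = next();
        System.out.println(id + " created at " + timestampOf(id));
    }
}
```

With this split the id stays non-negative as long as the millisecond timestamp fits in 41 bits, which holds until roughly 2039.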

If unique in the same JVM is enough then something like this should do the job.
public class UniqueID {
    static long current = System.currentTimeMillis();
    public static synchronized long get() {
        return current++;
    }
}
