Can anyone tell me whether the code below will always generate a unique ID for my files?
Hundreds of forms are created automatically at the same time, and each one auto-populates its ID textbox with a generated ID. So the method must be thread-safe, and if I restart the application it must never repeat an ID that was already generated before the application stopped.
private static final AtomicLong counter = new AtomicLong(0L);
public static synchronized String generateIdforFile()
{
String timeString = Long.toString(System.currentTimeMillis(), 36);
String counterString = Long.toString(counter.incrementAndGet() % 1000, 36);
return timeString + counterString;
}
The forms get the ID by calling ClassName.generateIdforFile();
Why not just use a UUID for your file id? You could use something like the following:
public static String generateIdforFile() {
return UUID.randomUUID().toString();
}
Or do you need a sequential (ongoing) numeric value?
If the value just has to be numeric (and not sequential), you could use UUID#getLeastSignificantBits() or UUID#getMostSignificantBits() for the numeric value.
Quoting this answer on SO:
So the most significant half of your UUID contains 58 bits of
randomness, which means you on average need to generate 2^29 UUIDs to
get a collision (compared to 2^61 for the full UUID).
Of course, half a UUID is not as collision-resistant as the full UUID.
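As a rough sketch of the numeric variant, the following takes one half of a random UUID as a long. The class and method names are made up for illustration, and the value can be negative, so it is printed unsigned here.

```java
import java.util.UUID;

// Hypothetical helper, not part of the original question or answer.
class UuidNumericId {

    /** Returns a 64-bit value taken from one half of a random (version 4) UUID. */
    static long numericId() {
        // The least significant half holds random bits plus 2 fixed variant bits.
        return UUID.randomUUID().getLeastSignificantBits();
    }

    public static void main(String[] args) {
        long id = numericId();
        // The raw long may be negative; the unsigned form contains digits only.
        System.out.println(Long.toUnsignedString(id));
    }
}
```

Whether 64 bits are enough depends on how many IDs you expect to generate; the full UUID remains the safer choice.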
If you make the method synchronized, there is no need to use an AtomicLong.
The synchronized keyword already ensures that only one thread at a time executes the method, so a plain long counter is enough.
Stacking both mechanisms adds overhead without adding any safety.
Better to use a single AtomicLong, starting at 0L, for your entire application, and concatenate its value with currentTimeMillis.
static AtomicLong counter = new AtomicLong(0L);
public static String generateIdforFile()
{
String timeString = Long.toString(System.currentTimeMillis(), 36);
String counterString = Long.toString(counter.incrementAndGet() % 1000, 36);
return timeString + counterString;
}
This has a better chance of yielding unique IDs, even across application restarts, provided that your app takes more than a few milliseconds to shut down and restart, and provided that you create fewer than a thousand files in the same millisecond. Note that the method is no longer synchronized; there is no need, because AtomicLong.incrementAndGet is atomic. But you can't guarantee universal uniqueness.
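One small variation, shown only as a sketch: padding the counter part to a fixed width keeps all IDs the same length, so the boundary between the time part and the counter part is never ambiguous. The class name is hypothetical; the same caveats about restarts and more than a thousand IDs per millisecond still apply.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical class name; a sketch, not a guaranteed-unique generator.
class FileIdSketch {
    private static final AtomicLong COUNTER = new AtomicLong(0L);

    static String generateIdForFile() {
        String timeString = Long.toString(System.currentTimeMillis(), 36);
        long c = COUNTER.incrementAndGet() % 1000;
        // 999 is "rr" in base 36, so two digits always suffice.
        String counterString = Long.toString(c, 36);
        if (counterString.length() < 2) {
            counterString = "0" + counterString; // pad to a fixed width of 2
        }
        return timeString + counterString;
    }

    public static void main(String[] args) {
        System.out.println(generateIdForFile());
    }
}
```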
Is there a way to hash a string and specify the characters allowed in the output, or a better approach to avoid collisions when producing a hash 8 characters long?
I am seeing collisions with my current hashing method (see the example implementation below).
I am currently using crc32 from https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html
The hashes produced are alphanumeric and 8 characters long.
I need to keep the 8-character length (I am not storing passwords). Is there a way to specify an "alphabet" of allowed output characters for a hash function,
e.g. to allow a-z, 0-9 and a set of extra characters such as _, $ and -?
Any added characters need to be URI-friendly.
This would let me reduce the probability of collisions.
The hash output is stored in a cache for a maximum of 60 days, so collisions that occur after that period have no effect.
current approach example code:
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;
public class Test {
private static final String SALT = "4767c3a6-73bc-11ec-90d6-0242ac120003";
public static void main( String[] args )
{
// actual strings causing collisions removed as have to redact some data
String string1 = "myStringOne";
String string2 = "myStringTwo";
System.out.println( "string1:" + string1);
System.out.println( "string1 hashed:" + doHash(string1, SALT));
System.out.println( "string2:" + string2);
System.out.println( "string2 hash:" + doHash(string2, SALT));
}
private static String doHash(String keyValue, String salt){
HashFunction func = Hashing.crc32();
Hasher hasher = func.newHasher();
hasher.putUnencodedChars(keyValue);
hasher.putUnencodedChars(salt);
return hasher.hash().toString();
}
}
functionality of the code/problem statement
using key store db.
A user requests a resource,
hash is made of (user details & requested resource).
if resulting id already present -> return that item from DB
else, perform processing on resource and store in db, with result from hash as ID
cache is purged periodically.
Questions.
Is there a way to specify the alphabet the hash is allowed to use in its output?
I checked the docs but do not see an approach https://guava.dev/releases/20.0/api/docs/com/google/common/hash/Hashing.html
Or is there an alternative approach that would be recommended?
e.g. generating a longer hash and taking a subset.
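A minimal sketch of the "longer hash, custom alphabet" idea, using only the JDK rather than Guava: hash with SHA-256, treat the digest as one big number, and write its lowest 8 digits in a base equal to the alphabet size. The class name, method name and the 38-character alphabet are assumptions made for illustration. With 38 symbols, 8 characters cover roughly 42 bits instead of CRC32's 32, which lowers the collision probability but cannot remove it.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper; alphabet and names are illustrative assumptions.
class AlphabetHash {
    // URI-friendly characters: a-z, 0-9, underscore and hyphen.
    static final String ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789_-";

    static String hash8(String keyValue, String salt) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            md.update(keyValue.getBytes(StandardCharsets.UTF_8));
            md.update(salt.getBytes(StandardCharsets.UTF_8));
            // Interpret the 256-bit digest as a non-negative integer.
            BigInteger n = new BigInteger(1, md.digest());
            BigInteger base = BigInteger.valueOf(ALPHABET.length());
            StringBuilder sb = new StringBuilder(8);
            // Emit 8 digits of the number in base ALPHABET.length().
            for (int i = 0; i < 8; i++) {
                BigInteger[] qr = n.divideAndRemainder(base);
                sb.append(ALPHABET.charAt(qr[1].intValue()));
                n = qr[0];
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be present on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(hash8("myStringOne", "4767c3a6-73bc-11ec-90d6-0242ac120003"));
    }
}
```

Since collisions remain possible, the lookup that finds an existing ID would still need to compare the stored key material before treating the item as a match.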
I am reading the book "Core Java I" by Cay S. Horstmann, and on page 580 he mentions LongAdder:
If you anticipate high contention [*1], you should simply use a LongAdder instead of an
AtomicLong. The method names are slightly different. Call increment to increment a counter
or add to add a quantity, and sum to retrieve the total.
var adder = new LongAdder();
for (. . .)
pool.submit(() -> {
while (. . .) {
. . .
if (. . .) adder.increment();
}
});
. . .
long total = adder.sum();
Note
Of course, the increment method does not return the old [*2] value. Doing that would undo
the efficiency gain of splitting the sum into multiple summands.
In [*1], by the word "contention" I assume he means a machine that is heavily loaded because many threads are running the Java code.
In [*2] he mentions the old value. What do "old" and "new" value mean in this context? Could you please explain briefly?
[*1]: The term "contention" in context of multithreading means that many threads try to access/call/update something at the same time; in this case the LongAdder or counter in general.
[*2]: The old value in this context is the value the LongAdder held before the update. Most updating methods of AtomicLong return either the previous value (e.g. getAndIncrement) or the updated value (e.g. incrementAndGet), whereas LongAdder#increment returns void. The new value is simply the value after the update, the one that you can get via sum.
The class LongAdder works differently than AtomicLong to increase throughput, which is why e.g. increment doesn't return anything. You can read about it here: How LongAdder performs better than AtomicLong
LongAdder doesn't maintain a single value. When you increment or add, the delta is applied either to a base field or, under contention, to one of several Cell objects, so no single field holds the running total.
When you want the actual value, you call the sum() method, which adds up base and all cells to give you the result.
For better understanding, here's how the sum method is implemented in LongAdder:
public long sum() {
Cell[] cs = cells;
long sum = base;
if (cs != null) {
for (Cell c : cs)
if (c != null)
sum += c.value;
}
return sum;
}
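To make the difference concrete, here is a small sketch (class and method names are invented for the example). AtomicLong can hand back the old or the new value on every update, while LongAdder only lets you increment and ask for the total afterwards.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class.
class AdderDemo {

    /** Increments one LongAdder from several threads and returns the total. */
    static long countWithAdder(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    adder.increment(); // returns void: neither old nor new value
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        // Only now is the total computed, by summing base and all cells.
        return adder.sum();
    }

    public static void main(String[] args) {
        AtomicLong atomic = new AtomicLong(5);
        long old = atomic.getAndIncrement();     // value before the update
        long updated = atomic.incrementAndGet(); // value after the update
        System.out.println(old + " then " + updated);
        System.out.println(countWithAdder(4, 10_000));
    }
}
```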
I have read some answers; they usually use a Set or some other data structure to ensure there are no duplicates. But in my situation, I already have a lot of random strings stored in a database, and I have to make sure that a newly generated random string does not already exist in the database.
I don't think loading all the random strings from the database into a set and then generating the random string is a good idea...
I found that System.currentTimeMillis() will generate a "random" number, but how do I turn that number into a random string? I need a string of length 8.
Any suggestions will be appreciated.
You can use the Apache Commons Lang library for this: RandomStringUtils
RandomStringUtils.randomAlphanumeric(8).toUpperCase() // for alphanumeric
RandomStringUtils.randomAlphabetic(8).toUpperCase() // for pure alphabets
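randomAlphabetic(int count)
Creates a random string whose length is the number of characters specified. Characters will be chosen from the set of Latin alphabetic characters (a-z, A-Z).
randomAlphanumeric(int count)
Creates a random string whose length is the number of characters specified. Characters will be chosen from the set of Latin alphabetic characters (a-z, A-Z) and the digits 0-9.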
So there are two issues here - creating the random string, and making sure there's no duplicate already in the db.
If you are not bound to 8 characters, you can use a UUID as the commenter above suggested. The UUID class returns a string that is statistically very unlikely to duplicate a previously generated UUID, so you can use it for this precise purpose without checking whether it's already in your database.
UUID.randomUUID().toString();
which produces a string that looks something like this:
5e0013fd-3ed4-41b4-b05d-0cdf4324bb19
Or, if you don't care what the unique ID is as long as it's unique, you could use an identity or auto-increment column, which pretty much all databases support. If you do that, though, you have to read the record back after you commit it to get the identity assigned by the DB.
If you have to have an 8-character string as your unique ID and you don't want to import the Apache library, you can generate a random 8-character string like this:
final String alpha="ABCDEFGHIJKLMNOPQRSTUVWXYZ";
final Random rand= new Random();
public String myUID() {
int i = 8;
String uid="";
while (i-- > 0) {
uid+=alpha.charAt(rand.nextInt(26));
}
return uid;
}
To make sure it's not a duplicate, you should add a unique index to the database column that contains it.
You can either query the DB first to make sure that no row has that ID before you insert the row, or catch the exception and retry if you've generated a duplicate.
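The retry idea could look roughly like this. The database is not modelled here: `tryInsert` is a hypothetical callback that should perform the INSERT and return false when the unique index rejects the value. In the usage example a concurrent set stands in for the table.

```java
import java.security.SecureRandom;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

// Hypothetical sketch; tryInsert stands in for an INSERT guarded by a unique index.
class UniqueCodeSketch {
    private static final String ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    private static final SecureRandom RAND = new SecureRandom();

    static String randomCode(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append(ALPHA.charAt(RAND.nextInt(ALPHA.length())));
        }
        return sb.toString();
    }

    /** Generates codes until tryInsert accepts one, giving up after maxAttempts. */
    static String insertUnique(Predicate<String> tryInsert, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String code = randomCode(8);
            if (tryInsert.test(code)) {
                return code; // stored successfully, so it was not a duplicate
            }
        }
        throw new IllegalStateException("no free code after " + maxAttempts + " attempts");
    }

    public static void main(String[] args) {
        Set<String> fakeTable = ConcurrentHashMap.newKeySet(); // stand-in for the DB table
        System.out.println(insertUnique(fakeTable::add, 10));
    }
}
```

Letting the unique index decide avoids loading existing values into memory and also closes the gap between "check" and "insert" that a query-first approach leaves open.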
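The method currentTimeMillis() returns the current time in milliseconds as a long. Convert that long to a string, and s.substring(5, s.length()) gives you the last 8 digits of the millisecond value. These digits differ from one millisecond to the next, but they repeat every 10^8 milliseconds (roughly 27.8 hours), and two calls within the same millisecond get the same value.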
public static void main(String[] args) {
String s = String.valueOf(System.currentTimeMillis());
System.out.println(s.substring(5, s.length()));
}
So you still have to check each time whether this string already exists in your database.
I am writing a class that when called will call a method to use system time to generate a unique 8 character alphanumeric as a reference ID. But I have the fear that at some point, multiple calls might be made in the same millisecond, resulting in the same reference ID. How can I go about protecting this call to system time from multiple threads that might call this method simultaneously?
System time is an unreliable source for unique IDs. That's it. Don't use it.
You need some form of permanent source (UUID uses a secure random generator whose seed is provided by the OS).
The system time may jump backwards, even by a few milliseconds, and break your logic entirely. If you can tolerate only 64 bits, you can either use a High/Low generator, which is a very good compromise, or cook your own recipe: for example 18 bits for the number of days since the beginning of 2012 (that gives you over 700 years) followed by 46 bits of randomness from SecureRandom. It's not the best case, and technically it may fail, but it doesn't require external persistence.
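The "cook your own recipe" layout might be sketched like this. Class and method names are invented, and the choice of UTC for counting days is an assumption; the bit split (18 bits of days, 46 random bits) follows the description above.

```java
import java.security.SecureRandom;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;

// Hypothetical sketch of the 18-bit day counter + 46-bit random layout.
class DayRandomId {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final LocalDate EPOCH = LocalDate.of(2012, 1, 1);

    /** Packs the day count into the top 18 bits and random bits into the low 46. */
    static long compose(long days, long randomBits) {
        long high = (days & 0x3FFFFL) << 46;      // keep 18 bits of the day count
        long low = randomBits & ((1L << 46) - 1); // keep 46 random bits
        return high | low;
    }

    static long nextId() {
        // Assumption: days are counted in UTC so all servers agree on the date.
        long days = ChronoUnit.DAYS.between(EPOCH, LocalDate.now(ZoneOffset.UTC));
        return compose(days, RANDOM.nextLong());
    }

    public static void main(String[] args) {
        System.out.println(Long.toUnsignedString(nextId(), 36));
    }
}
```

Two IDs generated on the same day can still collide, with probability governed by the 46 random bits, so a uniqueness check is still needed where duplicates matter.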
I'd suggest adding the thread ID to the reference ID. This makes collisions between threads less likely. However, even within a single thread, consecutive calls to a time source may deliver identical values. Even calls to the highest-resolution source (QueryPerformanceCounter) may return identical values on certain hardware. A possible solution is to compare each collected time value against its predecessor and add an increment to the timestamp when they are equal. You may need more than 8 characters if the result should be human-readable.
The most efficient source for a timestamp is the GetSystemTimeAsFileTime API. I wrote some details in this answer.
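In Java, the "compare against the predecessor and add an increment" idea can be sketched with an AtomicLong. The class name is hypothetical. The returned values are strictly increasing within one JVM even if the clock stalls or steps back, but nothing here survives a restart.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: timestamps that never repeat within one JVM.
class MonotonicStamp {
    private static final AtomicLong LAST = new AtomicLong();

    static long next() {
        long now = System.currentTimeMillis();
        // If the clock has not advanced past the last issued value,
        // hand out last + 1 instead of repeating a timestamp.
        return LAST.updateAndGet(prev -> Math.max(prev + 1, now));
    }

    public static void main(String[] args) {
        System.out.println(Long.toString(next(), 36));
    }
}
```

Under sustained load above one call per millisecond the values run ahead of the real clock, which is usually acceptable for an ID but not for a timestamp.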
You can use the UUID class to generate the bits for your ID, then use some bitwise operators and Long.toString to convert it to base-36 (alpha-numeric).
public static String getId() {
    UUID uuid = UUID.randomUUID();
    // randomUUID() returns a version 4 UUID: this half is random
    // apart from the 4 fixed version bits
    long msb = uuid.getMostSignificantBits();
    // This half is random apart from the 2 fixed variant bits
    long lsb = uuid.getLeastSignificantBits();
    long result = msb ^ lsb; // XOR
    String encoded = Long.toString(result, 36);
    // Remove sign if negative
    if (result < 0)
        encoded = encoded.substring(1, encoded.length());
    // Trim extra digits (keeping the last 8) or pad with zeroes
    if (encoded.length() > 8) {
        encoded = encoded.substring(encoded.length() - 8, encoded.length());
    }
    while (encoded.length() < 8) {
        encoded = "0" + encoded;
    }
    return encoded;
}
Since your ID space (36^8 possible values) is far smaller than a UUID's 122 random bits, this isn't foolproof. Test it with this code:
public static void main(String[] args) {
Set<String> ids = new HashSet<String>();
int count = 0;
for (int i = 0; i < 100000; i++) {
if (!ids.add(getId())) {
count++;
}
}
System.out.println(count + " duplicate(s)");
}
For 100,000 IDs, the code performs well pretty consistently and is very fast. With the original trimming, which kept the beginning of the encoded string, I started getting duplicate IDs when I went up another order of magnitude to 1,000,000. I then changed the trimming to keep the end of the encoded string instead of the beginning, and this greatly improved the duplicate rate. Now 1,000,000 IDs no longer produce any duplicates for me.
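As a back-of-the-envelope check on those observations, the usual birthday approximation gives the expected number of colliding pairs as n(n-1)/(2N) for n IDs drawn from N possibilities. The sketch below assumes the 8 base-36 characters are uniformly distributed, which is only approximately true for this encoding; the class name is made up.

```java
// Hypothetical helper for a rough birthday-bound estimate.
class CollisionEstimate {

    /** Expected number of colliding pairs among n uniform draws from `space` values. */
    static double expectedCollisions(double n, double space) {
        return n * (n - 1) / (2.0 * space);
    }

    public static void main(String[] args) {
        double space = Math.pow(36, 8); // number of distinct 8-character base-36 strings
        System.out.println(expectedCollisions(100_000, space));
        System.out.println(expectedCollisions(1_000_000, space));
        System.out.println(expectedCollisions(10_000_000, space));
    }
}
```

Under that assumption a run of a million IDs expects well under one duplicate, so seeing none is plausible, while ten million would be expected to produce several.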
Your best bet may still be to use a synchronized counter like AtomicInteger or AtomicLong and encode the number from that in base-36 using the code above, especially if you plan on having lots of IDs.
Edit: Counter approach, in case you want it:
import java.util.concurrent.atomic.AtomicLong;

public class IdGenerator {
    private final AtomicLong counter;
    public IdGenerator(int start) {
        // start could also be initialized from a file or other
        // external source that stores the most recently used ID
        counter = new AtomicLong(start);
    }
    public String getId() {
        long result = counter.getAndIncrement();
        String encoded = Long.toString(result, 36);
        // Remove sign if negative
        if (result < 0)
            encoded = encoded.substring(1, encoded.length());
        // Trim extra digits (only reached once the counter passes 36^8)
        // or pad with zeroes
        if (encoded.length() > 8) {
            encoded = encoded.substring(0, 8);
        }
        while (encoded.length() < 8) {
            encoded = "0" + encoded;
        }
        return encoded;
    }
}
This code is thread-safe and can be accessed concurrently.
I need to implement a global object that collects statistics for a web server. I have a Statistics singleton with a method addSample(long sample), which in turn calls updateMax. This obviously has to be thread-safe. I have this method for updating the maximum across the whole Statistics object:
AtomicLong max;
private void updateMax(long sample) {
while (true) {
long curMax = max.get();
if (curMax < sample) {
boolean result = max.compareAndSet(curMax, sample);
if (result) break;
} else {
break;
}
}
}
Is this implementation correct? I am using java.util.concurrent, because I believe it would be faster than simple synchronized. Is there some other / better way to implement this?
Java 8 introduced LongAccumulator.
Its documentation advises:
This class is usually preferable to AtomicLong when multiple threads
update a common value that is used for purposes such as collecting
statistics, not for fine-grained synchronization control. Under low
update contention, the two classes have similar characteristics. But
under high contention, expected throughput of this class is
significantly higher, at the expense of higher space consumption.
You can use it as follows:
LongAccumulator maxId = new LongAccumulator(Long::max, 0); //replace 0 with desired initial value
maxId.accumulate(newValue); //from each thread
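A slightly fuller sketch of the same idea, with invented class and method names. It uses Long.MIN_VALUE as the identity so that all-negative samples still yield the right maximum, and a parallel stream to stand in for the concurrent request threads.

```java
import java.util.concurrent.atomic.LongAccumulator;
import java.util.stream.LongStream;

// Hypothetical demo of tracking a maximum with LongAccumulator.
class MaxAccumulatorDemo {

    static long maxOf(long[] samples) {
        // Identity is Long.MIN_VALUE, so any real sample replaces it.
        LongAccumulator max = new LongAccumulator(Long::max, Long.MIN_VALUE);
        // The parallel stream stands in for multiple threads calling accumulate.
        LongStream.of(samples).parallel().forEach(max::accumulate);
        return max.get();
    }

    public static void main(String[] args) {
        System.out.println(maxOf(new long[] {3, -7, 42, 5}));
    }
}
```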
I think it's correct, but I'd probably rewrite it a little for clarity, and definitely add comments:
private void updateMax(long sample) {
while (true) {
long curMax = max.get();
if (curMax >= sample) {
// Current max is higher, so whatever other threads are
// doing, our current sample can't change max.
break;
}
// Try updating the max value, but only if it's equal to the
// one we've just seen. We don't want to overwrite a potentially
// higher value which has been set since our "get" call.
boolean setSuccessful = max.compareAndSet(curMax, sample);
if (setSuccessful) {
// We managed to update the max value; no other threads
// got in there first. We're definitely done.
break;
}
// Another thread updated the max value between our get and
// compareAndSet calls. Our sample can still be higher than the
// new value though - go round and try again.
}
}
EDIT: Usually I'd at least try the synchronized version first, and only go for this sort of lock-free code when I'd found that it was causing a problem.
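For comparison, the synchronized version mentioned here could be as simple as the following sketch (the class name is hypothetical). It is the easiest variant to reason about, and it is only worth replacing if profiling shows the lock to be a bottleneck.

```java
// Hypothetical lock-based counterpart to the lock-free updateMax.
class SyncStatistics {
    private long max = Long.MIN_VALUE;

    synchronized void updateMax(long sample) {
        if (sample > max) {
            max = sample;
        }
    }

    // Reads are synchronized too, so they always see the latest write.
    synchronized long getMax() {
        return max;
    }
}
```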
With Java 8 you can take advantage of functional interfaces and a simple lambda expression to solve this in one line, with no explicit loop in your own code:
private void updateMax(long sample) {
max.updateAndGet(curMax -> (sample > curMax) ? sample : curMax);
}
The solution uses the updateAndGet(LongUnaryOperator) method. The lambda receives the current value as curMax and uses the conditional operator to return sample if it is greater than the current max, or curMax otherwise. Internally, updateAndGet retries the compare-and-set until it succeeds, so the lambda may be applied more than once and should be free of side effects.
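A closely related variant, shown as a sketch with made-up class names: AtomicLong also offers accumulateAndGet(long, LongBinaryOperator), which lets you pass Math::max directly instead of writing the comparison yourself.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical wrapper around an AtomicLong that tracks a maximum.
class AtomicMax {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    void updateMax(long sample) {
        // Atomically replaces the stored value with max(stored, sample).
        max.accumulateAndGet(sample, Math::max);
    }

    long get() {
        return max.get();
    }
}
```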
as if you didn't have your pick of answers, here's mine:
// while the update appears bigger than the atomic, try to update the atomic.
private void max(AtomicDouble atomicDouble, double update) {
double expect = atomicDouble.get();
while (update > expect) {
atomicDouble.weakCompareAndSet(expect, update);
expect = atomicDouble.get();
}
}
it's more or less the same as the accepted answer, but doesn't use break or while(true) which I personally don't like.
EDIT: just discovered DoubleAccumulator in java 8. the documentation even says this is for summary statistics problems like yours:
DoubleAccumulator max = new DoubleAccumulator(Double::max, Double.NEGATIVE_INFINITY);
parallelStream.forEach(max::accumulate);
max.get();
I believe what you did is correct, but this is a simpler version that I also think is correct.
private void updateMax(long sample) {
    // getAndSet stores our sample and hands back the value it replaced.
    // If another thread stored a higher max between our comparison and
    // our getAndSet, that higher value comes back as the new sample,
    // and the next iteration of the loop writes it back into max.
    // The loop ends once the value we hold is no larger than max.
    // Caveat: during that window max briefly holds the lower value, so
    // a concurrent reader can observe the maximum going down.
    while (sample > max.get()) {
        sample = max.getAndSet(sample);
    }
}