Can I use nanoTime instead of randomUUID? - java

I am writing a process that returns data to subscribers every few seconds. I would like to create a unique id for each subscriber:
producer -> subscriber1
         -> subscriber2
What is the difference between using:
java.util.UUID.randomUUID()
System.nanoTime()
System.currentTimeMillis()
Will the nano time always be unique? What about the random UUID?

UUID
The 128-bit UUID was invented exactly for your purpose: Generating identifiers across one or more machines without coordinating through a central authority.
Ideally you would use the original Version 1 UUID, or its variations in Versions 2, 3, and 5. The original takes the MAC address of the host computer’s network interface and combines it with the current moment plus a small arbitrary number that increments when the host clock has been adjusted. This approach eliminates any practical concern for duplicates.
Java does not bundle an implementation for generating these Versions. I presume the Java designers had privacy and security concerns over divulging place, time, and MAC address.
Java comes with only one implementation of a generator, for Version 4. In this type all but 6 of the 128 bits are randomly generated. If a cryptographically strong random generator is used, this Version is good enough to use in most common situations without concern for collisions.
Understand that 122 bits is a really big range of numbers (about 5.3 × 10^36). 64 bits yields a range of 18,446,744,073,709,551,616 (over 18 quintillion). The remaining 58 bits (122−64=58) yield a range of 288,230,376,151,711,744 (over 288 quadrillion). Multiply those two numbers to get the range of 122 bits: 2^122 = 18,446,744,073,709,551,616 × 288,230,376,151,711,744, which is about 5.3 undecillion.
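Generating such a Version 4 UUID in Java is a one-liner; a minimal sketch:

```java
import java.util.UUID;

public class UuidDemo {
    public static void main(String[] args) {
        UUID id = UUID.randomUUID();       // 122 random bits, Version 4
        System.out.println(id);            // e.g. 0f8fad5b-d9cb-469f-a165-70867728950e
        System.out.println(id.version()); // prints 4
    }
}
```

The `version()` accessor confirms which kind of UUID you were handed, which is handy when mixing database-generated and Java-generated identifiers.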
Nevertheless, if you have access to generating a Version of UUID other than 4, take it. For example in a database system such as Postgres, the database server can generate UUID numbers in the various Versions including Version 1. Or you may find a Java library for generating such UUIDs, though that library may not be platform-independent (it may have native code within).
System.nanoTime
Be clear that System.nanoTime has nothing to do with the current date and time. To quote the Javadoc:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
The System.nanoTime feature simply returns a long number, a count of nanoseconds since some origin, but that origin is not specified.
The only promise made in the Java spec is that the origin will not change during the runtime of a JVM. So you know the number is ever-increasing during execution of your app, until it reaches the limit of a long and rolls over. That rollover would take about 292 years (2^63 nanoseconds) if counting from zero; but, again, the origin is not specified.
In my experience with the particular Java implementations I have used, the origin is the moment when the JVM starts up. This means I will most certainly see the same numbers all over again after the next JVM restart.
So using System.nanoTime as an identifier is a poor choice. Whether your app happens to hit coincidentally the exact same nanosecond number as seen in a prior run is pure chance, but a chance you need not take. Use UUID instead.
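For completeness, the legitimate use of System.nanoTime is measuring elapsed time within one run of a JVM; a minimal sketch:

```java
public class ElapsedDemo {
    public static void main(String[] args) {
        long start = System.nanoTime();
        long sum = 0;
        for (int i = 0; i < 1_000_000; i++) {  // the work being timed
            sum += i;
        }
        long elapsedNanos = System.nanoTime() - start; // meaningful only as a difference
        System.out.println(sum + " computed in " + elapsedNanos / 1_000_000 + " ms");
    }
}
```

Only differences between two nanoTime readings are meaningful; a single reading tells you nothing about wall-clock time.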

java.util.UUID.randomUUID() is thread-safe.
It is not safe to use the results of System.nanoTime() as identifiers across threads: the timer has limited granularity, so calls made close together, or on different threads, can return the same value.
The same is true for System.currentTimeMillis(): every call during the same millisecond returns the same number.
Comparing System.currentTimeMillis() and System.nanoTime(), the latter is more expensive (it takes more CPU cycles) but is also more precise. So a UUID should serve your purpose.

I think yes, you can use System.nanoTime() as an id. I have tested it and did not encounter duplicates.
P.S. But I strongly suggest you use a UUID.

Related

Is there an infinite Duration in Java 8 equivalent to the .NET Timeout.InfiniteTimeSpan Field?

Everything is in the title:
Is there an infinite Duration in Java 8 equivalent to the C# Timeout.InfiniteTimeSpan Field?
A bit like:
https://msdn.microsoft.com/en-us/library/system.threading.timeout.infinitetimespan(v=vs.110).aspx
I don't think -1 ms is understood across all Java libraries as an infinite timespan, so it might be more a problem of definition.
To clarify the context a bit: let's say I want to put a thread to sleep for an infinite amount of time without performing an infinite loop. Note that this is not necessarily a realistic practical use, though.
I'm just wondering: is there anything built in to the Java libraries?
As an extension to @Misha's answer, this is essentially the largest Duration value allowed:
public static final Duration MAX_DURATION = Duration.ofSeconds(
        Long.MAX_VALUE,   // max allowed seconds
        999_999_999L      // max nanoseconds less than a second
);
Anything more than this leads to
java.lang.ArithmeticException: long overflow
From Duration javadoc:
A physical duration could be of infinite length. For practicality, the duration is stored with constraints similar to Instant. The duration uses nanosecond resolution with a maximum value of the seconds that can be held in a long. This is greater than the current estimated age of the universe.
You certainly don't need to do an infinite loop to suspend a thread. Consider LockSupport.park() or another one of the many available mechanisms in java.util.concurrent. Can you describe your problem in more detail?
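A minimal sketch of that LockSupport.park() approach (note that park() may also return spuriously, so production code should re-check its wake-up condition in a loop):

```java
import java.util.concurrent.locks.LockSupport;

public class ParkDemo {
    public static void main(String[] args) throws InterruptedException {
        Thread sleeper = new Thread(() -> {
            LockSupport.park();            // blocks "forever" without spinning
            System.out.println("unparked");
        });
        sleeper.start();
        Thread.sleep(100);                 // give the thread time to park
        LockSupport.unpark(sleeper);       // release it from another thread
        sleeper.join();
    }
}
```

If unpark() happens to run before park(), the permit is simply consumed and park() returns immediately, so there is no lost-wakeup race in this simple form.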

Why does the Java implementation of Random.setSeed xor the parameter with 0x5DEECE66DL?

See http://docs.oracle.com/javase/7/docs/api/java/util/Random.html#setSeed(long). The code XORs the seed with the multiplier before reducing it mod 2^48. Why not just reduce the passed seed mod 2^48? The C equivalent, seed48, does not perform the XOR.
A nice read can be found here: java.util.Random’s Magic Number 0x5DEECE66D.
A quote:
The analysis says it was chosen simply because researchers determined empirically
that it produces a sequence of values satisfying various randomness tests
And this document also takes a shot at the magic number.
One more quote:
I then tried a search for the decimal value, excluding Java, and found
the answer in some class notes:
http://nut.bu.edu/~youssef/py502/monte_carlo_supplement.ps
http://www.inf.ethz.ch/personal/gaertner/texts/own_work/random_matrices.pdf
and in some computer documentation:
http://developer.apple.com/documentation/Darwin/Reference/ManPages/html/_rand48.3.html
The Youssef notes say:
... I can only say that 25214903917_LONG and 11_LONG have
apparently been chosen by passing a battery of such [meaning
Marsaglia's DIEHARD] tests.
... Even in the case of the 48-bit generators we are discussing
today, cas26 will generate them all in a month or two of CPU time
and then start to repeat.
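For reference, both the XOR scramble and the LCG step are spelled out in the Random javadoc, so the generator is easy to reproduce (the class name is mine):

```java
import java.util.Random;

public class Lcg48 {
    private static final long MULT = 0x5DEECE66DL;
    private static final long ADD  = 0xBL;
    private static final long MASK = (1L << 48) - 1;
    private long seed;

    Lcg48(long s) {
        seed = (s ^ MULT) & MASK;          // the XOR that setSeed performs
    }

    int nextInt() {
        seed = (seed * MULT + ADD) & MASK; // one 48-bit LCG step
        return (int) (seed >>> 16);        // top 32 of the 48 state bits
    }

    public static void main(String[] args) {
        Random r = new Random(42);
        Lcg48 lcg = new Lcg48(42);
        for (int i = 0; i < 3; i++) {
            System.out.println(r.nextInt() + " == " + lcg.nextInt());
        }
    }
}
```

Seeded with the same value, this produces the identical stream as java.util.Random.nextInt(), which makes the role of the XOR easy to experiment with.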

Java "pool" of longs or Oracle sequence that reuses released values

Several months ago I implemented a solution to choose unique values from a range between 1 and 65535 (16 bits). This range is used to generate unique Route Target suffixes, which for this customer's massive network (it's a huge ISP) are a heavily disputed resource, so any released value needs to become immediately available to the end user.
To tackle this requirement I used a BitSet: allocate a suffix in the RT index with set and deallocate a suffix with clear. The method nextClearBit() can find the next available value. I handle synchronization / concurrency issues manually.
This works pretty well for a small range: the entire index is small (around 10 KB), it is blazing fast, and it can be easily serialized and stored in a blob field.
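A minimal sketch of that BitSet scheme (not the author's exact code; the range bound and the use of synchronized are my assumptions):

```java
import java.util.BitSet;

public class SuffixPool {
    private static final int MAX = 65536;          // suffixes 1..65535
    private final BitSet used = new BitSet(MAX);

    public SuffixPool() {
        used.set(0);                               // suffix 0 is never handed out
    }

    public synchronized int allocate() {
        int next = used.nextClearBit(1);           // first free suffix >= 1
        if (next >= MAX) {
            throw new IllegalStateException("pool exhausted");
        }
        used.set(next);
        return next;
    }

    public synchronized void release(int suffix) {
        used.clear(suffix);                        // immediately reusable
    }
}
```

Because nextClearBit scans from the lowest bit, released low-numbered suffixes are handed out again first, which matches the requirement that freed values become immediately available.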
The problem is, some new devices can handle RT suffixes of 32 unsigned bits (range 1 to 4294967296), which can't be managed with a BitSet (it would, by itself, consume around 512 MB, and BitSet is limited to the int range of 32 signed bits). Even with this massive range available, the client still wants freed Route Target suffixes to become available to the end user again, mainly because the lowest ones (up to 65535), which are compatible with old routers, are heavily disputed.
Before I tell the customer that this is impossible and that he will have to make do with my reusable index for the lower RTs (up to 65535) and use a database sequence for the rest (which means that when the user frees a Route Target, it will not become available again), would anyone shed some light?
Maybe some kind soul has already implemented a high-performance number pool for Java (6, if it matters), or I am missing a killer feature of the Oracle database (11gR2, if it matters)... just some wishful thinking :).
Thank you very much in advance.
I would combine the following:
your current BitSet solution for 1-65535
PLUS
an Oracle-based solution for 65536 - 4294967296, using a sequence that wraps around, defined as:
CREATE SEQUENCE MyIDSequence
  MINVALUE 65536
  MAXVALUE 4294967296
  START WITH 65536
  INCREMENT BY 1
  CYCLE
  NOCACHE
  ORDER;
This sequence gives you ordered values in the specified range and allows values to be reused, but only after the maximum is reached, which should allow enough time for values to be released. If need be, you can keep track of values in use in a table and just increment further if the returned value is already in use; all this can be wrapped nicely into a stored procedure for convenience.
This project may be of some use.

Numeric Range Query

I read in "Lucene in Action" that NumericRangeQuery is better than TermRangeQuery for handling date range queries, but I could not find the reason. I want to know the reason behind it.
I used both TermRangeQuery and NumericRangeQuery for handling date range queries, and I found that searching is faster via NumericRangeQuery.
My second point: to query using NumericRangeQuery I have to create indexes using NumericField, which lets me index down to millisecond precision, but what if I want to reduce my resolution to the hour or day?
Why is numeric so much faster than term?
As you have noted, there is a "precision step". This means that numbers are only stored to a certain precision, so there is a (very) limited number of terms. According to the documentation, it is rare to have more than 300 terms in an index. Check out the Wikipedia article on tries if you are interested in the theory.
How can you reduce precision?
The NumericField class has a "precision" parameter in the constructor. Note that the range query also has a precision parameter, and they must be the same. That JavaDoc page has a link to a paper written about the implementation explaining more of what precision means.
The explanation by @Xodarap about NumericField is correct. Essentially, the precision is dropped for the numbers to reduce the actual term space. Also, I suppose, TermRangeQuery uses String comparison whereas NumericRangeQuery works with integers. That should squeeze out some more performance.
You can index at any desirable resolution, from millisecond to day. Date.getTime() gives you milliseconds since the epoch. You can divide this number by 1000 to get resolution at the second, or divide by 60,000 to get resolution at the minute. And so on.
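As a sketch, a hypothetical helper that truncates epoch milliseconds to a coarser bucket before indexing (the class and method names are mine):

```java
import java.util.Date;

public class Resolution {
    // Integer-divide epoch millis by the bucket size to drop finer detail.
    static long truncate(long millis, long bucketMillis) {
        return millis / bucketMillis;
    }

    public static void main(String[] args) {
        long now = new Date().getTime();                // millis since epoch
        System.out.println(truncate(now, 1_000L));      // second resolution
        System.out.println(truncate(now, 60_000L));     // minute resolution
        System.out.println(truncate(now, 3_600_000L));  // hour resolution
        System.out.println(truncate(now, 86_400_000L)); // day resolution
    }
}
```

Index the truncated value in the NumericField, and apply the same truncation to both ends of the range at query time so that the stored and queried units match.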

faster Math.exp() via JNI?

I need to calculate Math.exp() from Java very frequently. Is it possible to get a native version to run faster than Java's Math.exp()?
I tried JNI + C, but it's slower than plain Java.
This has already been requested several times (see e.g. here). Here is an approximation to Math.exp(), copied from this blog posting:
public static double exp(double val) {
    final long tmp = (long) (1512775 * val + (1072693248 - 60801));
    return Double.longBitsToDouble(tmp << 32);
}
It is basically the same as a lookup table with 2048 entries and linear interpolation between the entries, but all this with IEEE floating-point tricks. It's about 5 times faster than Math.exp() on my machine, but this can vary drastically if you compile with -server.
+1 to writing your own exp() implementation. That is, if this is really a bottleneck in your application. If you can deal with a little inaccuracy, there are a number of extremely efficient exponent approximation algorithms out there, some of them dating back centuries. As I understand it, Java's exp() implementation is fairly slow, even for an algorithm that must return "exact" results.
Oh, and don't be afraid to write that exp() implementation in pure-Java. JNI has a lot of overhead, and the JVM is able to optimize bytecode at runtime sometimes even beyond what C/C++ is able to achieve.
Use Java's.
Also, cache results of the exp and then you can look up the answer faster than calculating them again.
You'd want to wrap whatever loop is calling Math.exp() in C as well. Otherwise, the overhead of marshalling between Java and C will overwhelm any performance gain.
You might be able to get it to run faster if you do them in batches. Making a JNI call adds overhead, so you don't want to do it for each exp() you need to calculate. I'd try passing an array of 100 values and getting the results to see if it helps performance.
The real question is: has this become a bottleneck for you? Have you profiled your application and found this to be a major cause of slowdown? If not, I would recommend using Java's version. Try not to optimize prematurely, as this just slows development down. You may spend a lot of time on something that turns out not to be a problem.
That being said, I think your test gave you your answer: if JNI + C is slower, use Java's version.
Commons Math3 ships with an optimized version: FastMath.exp(double x). It did speed up my code significantly.
Fabien ran some tests and found out that it was almost twice as fast as Math.exp():
0.75s for Math.exp sum=1.7182816693332244E7
0.40s for FastMath.exp sum=1.7182816693332244E7
Here is the javadoc:
Computes exp(x), function result is nearly rounded. It will be correctly rounded to the theoretical value for 99.9% of input values, otherwise it will have a 1 ULP error.
Method:
Lookup intVal = exp(int(x))
Lookup fracVal = exp(int(x-int(x) / 1024.0) * 1024.0 );
Compute z as the exponential of the remaining bits by a polynomial minus one
exp(x) = intVal * fracVal * (1 + z)
Accuracy: Calculation is done with 63 bits of precision, so result should be correctly rounded for 99.9% of input values, with less than 1 ULP error otherwise.
Since the Java code will get compiled to native code by the just-in-time (JIT) compiler, there's really no reason to use JNI to call native code.
Also, you shouldn't cache the results of a method whose input parameters are floating-point real numbers. The gains in time will very likely be lost in the amount of space used.
The problem with using JNI is the overhead involved in making the call to JNI. The Java virtual machine is pretty optimized these days, and calls to the built-in Math.exp() are automatically optimized to call straight through to the C exp() function, and they might even be optimized into straight x87 floating-point assembly instructions.
There's simply an overhead associated with using the JNI, see also:
http://java.sun.com/docs/books/performance/1st_edition/html/JPNativeCode.fm.html
So as others have suggested try to collate operations that would involve using the JNI.
Write your own, tailored to your needs.
For instance, if all your exponents are powers of two, you can use bit-shifting. If you work with a limited range or set of values, you can use look-up tables. If you don't need pinpoint precision, you can use an imprecise, but faster, algorithm.
There is a cost associated with calling across the JNI boundary.
If you could move the loop that calls exp() into the native code as well, so that there is just one native call, then you might get better results, but I doubt it will be significantly faster than the pure Java solution.
I don't know the details of your application, but if you have a fairly limited set of possible arguments for the call, you could use a pre-computed look-up table to make your Java code faster.
There are faster algorithms for exp depending on what you're trying to accomplish. Is the problem space restricted to a certain range? Do you only need a certain resolution, precision, or accuracy?
If you define your problem very well, you may find that you can use a table with interpolation, for instance, which will blow nearly any other algorithm out of the water.
What constraints can you apply to exp to gain that performance trade-off?
-Adam
I run a fitting algorithm, and the minimum error of the fitting result is way larger than the precision of Math.exp().
Transcendental functions are always much slower than addition or multiplication and are a well-known bottleneck. If you know that your values are in a narrow range, you can simply build a lookup table (two sorted arrays: one input, one output). Use Arrays.binarySearch to find the correct index and interpolate the value between the elements at [index] and [index+1].
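A minimal sketch of that lookup-table idea (the table range [0, 8] and its size are my assumptions):

```java
import java.util.Arrays;

public class ExpTable {
    private static final int N = 4096;
    private static final double LO = 0.0, HI = 8.0;
    private static final double[] xs = new double[N]; // sorted inputs
    private static final double[] ys = new double[N]; // precomputed exp(xs[i])

    static {
        for (int i = 0; i < N; i++) {
            xs[i] = LO + (HI - LO) * i / (N - 1);
            ys[i] = Math.exp(xs[i]);
        }
    }

    // Valid only for LO <= x <= HI.
    static double exp(double x) {
        int idx = Arrays.binarySearch(xs, x);
        if (idx >= 0) return ys[idx];                  // exact table hit
        int hi = -idx - 1;                             // insertion point
        int lo = hi - 1;
        double t = (x - xs[lo]) / (xs[hi] - xs[lo]);
        return ys[lo] + t * (ys[hi] - ys[lo]);         // interpolate [index]..[index+1]
    }
}
```

With 4096 entries over [0, 8] the relative interpolation error stays well under 1e-5; widen the range or grow the table to trade memory for accuracy.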
Another method is to split the number. Let's take e.g. 3.81 and split it into 3 + 0.81.
Now you multiply e = 2.718 by itself three times and get 20.09.
Now to 0.81. All values between 0 and 1 converge fast with the well-known exponential series
1 + x + x^2/2 + x^3/6 + x^4/24 + ... etc.
Take as many terms as you need for precision; unfortunately it converges more slowly as x approaches 1. Let's say you go up to x^4; then you get 2.2445 instead of the correct 2.2479.
Then multiply 2.718^3 = 20.09 by 2.718^0.81 ≈ 2.2445 and you have the result
45.08, with an error of under two parts in a thousand (correct: 45.15).
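The split-and-series method above can be sketched as follows (for non-negative x only; the class and method names are mine, and more series terms are used than in the worked example):

```java
public class ExpSplit {
    // Taylor series 1 + x + x^2/2 + x^3/6 + ... for x in [0, 1)
    static double expSeries(double x) {
        double term = 1.0, sum = 1.0;
        for (int n = 1; n <= 12; n++) {
            term *= x / n;  // next term x^n / n!
            sum += term;
        }
        return sum;
    }

    static double exp(double x) {
        int whole = (int) x;                 // split 3.81 into 3 + 0.81
        double frac = x - whole;
        double intPart = 1.0;
        for (int i = 0; i < whole; i++) {
            intPart *= Math.E;               // e multiplied 'whole' times
        }
        return intPart * expSeries(frac);    // e^whole * e^frac
    }
}
```

Going up to x^12 instead of x^4 keeps the relative error of the fractional part below 1e-9, so the slow convergence near x = 1 stops mattering.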
It might not be relevant any more, but just so you know, in the newest releases of the OpenJDK (see here), Math.exp should be made an intrinsic (if you don't know what that is, check here).
This will make performance unbeatable on most architectures, because it means the HotSpot VM will replace the call to Math.exp with a processor-specific implementation of exp at runtime. You can never beat these calls, as they are optimized for the architecture...
