Handling Big Integers in low-latency applications

I'm writing an application in Java and trying to get the best performance out of it. Currently it handles 250,000 of a specific operation per second.
However, I discovered a bug. Because of the way the application works, I have to take a number from user input, which can be as large as x,xxx,xxx,xxx, and add a timestamp in milliseconds to it.
Of course, I forgot about this and soon discovered that all of my values were negative.
My first thought was to just use BigInteger, but won't that destroy the performance?
What is the best way to handle large integers in low-latency applications?

There's no reason for BigInteger. I can see just 10 digits there, which means that it nearly fits in an int. A long gives you 9 more digits.
Look at Long.MAX_VALUE and similar constants so you know what you're doing. A millisecond timestamp in long will overflow on Sun Aug 17 07:12:55 GMT 292278994. That's not a typo.
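A minimal sketch of why the values in the question turned negative and why long fixes it. The class and method names are made up for illustration, and it assumes the two numbers are simply added together:

```java
final class OverflowDemo {
    // int arithmetic wraps around silently once the sum passes Integer.MAX_VALUE.
    static int sumAsInt(int a, int b) {
        return a + b;
    }

    // The same addition in long arithmetic has room for 19 digits.
    static long sumAsLong(long a, long b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(sumAsInt(2_000_000_000, 2_000_000_000));   // wrapped, negative
        System.out.println(sumAsLong(9_999_999_999L, System.currentTimeMillis()));
        System.out.println(Long.MAX_VALUE);
    }
}
```

Primitive long addition costs about the same as int addition on a 64-bit JVM, so this should not hurt throughput the way BigInteger allocation could.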

You can use long for 10 digits.
Or you can try java.math.BigInteger or java.math.BigDecimal.

Related

Can I use nanoTime instead of randomUUID?

I am writing a process that returns data to subscribers every few seconds. I would like to create a unique ID for the subscribers:
producer -> subscriber1
-> subscriber2
What is the difference between using:
java.util.UUID.randomUUID()
System.nanoTime()
System.currentTimeMillis()
Will the nano time always be unique? What about the random UUID?

UUID
The 128-bit UUID was invented exactly for your purpose: Generating identifiers across one or more machines without coordinating through a central authority.
Ideally you would use the original Version 1 UUID, or its variation in Version 2. The original takes the MAC address of the host computer’s network interface and combines it with the current moment plus a small arbitrary number that increments when the host clock has been adjusted. This approach eliminates any practical concern for duplicates. Versions 3 and 5 work differently: they hash a namespace and a name, so the same input always produces the same UUID.
Java does not bundle an implementation for generating Version 1 UUIDs. I presume the Java designers had privacy and security concerns over divulging place, time, and MAC address.
For generating fresh identifiers, Java comes with only one generator, UUID.randomUUID(), which produces Version 4. (UUID.nameUUIDFromBytes produces name-based Version 3 values.) In Version 4 all but 6 of the 128 bits are randomly generated. If a cryptographically strong random generator is used, this Version is good enough to use in most common situations without concern for collisions.
Understand that 122 bits is a really big range of numbers (about 5.316911983139664e+36). 64 bits yields a range of 18,446,744,073,709,551,616 (18 quintillion). The remaining 58 bits (122 - 64 = 58) yield a range of 288,230,376,151,711,744 (288 quadrillion). Now multiply those two numbers to get the range of 122 bits: 2^122 = 18,446,744,073,709,551,616 * 288,230,376,151,711,744, which is about 5.3 undecillion.
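The multiplication above can be checked with exact integer arithmetic. This is only an illustrative sketch; the class and helper names are made up:

```java
import java.math.BigInteger;

final class UuidRange {
    // 2^bits as an exact integer.
    static BigInteger powerOfTwo(int bits) {
        return BigInteger.ONE.shiftLeft(bits);
    }

    public static void main(String[] args) {
        BigInteger high = powerOfTwo(64);
        BigInteger low = powerOfTwo(58);
        System.out.println(high);
        System.out.println(low);
        // The product of the two ranges is exactly the 122-bit range.
        System.out.println(high.multiply(low).equals(powerOfTwo(122)));
    }
}
```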
Nevertheless, if you have access to generating a Version of UUID other than 4, take it. For example in a database system such as Postgres, the database server can generate UUID numbers in the various Versions including Version 1. Or you may find a Java library for generating such UUIDs, though that library may not be platform-independent (it may have native code within).
System.nanoTime
Be clear that System.nanoTime has nothing to do with the current date and time. To quote the Javadoc:
This method can only be used to measure elapsed time and is not related to any other notion of system or wall-clock time.
The System.nanoTime feature simply returns a long number, a count of nanoseconds since some origin, but that origin is not specified.
The only promise made in the Java spec is that the origin will not change during the runtime of a JVM. So you know the number keeps increasing during execution of your app, unless it reaches the limit of a long, at which point the counter rolls over. That rollover might take 292 years (2^63 nanoseconds) if the origin is zero, but, again, the origin is not specified.
In my experience with the particular Java implementations I have used, the origin is the moment when the JVM starts up. This means I will most certainly see the same numbers all over again after the next JVM restart.
So using System.nanoTime as an identifier is a poor choice. Whether your app happens to hit coincidentally the exact same nanosecond number as seen in a prior run is pure chance, but a chance you need not take. Use UUID instead.
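As a rough sketch of what that looks like with the generator bundled in Java (the class and method names are hypothetical, and each subscriber is assumed to get one ID when it registers):

```java
import java.util.UUID;

final class SubscriberIds {
    // Version 4 (random) UUID; no coordination with other threads or machines needed.
    static UUID newSubscriberId() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID first = newSubscriberId();
        UUID second = newSubscriberId();
        System.out.println(first + " version=" + first.version());
        System.out.println(second + " version=" + second.version());
    }
}
```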

java.util.UUID.randomUUID() is safe to call from multiple threads.
System.nanoTime() does not guarantee distinct values across threads. If many threads call it within the same clock tick, they can all receive the same number.
The same is true for System.currentTimeMillis().
Comparing System.currentTimeMillis() and System.nanoTime(), the latter is typically more expensive, as it takes more CPU cycles, but it also has finer resolution. So UUID should serve your purpose.

I think yes, you can use System.nanoTime() as an ID. I tested it and did not encounter any duplicates.
P.S. But I strongly recommend that you use UUID.

Java Time period in decimal number of years

If I calculate the difference between two LocalDate values in java.time using:
Period p = Period.between(testDate, today);
Then I get an output with the number of years, months, days like:
Days = 9
Months = 6
Years = 18
Does anyone know a clean way to represent that as a decimal value (i.e., the above would be something around 18.5)?

You mentioned in one of your comments that you need quarter-year precision. For that you can use IsoFields.QUARTER_YEARS:
double yearAndQuarter = testDate.until(today, IsoFields.QUARTER_YEARS) / 4.0;
This way you actually use the time API, always get the correct result, and @Mike won't have to loathe anything.
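A self-contained version of that one-liner, as a sketch. The class name, method name, and sample dates are made up; the dates are chosen so that Period.between gives 18 years, 6 months, 9 days, as in the question:

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

final class AgeInQuarters {
    // Whole quarters between the dates, expressed in years (steps of 0.25).
    static double yearsInQuarterSteps(LocalDate from, LocalDate to) {
        return from.until(to, IsoFields.QUARTER_YEARS) / 4.0;
    }

    public static void main(String[] args) {
        LocalDate testDate = LocalDate.of(2000, 1, 1);
        LocalDate today = LocalDate.of(2018, 7, 10);
        System.out.println(yearsInQuarterSteps(testDate, today)); // 18.5
    }
}
```

Incomplete quarters are dropped, so the result is truncated rather than rounded.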

Please do not do this.
Representing the difference between two dates as a 'number of years' multiplier is problematic because the average length of a year between two dates is dependent on which dates you are comparing. It's easy to get this wrong, and it's much harder to come up with all the test cases necessary to prove you got it right.
Most programmers should never perform date/time calculations manually. You are virtually guaranteed to get it wrong. Seriously, there are so many ways things can go horribly wrong. Only a handful of programmers on the planet fully understand the many subtleties involved. The fact that you are asking this question proves that you are not one of them, and that's okay--neither am I. You, along with the vast majority of us, should rely on a solid Date/Time API like java.time.
If you really need a single numeric value, then the safest option I can think of is to use the number of days, because the LocalDate API can calculate that number for you:
long differenceInDays = testDate.until(today, ChronoUnit.DAYS);
Note that this difference is only valid for the two dates used to produce it. The round-trip conversion is straightforward:
LocalDate today = testDate.plus(differenceInDays, ChronoUnit.DAYS);
Do not attempt to manually convert a Period with year, month, and day components into a whole number of days. The correct answer depends on the dates involved, which is why we want to let the LocalDate API calculate it for us.
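A small sketch of the day-count round trip described above. The class name, method names, and sample dates are invented for illustration:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class DayDifference {
    // Let the API count the days; it knows about month lengths and leap years.
    static long daysBetween(LocalDate from, LocalDate to) {
        return from.until(to, ChronoUnit.DAYS);
    }

    // Applying the same day count to the same start date recovers the end date.
    static LocalDate addDays(LocalDate from, long days) {
        return from.plus(days, ChronoUnit.DAYS);
    }

    public static void main(String[] args) {
        LocalDate testDate = LocalDate.of(2000, 1, 1);
        LocalDate today = LocalDate.of(2018, 7, 10);
        long days = daysBetween(testDate, today);
        System.out.println(days);
        System.out.println(addDays(testDate, days).equals(today));
    }
}
```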
When precision isn't important
Based on your comments, precision isn't an issue for you, because you only want to display someone's age to the nearest quarter-year or so. You aren't trying to represent an exact difference in time; only an approximate one, with a rather large margin for error. You also don't need to be able to perform any round-trip calculations. This changes things considerably.
An approximation like @VGR's should be more than adequate for these purposes: the 'number of years' should be accurate to within 3 days (< 0.01 years) unless people start living hundreds of thousands of years, in which case you can switch to double ;).
@Oleg's approach also works quite well, and will give you a date difference in whole quarters, which you can divide by 4 to convert to years. This is probably the easiest solution to get right, as you won't need to round or truncate the result. This is, I think, the closest you will get to a direct solution from java.time. The Java Time API (and date/time APIs in general) are designed for correctness: they'll give you whole units, but they usually avoid giving you fractional approximations due to the inherent error involved in floating-point types (there are exceptions, like .NET's System.TimeSpan).
However, if your goal is to present someone's age for human users, and you want greater precision than whole years, I think 18 years, 9 months (or an abbreviated form like 18 yr, 9 mo) is a better choice than 18.75 years.

I would avoid using Period, and instead just calculate the difference in days:
float years = testDate1.until(today, ChronoUnit.DAYS) / 365.2425f;
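For reference, 365.2425 is the mean length of a Gregorian calendar year in days. A runnable sketch of this approximation follows; it uses double instead of float, and the class name, method name, and sample dates are made up:

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;

final class ApproximateYears {
    static final double MEAN_GREGORIAN_YEAR_DAYS = 365.2425;

    // Approximate only: suitable for display, not for round-trip date math.
    static double between(LocalDate from, LocalDate to) {
        return from.until(to, ChronoUnit.DAYS) / MEAN_GREGORIAN_YEAR_DAYS;
    }

    public static void main(String[] args) {
        LocalDate testDate = LocalDate.of(2000, 1, 1);
        LocalDate today = LocalDate.of(2018, 7, 10);
        System.out.printf("%.2f%n", between(testDate, today));
    }
}
```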

Working beyond Max Value (Java)

I have a problem that requires me to go well beyond the Max Value of Integers in Java. How can I have an exception so that Java will let me do this, or get around these rules of Java?

Use a long value or Long wrapper.
When you've reached Long.MAX_VALUE then use BigInteger, which has a very large capacity, at least from -2^Integer.MAX_VALUE to 2^Integer.MAX_VALUE.
Come back for additional tips when BigInteger is no longer enough.
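A short sketch of both options, with made-up class and method names. Math.addExact (Java 8 and later) throws an ArithmeticException instead of silently wrapping, which may be the "exception" the question is after, and BigInteger simply keeps growing:

```java
import java.math.BigInteger;

final class BeyondLong {
    // Throws ArithmeticException when the result does not fit in a long.
    static long checkedAdd(long a, long b) {
        return Math.addExact(a, b);
    }

    // Arbitrary-precision addition; slower, but not limited to 64 bits.
    static BigInteger bigAdd(long a, long b) {
        return BigInteger.valueOf(a).add(BigInteger.valueOf(b));
    }

    public static void main(String[] args) {
        System.out.println(bigAdd(Long.MAX_VALUE, 1L));
        try {
            checkedAdd(Long.MAX_VALUE, 1L);
        } catch (ArithmeticException e) {
            System.out.println("overflow detected: " + e.getMessage());
        }
    }
}
```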

Is there an infinite Duration in Java 8 equivalent to the .NET Timeout.InfiniteTimeSpan Field?

Everything is in the title:
Is there an infinite Duration in Java 8 equivalent to the C# Timeout.InfiniteTimeSpan Field?
A bit like:
https://msdn.microsoft.com/en-us/library/system.threading.timeout.infinitetimespan(v=vs.110).aspx
I don't think -1 ms is understood across all Java libraries as an infinite timespan, so it might be more a problem of definition.
To clarify the context a bit, let's say I want to put a thread to sleep for an infinite amount of time without running an infinite loop. Note that this is not necessarily a realistic practical use, though.
I'm just wondering: is there anything built into the Java libraries?

As an extension to @Misha's answer, this is essentially the largest duration value allowed:
public static final Duration MAX_DURATION = Duration.ofSeconds(
    Long.MAX_VALUE, // Max allowed seconds
    999999999L      // Max nanoseconds less than a second
);
Anything more than this leads to
java.lang.ArithmeticException: long overflow
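A quick way to see that limit in action, as a sketch (the class and method names are hypothetical): adding a single nanosecond to the maximum value pushes the seconds field past Long.MAX_VALUE.

```java
import java.time.Duration;

final class MaxDuration {
    static final Duration MAX_DURATION = Duration.ofSeconds(Long.MAX_VALUE, 999_999_999L);

    // Returns true if extending the maximum duration by 1 ns overflows.
    static boolean overflowsWhenExtended() {
        try {
            MAX_DURATION.plusNanos(1);
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(MAX_DURATION);
        System.out.println(overflowsWhenExtended());
    }
}
```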

From Duration javadoc:
A physical duration could be of infinite length. For practicality, the duration is stored with constraints similar to Instant. The duration uses nanosecond resolution with a maximum value of the seconds that can be held in a long. This is greater than the current estimated age of the universe.
You certainly don't need to do an infinite loop to suspend a thread. Consider LockSupport.park() or another one of the many available mechanisms in java.util.concurrent. Can you describe your problem in more detail?
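A minimal sketch of the LockSupport idea, with invented names. park() blocks without burning CPU, but it is allowed to return spuriously, so the usual pattern re-checks a condition around it. The sketch releases the thread right away so that it terminates; leaving the flag false would keep it parked indefinitely.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

final class ParkDemo {
    // Starts a thread that parks until released, then releases it.
    // Returns true if the thread has finished within the timeout.
    static boolean parkThenRelease() {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread sleeper = new Thread(() -> {
            while (!released.get()) {
                LockSupport.park(); // blocks; may wake spuriously, hence the loop
            }
        });
        sleeper.setDaemon(true);
        sleeper.start();

        released.set(true);          // set the condition first...
        LockSupport.unpark(sleeper); // ...then wake the thread
        try {
            sleeper.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !sleeper.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(parkThenRelease());
    }
}
```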

Numeric Range Query

I read in "Lucene in Action" that NumericRangeQuery is better than TermRangeQuery for handling date range queries, but I could not find the reason. I want to know the reason behind it.
I used both TermRangeQuery and NumericRangeQuery for date range queries and found that searching is faster with NumericRangeQuery.
My second point: to query using NumericRangeQuery I have to index with NumericField, which lets me index down to the millisecond. But what if I want to reduce the resolution to the hour or the day?

Why is numeric so much faster than term?
As you have noted, there is a "precision step". Each number is indexed at several precisions, from the full value down to coarse prefixes, so a range can be covered by a (very) limited number of terms instead of one term per distinct value. According to the documentation, it is rare for a range query to need more than about 300 terms. Check out the Wikipedia article on tries if you are interested in the theory.
How can you reduce precision?
The NumericField class has a precisionStep parameter in the constructor. Note that the range query also takes a precisionStep parameter, and the two must be the same. That JavaDoc page has a link to a paper written about the implementation explaining more of what precision means.

The explanation by @Xodarap about NumericField is correct. Essentially, the precision is dropped for the numbers to reduce the actual term space. Also, I suppose, TermRangeQuery uses String comparison whereas NumericRangeQuery works with integers. That should squeeze out some more performance.
You can index at any desirable resolution - millisecond to day. Date.getTime() gives you milliseconds since epoch. You can divide this number by 1000 to get time with resolution at second. Or you can divide by 60,000 to get resolution at minute. And so on.
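The divisions above, written out as a sketch in plain Java with no Lucene calls. The class and method names are hypothetical; the assumption is that the resulting long is what you would index in the numeric field and that queries use the same unit.

```java
import java.util.Date;

final class DateResolution {
    static long toSeconds(long epochMillis) { return epochMillis / 1_000L; }
    static long toMinutes(long epochMillis) { return epochMillis / 60_000L; }
    static long toHours(long epochMillis)   { return epochMillis / 3_600_000L; }
    static long toDays(long epochMillis)    { return epochMillis / 86_400_000L; }

    public static void main(String[] args) {
        long millis = new Date().getTime(); // milliseconds since the epoch
        System.out.println(toHours(millis));
        System.out.println(toDays(millis));
    }
}
```

Integer division truncates toward zero, so for dates before 1970 (negative values) Math.floorDiv would be the safer choice.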
