In my application I'm going to use floating point values to store geographical coordinates (latitude and longitude).
I know that the integer part of these values will be in the range [-90, 90] and [-180, 180] respectively. I also have a requirement to enforce some fixed precision on these values (for now it is 0.00001, but it can be changed later).
After studying the single-precision floating point type (float), I can see that it is just a little bit too small to contain my values. That's because 180 * 10^5 is greater than 2^24 (the size of a float's significand) but less than 2^25.
So I have to use double. But the problem is that I'm going to store huge numbers of these values, so I don't want to waste bytes storing unnecessary precision.
So how can I perform some sort of compression when converting my double value (with a fixed integer-part range and a specified precision X) to a byte array in Java? For example, if I use the precision from my example (0.00001), I end up with 5 bytes for each value.
I'm looking for a lightweight algorithm or solution so that it doesn't imply huge calculations.
To store a number x to a fixed precision of (for instance) 0.00001, just store the integer closest to 100000 * x. (By the way, this requires 26 bits, not 25, because you need to store negative numbers too.)
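In Java, that idea looks roughly like this (a minimal sketch; the constant and method names are mine, and range checking is omitted):

private static final double PRECISION = 0.00001;   // from the question; may change later
private static final double SCALE = 1.0 / PRECISION;

// Nearest integer to 100000 * x; fits in an int, since +/-180 * 10^5 needs only 26 bits.
static int encode(double degrees) {
    return (int) Math.round(degrees * SCALE);
}

static double decode(int stored) {
    return stored / SCALE;
}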
As TonyK said in his answer, use an int to store the numbers.
To compress the numbers further, use locality: geo coordinates are often "clumped" (say, the outline of a city block). Use a fixed reference point (full 2x26-bit resolution) and then store offsets from the last coordinate as bytes (gives you +/-0.00127 degrees per step). Alternatively, use a short for the offsets, which covers a much larger offset range (+/-0.32767 degrees).
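A rough sketch of that offset scheme, assuming the values have already been converted to fixed-point ints as above (the method name and error handling are illustrative; a real encoder would need an escape code for oversized deltas):

import java.io.ByteArrayOutputStream;

// First value at full 32-bit resolution, then one signed byte per delta.
static byte[] encodeDeltas(int[] values) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = values[0];
    out.write(prev >>> 24);
    out.write(prev >>> 16);
    out.write(prev >>> 8);
    out.write(prev);
    for (int i = 1; i < values.length; i++) {
        int delta = values[i] - prev;          // +/-127 units = +/-0.00127 degrees
        if (delta < Byte.MIN_VALUE || delta > Byte.MAX_VALUE)
            throw new IllegalArgumentException("delta too large; needs an escape code");
        out.write(delta);
        prev = values[i];
    }
    return out.toByteArray();
}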
Just be sure to hide the compression/decompression in a class which only offers double as outside API, so you can adjust the precision and the compression algorithm at any time.
Considering your use case, I would nonetheless use double and compress the values directly.
The reason is that strong compressors, such as 7zip, are extremely good at handling "structured" data, which an array of double is (one datum = 8 bytes; very regular and predictable).
Any other optimisation you come up with "by hand" is likely to be inferior or offer negligible advantage, while costing you time and adding risk.
Note that you can still apply the "trick" of converting the double into an int before compression, but I'm really unsure whether it would bring you a tangible benefit, while on the other hand it would seriously reduce your ability to cope with unforeseen ranges of values in the future.
[Edit] Depending on the source data, if the bits below the precision level are "noisy", it can help the compression ratio to remove them, either by rounding the value or even by directly applying a mask to the lowest bits (I guess this last method will not please purists, but at least you can directly select your precision level this way, while keeping the full range of possible values available).
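For instance, the masking could look like this (a sketch; bitsToClear is a tuning parameter you would derive from your precision level):

// Zero the lowest mantissa bits of a double so sub-precision "noise"
// doesn't hurt the compression ratio.
static double maskLowBits(double value, int bitsToClear) {
    long bits = Double.doubleToRawLongBits(value);
    long mask = -1L << bitsToClear;        // clears the lowest bitsToClear bits
    return Double.longBitsToDouble(bits & mask);
}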
So, to summarize, I'd suggest direct LZMA compression on your array of double.
Related
I'm just starting to learn to code, and this might be a very simple question, but I have seen double used for numbers that are larger than an int can hold. If I understand correctly, a double is less precise than a long.
So if I have a number larger than an int can hold, would it be best to use double or long? In what cases is one preferred over the other? What is best practice for this?
(Say I would like to store a variable with Earth's population. Is double or long preferred?)
tl;dr
For population of humans, use long primitive or Long class.
Details
The floating-point types trade away accuracy in exchange for speed of execution. These types in Java include float/Float and double/Double.
If working with whole numbers (no fractions):
For smaller numbers, ranging from -2^31 to 2^31-1 (roughly plus or minus 2 billion), use int/Integer. Needs 32-bits for content.
For larger numbers, use long/Long. Needs 64-bits for content.
For extremely large numbers, use BigInteger.
If working with fractional numbers (not integers):
For numbers with magnitudes between 2^-149 and (2-2^-23) * 2^127, where you do not care about accuracy, use float/Float. Needs 32-bits for content.
For larger/smaller numbers where you do not care about accuracy, use double/Double. Needs 64-bits for content.
For more extreme numbers, use BigDecimal.
If you care about accuracy (such as money matters), use BigDecimal.
So for the earth’s population of humans, use long/Long.
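For instance (the figures here are only illustrative):

long population = 8_000_000_000L;   // too big for int (max is about 2.1 billion)
java.math.BigInteger stars = java.math.BigInteger.TEN.pow(24);   // beyond long, use BigInteger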
Let's say, using java, I type
double number;
If I need to use very big or very small values, how accurate can they be?
I tried to read how doubles and floats work, but I don't really get it.
For my term project in intro to programming, I might need to use different numbers with big ranges of value (many orders of magnitude).
Let's say I create a while loop,
while (number[i-1] - number[i] > ERROR) {
//does stuff
}
Does the limitation of ERROR depend on the size of number[i]? If so, how can I determine how small ERROR can be in order for the loop to quit?
I know my teacher explained it at some point, but I can't seem to find it in my notes.
Does the limitation of ERROR depend on the size of number[i]?
Yes.
If so, how can I determine how small ERROR can be in order for the loop to quit?
You can get the "next largest" double using Math.nextUp (or the "next smallest" using Math.nextDown), e.g.
double nextLargest = Math.nextUp(number[i-1]);
double difference = nextLargest - number[i-1];
As Radiodef points out, you can also get the difference directly using Math.ulp:
double difference = Math.ulp(number[i-1]);
(Math.ulp gives the gap to the next larger value; for the gap downward, you can compute number[i-1] - Math.nextDown(number[i-1]).)
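To make that concrete (a sketch; the magnitude here is arbitrary):

double a = 1.0e9;
double smallestUsefulError = Math.ulp(a);   // about 1.19e-7 at this magnitude
// An ERROR below one ulp amounts to demanding exact equality: the smallest
// nonzero difference between doubles of this magnitude is one ulp.
System.out.println(smallestUsefulError);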
If you don't tell us what you want to use it for, then we cannot answer anything more than what is standard knowledge: a double in Java has about 16 significant digits (decimal digits, that is), and the smallest possible positive value is 4.9 x 10^-324. That's in all likelihood far higher precision than you will need.
The epsilon value (what you call "ERROR") in your question varies depending on your calculations, so there is no standard answer for it. But if you are using doubles for simple stuff, as opposed to highly demanding scientific stuff, just use something like 1 x 10^-9 and you will be fine.
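In code, that fixed-epsilon approach boils down to (a sketch; a fixed epsilon like this only makes sense if your values are of magnitude around 1):

static boolean closeEnough(double a, double b) {
    return Math.abs(a - b) < 1e-9;   // fixed epsilon; scale it to your data
}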
Both the float and double primitive types are limited in terms of the amount of data they can store. However, if you want to know the maximum values of the two types, then run the code below with your favourite IDE.
System.out.println(Float.MAX_VALUE);
System.out.println(Double.MAX_VALUE);
double data type is a double-precision 64-bit IEEE 754 floating point (precision is between 15 and 17 decimal digits).
float data type is a single-precision 32-bit IEEE 754 floating point (precision is between 6 and 9 decimal digits).
After running the code above, if you're not satisfied with their ranges, then I would recommend using BigDecimal, as this type doesn't have a limit (rather, your RAM is the limit).
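For example (illustrative only):

java.math.BigDecimal beyond =
        new java.math.BigDecimal(Double.MAX_VALUE).multiply(java.math.BigDecimal.TEN);
System.out.println(beyond);   // about 1.8 x 10^309, not representable as a double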
I need to read Adobe's signed 32-bit fixed-point number, with 8 bits for the integer part followed by 24 bits for the fractional part. This is a "path point", as defined in Adobe Photoshop File Formats Specification.
This is how I'd do it in Ruby, but I need to do it in Java.
read(1).unpack('c*')[0].to_f +
(read(3).unpack('B*')[0].to_i(2).to_f / (2 ** 24)).to_f
Based on some discussion in the comments of David Wallace's answer, there is a significantly faster way to compute the correct double.
Probably the very fastest way to convert it is like this:
double intFixedPoint24ToDouble(int bits){
    return ((double) bits) / (1 << 24);   // parentheses matter: '/' binds tighter than '<<'
}
The reason that this is faster is because of the way double-precision floating point arithmetic works. In this case, the above sequence can be reduced to an extremely simple conversion and a constant subtraction. When this gets run, the actual steps it takes look like this:
Convert an int (bits) to a double (done on FPU, usually). This is quite fast.
Subtract 0x01800000 (that is, 24 << 20: 24 taken off the exponent field) from the upper 32 bits of that result. This is extremely fast.
A very similar optimization can be applied whenever you multiply or divide any floating point number by any compile-time constant integer that is a power of two.
This compiler optimization does not apply if you are instead dividing by a double, or if you divide by a non-compile-time-constant expression (any expression involving anything other than final compile-time-constant variables, literal numbers, or operators). In that case, it must be performed as a double-precision floating-point division, which is probably the slowest single operation, except for block transfers and advanced mathematical functions.
However, as you can see, 1<<24 is a compile-time constant power of two, and so the optimization does apply in this case.
Read your bytes into an int (which is always 32 bits in Java), then use this. You need double, not float, because a single-precision float's 24-bit significand can't necessarily hold your 32-bit fixed-point number exactly.
double toFixedPoint(int bytes){
    return bytes / Math.pow(2, 24);   // int / double promotes the result to double
}
If speed is a concern, then work out Math.pow(2,24) outside of this method and store it.
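Putting the pieces together, reading the four bytes could look like this (a sketch; DataInputStream.readInt() reads big-endian, which matches the byte order the Ruby snippet in the question assumes):

import java.io.DataInputStream;
import java.io.IOException;

// Reads one 8.24 signed fixed-point path point value.
static double readPathPoint(DataInputStream in) throws IOException {
    int raw = in.readInt();              // 4 bytes, big-endian
    return raw / (double) (1 << 24);     // scale away the 24 fractional bits
}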
I need to store an exact audio position in a database, namely SQLite. I could store the frame position (sample offset / channels) as an integer, but this would cause extra data maintenance in case of certain file conversions.
So I'm thinking about storing the position as an 8 byte real value in seconds, that is a double, and so as a REAL in SQLite. That makes the database structure more consistent.
But, given a maximum samplerate of 192kHz, is the double precision sufficient so that I can always recover the exact frame position when multiplying the value by the samplerate?
Is there a certain maximum position above which an error may occur? What is this maximum position?
PS: this is about SQLite REAL, but also about the C and Java double type which may hold the position value at various stages.
Update:
Since the discussions now focus on the risks related to conversion and rounding, here's the C method that I'm planning to use:
// Given these types:
int samplerate;
long long framepos;
double position;
// First compute the position in seconds from the framepos:
position = (double) framepos / samplerate;
// Now store the position in an SQLite REAL column, and retrieve it later
// Then compute the framepos back from position, with rounding:
framepos = position * samplerate + 0.5;
Is this safe and symmetrical?
A double has 53 bits of precision in its significand (52 stored explicitly, plus an implied leading bit). Depending on the exponent part, some of these bits will represent whole numbers (seconds in your case), the others fractions of seconds.
At your maximum rate of 192 kHz, a minimum of 18 bits is required to make the sub-second part precise enough (more if rounding is not optimal). That leaves 35 bits for the seconds, which will span just over a thousand years.
So even if you need an extra bit or two for the sub-second part to guard against rounding, and even if SQLite loses a bit or two of precision converting it to decimal and back here and there, you aren't anywhere near losing sample precision with your double-precision number. Just make sure your rounding works correctly: C truncates toward zero when converting to an integer, so even an infinitesimally small conversion error could throw you off by 1.
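A quick round-trip check, sketched in Java (the constants are illustrative; the same logic applies to the C code above):

int samplerate = 192_000;
long framepos = 6_054_912_000_123L;                // about a year of 192 kHz audio
double position = (double) framepos / samplerate;  // what would be stored as REAL
long recovered = (long) (position * samplerate + 0.5);
System.out.println(recovered == framepos);         // true: well within double precision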
I would store it as a (64-bit) integer representing microseconds (about 2**20 to the second). This avoids floating point hardware/software, is readily understood by all, and gives you a range of 0..2**44 seconds, which is well over half a million years.
As an alternative, use a readable fixed precision decimal representation (20 digits should be enough). Right-justified with leading zeros. The cost of conversion is negligible compared to DB accesses anyway.
One advantage of these options is that any database will trivially know how to order them, not necessarily obvious for floating point values.
As the answer by Matthias Wandel explains, there's probably nothing to worry about. OTOH by using integers you would get fixed precision regardless of the magnitude which might be useful.
Say, use a 64-bit integer, and store the time as microseconds. That gives you an equivalent sampling precision of 1 MHz and a range of almost 300000 years (if my quick calculation is correct).
Edit Even when taking into account the need for the timestamp * sample_rate to fit into a 64-bit integer, you still have a range of 1.5 years (2**63/1e6/3600/24/365/192e3), assuming a max sample rate of 192kHz.
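Sketched in Java (the method names are mine), with explicit rounding in both directions; the round trip is exact because the microsecond rounding error (at most 0.5 us) maps to less than half a frame at 192 kHz:

// framepos -> microseconds, rounded to nearest.
// Note: framepos * 1_000_000 overflows a long past about 2**63 / 1e6 frames,
// i.e. roughly 1.5 years of audio at 192 kHz, matching the estimate above.
static long toMicros(long framepos, int samplerate) {
    return (framepos * 1_000_000L + samplerate / 2) / samplerate;
}

// microseconds -> framepos, rounded to nearest.
static long toFrames(long micros, int samplerate) {
    return (micros * samplerate + 500_000L) / 1_000_000L;
}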
Can you help me clarify the usages of the float primitive in Java?
My understanding is that converting a float value to double and vice versa can be problematic. I read (rather a long time ago, and I'm not sure it's still true with newer JVMs) that float's performance is much worse than double's. And of course floats have less precision than doubles.
I also remember that when I worked with AWT and Swing I had some problems with using float or double (like using Point2D.Float or Point2D.Double).
So, I see only 2 advantages of float over double:
It needs only 4 bytes while double needs 8 bytes
The Java Memory Model (JMM) guarantees that assignment operation is atomic with float variables while it's not atomic with double's.
Are there any other cases where float is better than double? Do you use float in your applications?
The reason for including the float type is to some extent historic: it represents a standard IEEE floating point representation from the days when shaving 4 bytes off the size of a floating point number in return for extremely poor precision was a tradeoff worth making.
Nowadays, uses for float are pretty limited. But, for example, having the data type can make it easier to write code that needs interoperability with older systems that do use float.
As far as performance is concerned, I think float and double are essentially identical, except for division. Generally, whichever you use, the processor converts to its internal format, does the calculation, then converts back again, and the actual calculation effectively takes a fixed time. In the case of division, on Intel processors at least, as I recall, the time taken is generally one clock cycle per 2 bits of precision, so whether you use float or double does make a difference.
Unless you really really have a strong reason for using it, in new code, I would generally avoid 'float'.
Those two reasons you just gave are huge.
If you have a 3D volume that's 1k by 1k by 64, and then have many timepoints of that data, and then want to make a movie of maximum intensity projections, the fact that float is half the size of double could be the difference between finishing quickly and thrashing because you ran out of memory.
Atomicity is also huge, from a threading standpoint.
There's always going to be a tradeoff between speed/performance and accuracy. If you have an integer smaller than 2^31, an int is always a better representation of it than a float, because a float's 24-bit significand can't represent every such integer exactly. You'll have to evaluate your needs and use the appropriate types for your problems.
I think you nailed it when you mention storage, with floats being half the size.
Using floats may show improved performance over doubles for applications processing large arrays of floating point numbers, such that memory bandwidth is the limiting factor. By switching to float[] from double[] and halving the data size, you effectively double the throughput, because twice as many values can be fetched in a given time. Although the CPU has a little more work to do converting the float to a double, this happens in parallel with the memory fetch, and the fetch takes longer.
For some applications the loss of precision might be worth trading for the gain in performance. Then again... :-)
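As a toy illustration of that argument (not a proper benchmark; JIT warm-up and cache effects all matter):

// The float reduction touches half as many bytes as the double one,
// so on bandwidth-bound data it can run up to twice as fast.
static float sum(float[] a) {
    float s = 0f;
    for (float v : a) s += v;
    return s;
}

static double sum(double[] a) {
    double s = 0.0;
    for (double v : a) s += v;
    return s;
}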
So yes, advantages of floats:
Only requires 4 bytes
Atomic assignment
Arithmetic should be faster, especially on 32-bit architectures, since there are specific float bytecodes.
Ways to mitigate these when using doubles:
Buy more RAM, it's really cheap.
Use volatile doubles if you need atomic assignment (sketched below).
Do tests, verify the performance of each, if one really is faster there isn't a lot you can do about it.
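A minimal illustration of the volatile point (the class and field names are mine):

class Sensor {
    volatile double lastReading;   // volatile makes double reads/writes atomic per the JMM

    void update(double value) {
        lastReading = value;       // no torn writes, even without synchronization
    }
}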
Someone else mentioned that this is similar to the short vs. int argument, but it is not. All integer types (including boolean) except long are stored as 4-byte integers in the Java memory model, unless they're in an array.
It is true that doubles might in some cases have faster operations than floats. However, this requires that everything fits in the L1-cache. With floats you can have twice as much in a cache-line. This can make some programs run almost twice as fast.
SSE instructions can also work with 4 floats in parallel instead of 2, but I doubt that the JIT actually uses those. I might be wrong though.