User-facing numbers: implementing them to be efficient, "operational" and round-off safe? - java

I have something similar to a spreadsheet column in mind. A spreadsheet column has transparent data typing: text or any kinds of numbers.
But no matter how the typing is implemented internally, they allow round-off-safe operations, e.g. adding up a column of hundreds of numbers with decimal points, and other arithmetic operations. And they do it efficiently too.
What way of handling numbers can make them:
1. transparent to the user
2. round-off safe
3. support efficient arithmetic, aggregation, sorting
4. handled by datastores and applications with Java primitive types?
I have in mind using a 64-bit long datatype that is internally multiplied by 1000 to provide 3 decimal places. For example, 123.456 is internally stored as 123456, and 1 is stored as 1000. Reinventing floating-point numbers seems clunky; I have to reinvent multiplication, for example.
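For illustration, a minimal sketch of that scaled-long idea (class and method names are hypothetical): addition works on the raw longs directly, but multiplication has to divide one factor's scale back out, which is exactly the "reinvented multiplication" mentioned above.

public final class Fixed3 {
    private static final long SCALE = 1_000L; // 3 implied decimal places

    // Scales match, so plain + works.
    static long add(long a, long b) { return a + b; }

    // a*b carries SCALE twice, so divide one scale back out,
    // rounding half away from zero; beware long overflow for large inputs.
    static long mul(long a, long b) {
        long p = a * b;
        return (p >= 0 ? p + SCALE / 2 : p - SCALE / 2) / SCALE;
    }

    // Note: values strictly between -1 and 0 need extra sign handling.
    static String format(long v) {
        return String.format("%d.%03d", v / SCALE, Math.abs(v % SCALE));
    }

    public static void main(String[] args) {
        long price = 123_456; // 123.456
        long qty = 2_000;     // 2.000
        System.out.println(format(mul(price, qty))); // 246.912
    }
}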
Miscellany: I actually have in mind a document tagging system. A number tag is conceptually similar to a spreadsheet column that is used to store numbers.
I do want to know how spreadsheets handle it, and I would have titled the question as such.
I am using two datastores that use Java primitive types. Point #4 wasn't hypothetical.

Unless you really need to use primitives, BigDecimal should handle that for you.
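A minimal sketch of the BigDecimal route for the "column of numbers" case, summing decimal strings exactly while the equivalent double sum drifts:

import java.math.BigDecimal;

public class ColumnSum {
    public static void main(String[] args) {
        String[] column = {"0.10", "0.10", "0.10"}; // imagine hundreds of rows
        BigDecimal exact = BigDecimal.ZERO;
        double approx = 0.0;
        for (String cell : column) {
            exact = exact.add(new BigDecimal(cell)); // exact decimal addition
            approx += Double.parseDouble(cell);      // accumulates binary error
        }
        System.out.println(exact);  // 0.30
        System.out.println(approx); // 0.30000000000000004
    }
}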

Excel uses double precision floats internally, then rounds the display portion in each cell according to the formatting options. It uses the double values for any calculations (unless the Precision as Displayed option is enabled - in which case it uses the rounded displayed value) and then rounds the result when displayed.
You could certainly use a long normalized to the max number of decimals you want to support - but then you're stuck with fixed-precision. That may or may not be acceptable. If you can use BigDecimal, that could work - but I don't think that qualifies as a Java primitive type.

Related

Floating point types returned in ORM / DSL

The Java™ Tutorials state that "this data type [double] should never be used for precise values, such as currency." Is the fact that an ORM / DSL is returning floating point numbers for database columns storing values to be used to calculate monetary amounts a problem? I'm using QueryDSL and I'm dealing with money. QueryDSL is returning a Double for any number with a precision up to 16 and a BigDecimal thereafter. This concerns me as I'm aware that floating point arithmetic isn't suitable for currency calculations.
From this QueryDSL issue I'm led to believe that Hibernate does the same thing; see OracleDialect. Why does it use a Double rather than a BigDecimal? Is it safe to retrieve the Double and construct a BigDecimal, or is there a chance that a number with a precision of less than 16 could be incorrectly represented? Is it only when performing arithmetic operations that a Double can have floating-point issues, or are there values to which it cannot be accurately initialised?
Using floating point numbers for storing money is a bad idea indeed. Floating points can approximate an operation result, but that's not what you want when dealing with money.
The easiest way to fix it, in a database-portable way, is to simply store cents. This is the preferred way of dealing with currency in financial applications. Note that most databases use the half-away-from-zero rounding algorithm, so make sure that's appropriate in your context.
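As a sketch of the store-cents approach (helper names are hypothetical): amounts live in the database and in Java as long cents, and conversion happens only at the edges. Java's RoundingMode.HALF_UP rounds halves away from zero, matching the database behavior described above.

import java.math.BigDecimal;
import java.math.RoundingMode;

public class Cents {
    // $19.99 is stored as 1999 cents.
    static long toCents(BigDecimal amount) {
        return amount.setScale(2, RoundingMode.HALF_UP) // half away from zero
                     .unscaledValue()
                     .longValueExact();
    }

    static BigDecimal fromCents(long cents) {
        return BigDecimal.valueOf(cents, 2); // puts the decimal point back
    }

    public static void main(String[] args) {
        long a = toCents(new BigDecimal("19.99"));
        long b = toCents(new BigDecimal("0.015")); // rounds to 2 cents
        System.out.println(a + b);            // 2001
        System.out.println(fromCents(a + b)); // 20.01
    }
}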
When it comes to money you should always ask a local accountant, especially for the rounding part. Better safe than sorry.
Now back to your questions:
Is it safe to retrieve the Double and construct a BigDecimal, or is there a chance that a number with a precision of less than 16 could be incorrectly represented?
This is a safe operation as long as your database uses at most 16 digits of precision. If it uses a higher precision, you'd need to override the OracleDialect so that such columns are returned as BigDecimal instead.
Is it only when performing arithmetic operations that a Double can have floating-point issues, or are there values to which it cannot be accurately initialised?
When performing arithmetic operations you must always take the monetary rounding into consideration anyway, and that applies to BigDecimal as well. So if you can guarantee that the database value doesn't lose any decimals when being cast to a Java Double, you are fine to create a BigDecimal from it. Using BigDecimal pays off when applying arithmetic operations to the database-loaded value.
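To illustrate that BigDecimal also forces you to decide on rounding: a division with a non-terminating decimal expansion throws unless a scale and rounding mode are supplied.

import java.math.BigDecimal;
import java.math.RoundingMode;

public class Rounding {
    public static void main(String[] args) {
        BigDecimal one = new BigDecimal("1");
        BigDecimal three = new BigDecimal("3");

        // one.divide(three);  // throws ArithmeticException: non-terminating expansion
        System.out.println(one.divide(three, 2, RoundingMode.HALF_UP)); // 0.33
    }
}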
As for the threshold of 16, according to Wiki:
The 11-bit width of the exponent allows the representation of numbers with a decimal exponent between 10^-308 and 10^308, with full 15-17 decimal digits of precision. By compromising precision, the subnormal representation allows values smaller than 10^-323.
There seem to be several concerns mentioned in the question, comments, and answers by Robert Bain. I've collected and paraphrased some of these.
Is it safe to use a double to store a precise value?
Yes, provided the number of significant digits (precision) is small enough.
From wikipedia
If a decimal string with at most 15 significant digits is converted to IEEE 754 double precision representation and then converted back to a string with the same number of significant digits, then the final string should match the original.
But new BigDecimal(1000.1d) has the value 1000.1000000000000227373675443232059478759765625, why not 1000.1?
In the quote above I added emphasis - when converted from a double the number of significant digits must be specified, e.g.
new BigDecimal(1000.1d, new MathContext(15))
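A runnable comparison of the constructions discussed here, with expected output in comments (BigDecimal.valueOf is included since it comes up later in this thread):

import java.math.BigDecimal;
import java.math.MathContext;

public class FromDouble {
    public static void main(String[] args) {
        System.out.println(new BigDecimal(1000.1d));
        // 1000.1000000000000227373675443232059478759765625

        System.out.println(new BigDecimal(1000.1d, new MathContext(15)));
        // 1000.10000000000 (rounded to 15 significant digits)

        System.out.println(BigDecimal.valueOf(1000.1d)); // goes via Double.toString
        // 1000.1
    }
}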
Is it safe to use a double for arbitrary arithmetic on precise values?
No, each intermediate value used in the calculation could introduce additional error.
Using a double to store exact values should be seen as an optimization. It introduces risk that if care is not taken, precision could be lost. Using a BigDecimal is much less likely to have unexpected consequences and should be your default choice.
Is it correct that QueryDSL returns a double for precise value?
It is not necessarily incorrect, but is probably not desirable. I would suggest you engage with the QueryDSL developers... but I see you have already raised an issue and they intend to change this behavior.
After much deliberation, I must conclude that the answer to my own question:
Is the fact that an ORM / DSL is returning floating point numbers for database columns storing values to be used to calculate monetary amounts a problem?
put simply, is yes. Please read on.
Is it safe to retrieve the Double and construct a BigDecimal, or is there a chance that a number with a precision of less than 16 could be incorrectly represented?
A number with a precision of less than 16 decimal digits is incorrectly represented in the following example.
BigDecimal foo = new BigDecimal(1000.1d);
The BigDecimal value of foo is 1000.1000000000000227373675443232059478759765625. 1000.1 has only one decimal digit, yet the stored value deviates from it starting at the 14th decimal place.
Is it only when performing arithmetic operations that a Double can have floating-point issues, or are there values to which it cannot be accurately initialised?
As per the example above, there are values to which it cannot be accurately initialised. As The Java™ Tutorials clearly states, "This data type [float / double] should never be used for precise values, such as currency. For that, you will need to use the java.math.BigDecimal class instead."
Interestingly, calling BigDecimal.valueOf(someDouble) appeared at first to magically resolve things, but upon realising that it calls Double.toString() and then reading Double's documentation, it became apparent that this is not appropriate for exact values either.
In conclusion, when dealing with exact values, floating point numbers are never appropriate. As such, in my mind, ORMs / DSLs should be mapping to BigDecimal unless otherwise specified, given that most database use will involve the calculation of exact values.
Update:
Based on this conclusion, I've raised this issue with QueryDSL.
It is not only about arithmetic operations, but also about pure reads and writes.
Oracle NUMBER and BigDecimal both use a decimal (base-10) representation. So when you read a number from the database and then store it back, you can be sure that the same number is written (unless it exceeds Oracle's limit of 38 digits).
If you convert the NUMBER into a binary representation (Double) and then convert it back to decimal, you can expect problems, and the round-trip must also be much slower.

How does a very small number behave while processing?

Well, I am working on a big dataset, and after some calculations I am getting values for the features like 4.4E-5. I read somewhere that such values mean 0.000044, that is, 4.4 times ten to the power minus 5. So my question is: when I use them for further processing, will these values behave the same as any other float, or do I need some other data type?
Yes, it is just scientific ("E") notation for the same binary floating-point data type.
Both 4.4E-5 and 0.000044 are the same. And that value only approximates 0.000044 with a sum of negative powers of 2: 2^-15 + 2^-17 + 2^-18 + ...
Multiplying lots of small numbers leads to underflow. Take the log and add. This technique is universal in computer science. Many of the Google hits for "underflow log" are useful, including SO hits, other techniques for dealing with it, etc.
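A small sketch of the log-sum technique with the 4.4E-5 value from the question: the direct product underflows to zero, while the log-space sum stays well within double range.

public class LogSpace {
    public static void main(String[] args) {
        double p = 4.4e-5;
        int n = 100;

        double direct = 1.0;
        double logSum = 0.0;
        for (int i = 0; i < n; i++) {
            direct *= p;           // underflows to 0.0 well before i == 100
            logSum += Math.log(p); // stays a perfectly ordinary double
        }
        System.out.println(direct); // 0.0
        System.out.println(logSum); // about -1003.1, i.e. p^100 = e^-1003.1
    }
}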

Java precise calculations - options to use

I am trying to establish a concise overview of the options we have for precise calculations in Java + SQL. So far I have found the following options:
use doubles accepting their drawbacks, no go.
use BigDecimals
using them in complicated formulas is problematic for me
use String.format/DecimalFormat to round doubles
Do I need to round each variable in a formula, or just the result, to get BigDecimal-like precision?
How can this be tweaked?
use computed fields option in SQL.
drawback is that I'd need dynamic SQL to pull data from different tables + calculate fields on other calculated fields and that would get messy
any other options?
Problem statement:
I need precise financial calculations that involve very big numbers (billions) and very small numbers (0.0000004321), and also dividing values that are very similar to each other, so for sure I need the precision of BigDecimal.
On the other side, I want to retain the ease of use that doubles have in functions (I work on arrays of decimal SQL data), with calculations like (a[i] - b[i])/b[i] etc. that are further used in other calculations. And I'd like users to be able to design their own formulas as they need them (using common math statements).
I am keen to use the "formatting" solution with String.format, but this makes the code not very readable (using String.format() for each variable...).
Many thanks for suggestions on how to deal with this.
There is nothing you can do to avoid floating-point errors in float and double.
No free cheese here - use BigDecimal.
From Effective Java (2nd ED):
Item 48: Avoid float and double if exact answers are required
Float and double do not provide exact results and should not be used where exact results are required.
The float and double types are particularly ill-suited for monetary calculations because it is impossible to represent 0.1 (or any other negative power of ten) as a float or double exactly.
The right way to solve this problem is to use BigDecimal, int, or long for monetary calculations.
...
An alternative is to use int or long and to keep track of the decimal point yourself.
There is no way to get BigDecimal precision on a double. doubles have double precision.
If you want to guarantee precise results use BigDecimal.
You could create your own variant using a long to store the integer part and an int to store the fractional part - but why reinvent the wheel?
Any time you use doubles you stand to suffer from double-precision issues. If you use them in a single place, you might as well use them everywhere.
Even if you only use them to represent data from the database, the values will be rounded to double precision and you will lose information.
If I understand your question, you want to use data types with more precision than the native Java ones without losing the simple mathematical syntax (e.g. / + * - and so on). As you cannot overload operators in Java, I think this is not possible.
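That said, the formulas themselves stay manageable if one MathContext is fixed for the whole computation rather than rounding every step by hand. A sketch of the asker's (a[i] - b[i])/b[i] example written against BigDecimal arrays (method name hypothetical):

import java.math.BigDecimal;
import java.math.MathContext;

public class Formulas {
    static final MathContext MC = MathContext.DECIMAL64; // 16 significant digits

    static BigDecimal[] relativeChange(BigDecimal[] a, BigDecimal[] b) {
        BigDecimal[] out = new BigDecimal[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i].subtract(b[i]).divide(b[i], MC);
        }
        return out;
    }

    public static void main(String[] args) {
        BigDecimal[] a = { new BigDecimal("1.0000004321") };
        BigDecimal[] b = { new BigDecimal("1.0000004320") };
        // Nearly equal inputs: the subtraction is exact, so no cancellation blow-up.
        System.out.println(relativeChange(a, b)[0]); // ~1E-10
    }
}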

Unusual floating point numbers in tables

I am building a system to read tables from heterogeneous documents and would like to know the best way of managing (columns of) floating point numbers. Where the column can be represented as real numbers I will use List<Double> (I'm using Java but experience from other languages would be useful.) I also wish to serialize the table as a CSV file. Thus a table might look like:
"material", "mass (g)", "volume (cm3)",
"iron", 7.8, 1.0,
"aluminium", 27.3, 9.9,
and column2 (1-based) would be represented by a List<Double>
{new Double(7.8), new Double(27.3)}
I may also wish to compute the density (mass/volume) and derive a new column ("density (g.cm-3)") as a List<Double>
{new Double(7.8), new Double(2.76)}
However the input values are sometimes missing, unusual or represented by fuzzy concepts. Some transformations may throw exceptions (which I would catch and replace with one of the special values listed below). Examples include:
1.0E+10000
>10
10 / 0.0 (i.e. divide by zero)
Math.sqrt(-1.)
Math.tan(Math.PI/2.0)
I have the following options in Java for unusual values of a list element
null reference
Double.NaN
Double.MAX_VALUE
Double.POSITIVE_INFINITY
Are there protocols for when the Java unusual values above should be used? I have read this question on how they behave. (I would like to rely on chaining of their operations.) And if there are protocols, can the values be serialized and read back in? (E.g. does Java parse "0x7ff0000000000000L" to a number equal to Double.POSITIVE_INFINITY?)
I am prepared for some loss of precision in specification (there are often errors in OCR, missing digits etc. so this is a "good enough" exercise).
You have three problems that you ought to separate to some extent:
What representation should you use for table entries, which might be numbers, numbered quantities of some units, or other things?
How might floating-point infinities and NaNs serve you?
How can floating-point objects be serialized (written to a file and read from a file)?
Regarding these:
You have not specified enough information here for good advice about how to represent table entries. From what you describe, there is no reason to use floating point at all. This is because you have not specified what operations you want to perform on the entries other than reading and writing them. If you do not need to do arithmetic, there is no reason to bother converting values to floating point, or to any other number-arithmetic system. You could simply maintain the entries as their original text. This makes serialization trivial.
Floating-point infinities act like mathematical infinity, by design. Infinity plus a number other than infinity remains infinity, et cetera. You should use floating-point infinities to represent mathematical infinities. You should avoid using floating-point infinities to represent overflows, unless you do not care about losing the values that overflow.
Floating-point NaNs are intended to represent "not a number". A NaN is often used to represent something like "An error occurred, so we do not have a number here to give you. You should do something else in this place." Then it is up to the application to supply the something else, perhaps by having supplementary information from another source or in a parallel data structure. Errors include things such as taking the square root of a negative number or failing to initialize some data. (E.g., some underlying software initializes floating-point data to NaNs, so that, if you do not initialize it yourself, NaNs remain.) You should generally treat NaNs as "empty places" that you must not use rather than as tokens representing something.
When writing and reading floating-point values, you should take care to convert the values exactly or ensure that the errors you introduce in conversion are tolerable. If you must convert to text (human-readable numerals) rather than writing in “binary” (bytes with arbitrary values), then it may be preferable to write in a notation that uses a numeric base compatible with the native radix of the floating-point system (e.g., hexadecimal floating-point numerals for binary floating-point representations, such as 0x3.4p-2 for .8125). If this is not feasible, then you need to produce enough digits (when converting to decimal) to represent the floating-point value accurately enough to recover the original value when reading it, and you need to ensure the conversion software converts without introducing additional errors. You must also handle special values such as infinities and NaNs.
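For Java specifically, a sketch of those round-trips: Double.toString/parseDouble handle the special values, Double.toHexString emits exact hexadecimal floating-point (normalized, so .8125 comes out as 0x1.ap-1 rather than 0x3.4p-2), and the bit-pattern methods cover binary serialization.

public class RoundTrip {
    public static void main(String[] args) {
        // Double.toString emits enough digits to recover the value,
        // and Double.parseDouble understands "Infinity" and "NaN".
        double inf = Double.parseDouble(Double.toString(Double.POSITIVE_INFINITY));
        System.out.println(inf == Double.POSITIVE_INFINITY); // true

        // Hexadecimal floating-point notation is exact for any finite double.
        String hex = Double.toHexString(0.8125);
        System.out.println(hex);                               // 0x1.ap-1
        System.out.println(Double.parseDouble(hex) == 0.8125); // true

        // Raw bit-pattern round-trip, e.g. for binary serialization.
        long bits = Double.doubleToLongBits(Double.POSITIVE_INFINITY);
        System.out.println(Long.toHexString(bits));        // 7ff0000000000000
        System.out.println(Double.longBitsToDouble(bits)); // Infinity
    }
}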
(Note that Math.tan(Math.PI/2) is not infinity and does not cause an exception because Math.PI/2 is not exactly π/2, so its tangent is finite, not infinity.)

data type to represent a big decimal in java

Which data type is apt to represent a decimal number like "10364055.81".
I tried using double:
double d = 10364055.81;
But when I try to print the number, it is displayed as "1.036405581E7", which I don't want.
Should I use BigDecimal? But then it is displayed as 10364055.81000000052154064178466796875.
Is there any datatype that displays the values as it is? Also the number may be bigger than the one taken as example.
BTW, will using BigDecimal affect the performance of the application? I might use this in almost all my DTOs.
You should use BigDecimal - but use the String constructor, e.g.:
new BigDecimal("10364055.81");
If you pass a double to BigDecimal, Java must create that double first - and since doubles cannot represent most decimal fractions accurately, it does create the value as 10364055.81000000052154064178466796875 and then passes it to the BigDecimal constructor. In this case BigDecimal has no way of knowing that you actually meant the rounder version.
Generally speaking, using non-String constructors of BigDecimal should be considered a warning that you're not getting the full benefit of the class.
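With the value from the question, the two constructors side by side (expected output in comments):

import java.math.BigDecimal;

public class Construct {
    public static void main(String[] args) {
        System.out.println(new BigDecimal("10364055.81"));
        // 10364055.81 -- the String constructor preserves exactly what was written

        System.out.println(new BigDecimal(10364055.81));
        // 10364055.81000000052154064178466796875 -- the nearest double, expanded
    }
}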
Edit - based on rereading exactly what you wanted to do, my initial claim is probably too strong. BigDecimal is a good choice when you need to represent decimal values exactly (money handling being the obvious case; you don't want 5.99 * one million to be 5990016.45, for example).
But if you're not worried about the number being stored internally as a very slightly different value to the decimal literal you entered, and just want to print it out again in the same format, then as others have said, an instance of NumberFormat (in this case, new DecimalFormat("########.##")) will do the trick to output the double nicely, or String.format can do much the same thing.
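A quick sketch of that formatting route with the number from the question:

import java.text.DecimalFormat;

public class Display {
    public static void main(String[] args) {
        double d = 10364055.81;
        System.out.println(d); // 1.036405581E7, the unwanted scientific form

        DecimalFormat plain = new DecimalFormat("########.##");
        System.out.println(plain.format(d));          // 10364055.81
        System.out.println(String.format("%.2f", d)); // 10364055.81
    }
}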
As for performance - BigDecimals will naturally be slower than using primitives. Typically, though, unless the vast majority of your program involves mathematical manipulations, you're unlikely to actually notice any speed difference. That's not to say you should use BigDecimals all over; but rather, that if you can get a real benefit from their features that would be difficult or impossible to realise with plain doubles, then don't sweat the minuscule performance difference they theoretically introduce.
How a number is displayed is distinct from how the number is stored.
Take a look at DecimalFormat for controlling how you can display your numbers when a double (or float etc.).
Note that choosing BigDecimal over double (or vice versa) has pros/cons, and will depend on your requirements. See here for more info. From the summary:
In summary, if raw performance and space are the most important factors, primitive floating-point types are appropriate. If decimal values need to be represented exactly, high-precision computation is needed, or fine control of rounding is desired, only BigDecimal has the needed capabilities.
A double would be enough to store this number. If your problem is that you don't like the format when printing it or putting it into a String, you might use NumberFormat: http://java.sun.com/javase/6/docs/api/java/text/NumberFormat.html
You can use double and display it with System.out.printf().
double d = 100003.81;
System.out.printf("%.10f", d);
%.10f means: format the double with a precision of 10 digits after the decimal point.
