Is there a Java function for computing the Gini coefficient? - java

I was wondering whether there is a Java function, either built into Java or in an "official" library such as Apache Commons Math, which computes the Gini coefficient.
From Wikipedia → Gini coefficient:
In economics, the Gini coefficient, sometimes called the Gini index or Gini ratio, is a measure of statistical dispersion intended to represent the income or wealth distribution of a nation's residents, and is the most commonly used measurement of inequality.

I'm not aware of one. But then writing one is pretty trivial!
double gini(List<Double> values) {
    double sumOfDifference = values.stream()
            .flatMapToDouble(v1 -> values.stream().mapToDouble(v2 -> Math.abs(v1 - v2)))
            .sum();
    double mean = values.stream().mapToDouble(v -> v).average().getAsDouble();
    return sumOfDifference / (2 * values.size() * values.size() * mean);
}
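A quick sanity check, with test values of my own (outputs in the comments), using the method above:

System.out.println(gini(List.of(10.0, 10.0, 10.0, 10.0))); // 0.0  - perfect equality
System.out.println(gini(List.of(0.0, 0.0, 0.0, 100.0)));   // 0.75 - one member holds everything

Note that, as written, it throws on an empty list (getAsDouble() on an empty average) and returns NaN when the mean is 0, so guard those cases if your data can hit them.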


Math modified pow function for fractional exponents and negative bases

I've noticed that calculators and graphing programs like Desmos, Geogebra, or Google (google search x^(1/3)) must have a modified version of the Math.pow() function that allows them to plot some negative bases with fractional exponents that would otherwise be undefined using the regular Math.pow() function.
I'm trying to recreate this modified pow function so I can plot the missing sections of graphs, for example $x^{\frac{1}{3}}$ where $x<0$.
My Attempt
I'm not aware of what this modified pow function is called in computer science or math literature, so I can't look up references that would help me make a robust and optimized version of it. Instead, I've attempted to make my own version with the help of the Fraction class from the Apache math3 library, which I use to determine some of the conditional cases, such as when the numerator or denominator of a fractional exponent is even or odd.
My version has a few issues that I'll outline and might be missing some extra conditions that I haven't considered which could lead to errors.
/* Main Method */
public static void main(String[] args) {
    double xmin = -3;
    double xmax = 3;
    double epsilon = 0.011;
    /* print out x,y coordinates for the plot */
    for (double x = xmin; x <= xmax; x += epsilon) {
        System.out.println(x + "," + f(x));
    }
}
/* Modified Power Function */
private static double pow2(double base, double exponent) {
    boolean negativeBase = base < 0;
    /* exponent is an integer and base is non-negative */
    if (exponent == ((int) exponent) && !negativeBase) {
        /* use the regular pow function */
        return Math.pow(base, exponent);
    }
    Fraction fraction;
    try {
        fraction = new Fraction(exponent, 1000); /* maxDenominator of 1000 for speed */
    } catch (FractionConversionException e) {
        return Double.NaN;
    }
    /* updated: irrational exponent */
    if (exponent != fraction.doubleValue()) {
        /* Handles irrational exponents like pi which cannot be reduced to fractions.
         * Depends heavily on the maxDenominator value set above. With a maxDenominator
         * of 1000, fractions like 1/33333, whose denominator has more than 4 digits,
         * will be treated as irrational. To avoid this, increase maxDenominator to
         * 10000, at the cost of performance. That's the trade-off with this part of
         * the algorithm. This condition also helps clear up a lot of the mess on a
         * plot. If the plot is centered exactly at the origin (0,0) the messy
         * artifacts may appear, but offsetting the view of the plot slightly from the
         * origin makes them disappear - or maybe it has more to do with the step size
         * epsilon (0.01, a clean number, vs 0.011, an irregular one). */
        return Math.pow(base, exponent);
    }
    if (fraction.getDenominator() % 2 == 0) {
        /* even denominator */
        if (negativeBase) {
            return Double.NaN;
        }
    } else {
        /* odd denominator, allows for negative bases */
        if (negativeBase) {
            if (fraction.getNumerator() % 2 == 0) {
                /* Even numerator: (-base)^(2/3) is the same as ((-base)^2)^(1/3),
                 * and any negative base squared is positive. */
                return Math.pow(-base, exponent);
            }
            /* return a negative answer, making base and exponent positive */
            return -Math.pow(-base, exponent);
        }
    }
    return Math.pow(base, exponent);
}
/* Math function */
private static double f(double x) {
    /* example f(x) = x^(1/x) */
    return pow2(x, (double) 1 / x);
}
Issue #1
For both issues I'll use the math function $f(x) = x^{\frac{1}{x}}$ as an example plot that demonstrates them both. The first issue is the FractionConversionException, caused by a large value for the exponent. The error occurs if the value of epsilon in my code is changed to 0.1, but seems not to occur when the step size epsilon is 0.11. I'm not sure how to resolve it properly, but looking within the Fraction class where it throws the FractionConversionException, it uses a conditional statement that I could copy over to my pow2() function to return NaN, as in the code below. I'm just not sure that's the correct thing to do!
long overflow = Integer.MAX_VALUE;
if (FastMath.abs(exponent) > overflow) {
    return Double.NaN;
}
EDIT: Adding a try/catch statement around the instantiation of the Fraction class and returning NaN in the catch clause (instead of the code above) seems to be a good workaround for now.
Issue #2
Plotting the math function $f(x) = x^{\frac{1}{x}}$ produces a messy section on the left where $x<0$ (image omitted), as opposed to what it should look like: https://www.google.com/search?q=x^(1%2Fx)
I don't know how to get rid of this mess so that for $x<0$ the function is undefined (NaN), while still allowing the pow2() function to plot functions like $x^{\frac{1}{3}}$, $x^{\frac{2}{3}}$, $x^{x}$, etc.
I'm also not sure what to set maxDenominator to when instantiating the Fraction object for good performance without affecting the results of the plot. Maybe there's a faster decimal-to-fraction conversion algorithm out there, although I'd imagine Apache math3 is already well optimized.
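For reference, the standard continued-fraction convergents algorithm is easy to sketch; I don't know exactly what math3 does internally, and the method name and the 1e-12 cutoff here are arbitrary choices of mine:

static long[] toFraction(double value, long maxDenominator) {
    long p0 = 0, q0 = 1;  /* previous convergent p_{k-1}/q_{k-1} */
    long p1 = 1, q1 = 0;  /* current convergent p_k/q_k */
    double x = value;
    while (true) {
        long a = (long) Math.floor(x);           /* next continued-fraction term */
        long p2 = a * p1 + p0, q2 = a * q1 + q0; /* next convergent */
        if (q2 > maxDenominator) break;          /* denominator limit reached */
        p0 = p1; q0 = q1; p1 = p2; q1 = q2;
        double frac = x - a;
        if (frac < 1e-12) break;                 /* numerically exhausted */
        x = 1.0 / frac;
    }
    return new long[] { p1, q1 };                /* {numerator, denominator} */
}

For example, toFraction(0.333333, 1000) returns {1, 3}.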
Edit:
Issue #3
I forgot to consider irrational exponents because I was too consumed with neat fractions. My algorithm fails for irrational exponents like π and plots two versions of the graph on top of each other: because of the way the Fraction class rounds the exponent, some of the resulting denominators are considered even and some odd. Maybe a condition that returns NaN whenever the irrational exponent doesn't equal the fraction would work. I quickly tested this condition and had to add a negativeBase check, which flips the graph the right way; it needs further testing, but it might be an idea.
Edit 2: After testing, it should actually fall back to the regular pow() function instead of returning NaN in order to handle irrational exponents (see the updated code for the modified power function). Surprisingly, this approach also gets rid of most of the mess highlighted in Issue #2. I believe that's because there are more irrational numbers than rational numbers in any interval, and those samples are now discounted by the algorithm, making the messy region less dense and harder to connect into lines on the plot.
if (exponent != fraction.doubleValue() && negativeBase) {
    return Double.NaN;
}
Extra Question
Is it accurate to represent/plot this data of a function the way most modern graphing programs (mentioned above) seem to do, or is it really misleading, considering that the regular power function treats this extra data for negative bases as undefined? And what are these regions or parts of the plot called in mathematics (what is the technical term)?
I'm also open to any other approach or algorithm.

Recursive Power Function gives weird answer

Hello, better programmers than me. Not a huge deal, but I am curious about this function and, more importantly, the result it sometimes gives. I defined a recursive power function for a homework assignment that is to include negative exponents. For positive values and 0 it works fine, but when I enter some negative values, the answer is really strange. Here is the function:
public static double Power(int base, int exp) {
    if (exp == 0)
        return 1.0;
    else if (exp >= 1)
        return base * Power(base, exp - 1);
    else
        return (1.0 / base) * Power(base, exp + 1);
}
So for a call Power(5, -1) the function returns 0.2, like it should. But for say Power(5, -2) the function returns 0.04000000000000001 instead of just 0.04.
Again, this isn't a huge deal since it's for homework and not "real life", but just curious as to why this happened. I assume it has something to do with how computer memory or a double value is stored but really have no idea. Thanks all!
PS, this is coded in Java using Netbeans if that makes a difference.
Floating point rounding errors can be reduced by careful organization of your arithmetic. In general, you want to minimize the number of rounding operations, and the number of calculations done on rounded results.
I made a small change to your function:
public static double Power(int base, int exp) {
    if (exp == 0)
        return 1.0;
    else if (exp >= 1)
        return base * Power(base, exp - 1);
    else
        return 1.0 / Power(base, -exp);
}
For your test case, Power(5, -2), this does only one rounded calculation, the division at the top of the recursion. It gets the closest double to 1/25.0, which prints as 0.04.
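You can see the same effect without any recursion; what matters is where the rounding happens. A minimal demonstration (outputs in the comments):

public class RoundingDemo {
    public static void main(String[] args) {
        /* two rounded steps: 1.0/5 is already inexact in binary, and
         * multiplying the two inexact 0.2s compounds the error */
        System.out.println((1.0 / 5) * (1.0 / 5)); // 0.04000000000000001
        /* one rounded step: 5*5 is exact integer math, then one division */
        System.out.println(1.0 / (5 * 5));         // 0.04
    }
}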
It's a 1990's thing
This will likely be an ignored or controversial answer but I think it needs to be said.
Others have focused on the message that "floating point" calculations (e.g. ones involving one or more numbers that are doubles) do approximate math.
My focus in this answer is on the message that, even though it is this way in ordinary Java code, and indeed ordinary code in most programming languages, computations with numbers like 0.1 don't have to be approximate.
A few languages treat numbers like 0.1 as rational numbers, a ratio between two integers (numerator over denominator, in this case 1 over 10, or one tenth), just as they are in school math. Computations involving nothing but integers and rationals are 100% accurate (ignoring integer overflow and/or OOM).
Unfortunately, rational computations can get pathologically slow if the denominator gets too large.
Some languages take a compromise position. They treat some rationals as rationals (so with 100% accuracy) and only give up on 100% accuracy, switching to floats, when rational calculations would be pathologically slow.
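Java itself has no built-in rational type, but for a function like your Power you can emulate the fully accurate behaviour with BigInteger. A minimal sketch, where the numerator/denominator pair is my own representation rather than any standard type:

import java.math.BigInteger;

/* Exact Power for integer bases: the result is an exact fraction,
 * so power(5, -2) is literally 1/25 rather than a rounded 0.04.
 * (Sign normalization for negative bases is omitted.) */
static BigInteger[] power(long base, int exp) {
    BigInteger magnitude = BigInteger.valueOf(base).pow(Math.abs(exp));
    return exp >= 0
            ? new BigInteger[] { magnitude, BigInteger.ONE }  /* magnitude / 1 */
            : new BigInteger[] { BigInteger.ONE, magnitude }; /* 1 / magnitude */
}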
Here's what the compromise approach looks like in Raku, a relatively new and forward-looking programming language:
sub Power(\base, \exp) {
    given exp {
        when 0      { 1.0 }
        when * >= 1 { base * Power(base, exp - 1) }
        default     { 1.0 / base * Power(base, exp + 1) }
    }
}
This duplicates your code in this other language.
Now use this function to get results for a list of exponents:
for 1000,20,2,1,0,-1,-2,-20,-1000 -> \exp { say Power 5, exp }
Running this code in glot.io displays:
9332636185032188789900895447238171696170914463717080246217143397959
6691097577563445444032709788110235959498993032424262421548752135403
2394841520817203930756234410666138325150273995075985901831511100490
7962651131182405125147959337908051782711254151038106983788544264811
1946981422866095922201766291044279845616944888714746652800632836845
2647429261829862165202793195289493607117850663668741065439805530718
1363205998448260419541012132296298695021945146099042146086683612447
9295203482686461765792691604742006593638904173789582211836507804555
6628444273925387517127854796781556346403714877681766899855392069265
4394240087119736747017498626266907472967625358039293762338339810469
27874558605253696441650390625
95367431640625
25
5
1
0.2
0.04
0.000000000000010
0
The above results are 100% accurate -- until the last exponent, -1000. We can see where the language gives up on 100% accuracy if we check the types of the results (using WHAT):
for 1000,20,2,1,0,-1,-2,-20,-1000 -> \exp { say WHAT Power 5, exp }
displays:
(Rat)
(Rat)
(Rat)
(Rat)
(Rat)
(Rat)
(Rat)
(Rat)
(Num)
Converting Rats (the default rational type) into FatRats (the arbitrary precision rational type) avoids inaccuracy even with pathologically large denominators:
sub Power(\base, \exp) {
    given exp {
        when 0      { 1.0.FatRat }
        when * >= 1 { base * Power(base, exp - 1) }
        default     { 1.0.FatRat / base * Power(base, exp + 1) }
    }
}
This yields the same display as our original code except for the last calculation which comes out as:
0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000011
I don't know if that's accurate, but, aiui, it's supposed to be.

Is there a way to make BigDecimal faster than I have here?

I'm working on some financial analysis software that will need to process large (ish) volumes of data. I'd like to use BigDecimal for the accuracy (some pricing info goes to four or five digits to the right of the decimal) but I was concerned about speed.
I wrote the following test app and it appears that BigDecimal can be 90 to 100 times slower than Doubles. I knew there would be a delta, but that's more than I was expecting. Here's a typical output after many trials.
BigDecimal took 17944 ms
Double took 181 ms
Am I missing something?
Here is the code. I tried to make it representative of the real world. I created a constant where I could (pi) but also did some inline math on numbers that vary from data row to data row, such as pi * BigDecimal(i) + BigDecimal(1). My point being that avoiding constructors can't be the only answer.
Fortunately, it appears Double has enough precision anyway since numbers will be typically in the format 00000.00000. Any hidden gotchas I should know about, though? Do people use Double for financial analysis software?
import java.math.BigDecimal

object Stopwatch {
    inline fun elapse(f: () -> Unit): Long {
        val start = System.currentTimeMillis()
        f()
        return System.currentTimeMillis() - start
    }
}

fun tryBigDecimal() {
    val arr: MutableList<BigDecimal> = arrayListOf()
    for (i in 1..10000000) {
        arr.add(BigDecimal(i))
    }
    val pi = BigDecimal(3.14159)
    for (i in 0..arr.size - 1) {
        arr[i] = arr[i] * pi / (pi * BigDecimal(i) + BigDecimal(1))
    }
    //arr.forEachIndexed { i, bigDecimal -> println("$i, ${bigDecimal.toString()}")}
}

fun tryDouble() {
    val arr: MutableList<Double> = arrayListOf()
    for (i in 1..10000000) {
        arr.add(i.toDouble())
    }
    val pi = 3.14159
    for (i in 0..arr.size - 1) {
        arr[i] = arr[i] * pi / (pi * i + 1)
    }
    //arr.forEachIndexed { i, bigDecimal -> println("$i, ${bigDecimal.toString()}")}
}

fun main(args: Array<String>) {
    val bigdecimalTime = Stopwatch.elapse(::tryBigDecimal)
    println("BigDecimal took $bigdecimalTime ms")
    val doubleTime = Stopwatch.elapse(::tryDouble)
    println("Double took $doubleTime ms")
}
Yes, BigDecimal is appropriate for money. Or any other situation where you need accuracy rather than speed.
Floating-point
The float, Float, double, and Double types all use floating-point technology.
The purpose of floating-point is to trade away accuracy for speed of execution. So you often see extraneous incorrect digits at the end of the decimal fraction. This is acceptable for gaming, 3D visualizations, and many scientific applications. Computers commonly have specialized hardware to accelerate floating point calculations. This is possible because the IEEE has concretely standardized floating point behavior.
Floating-point is not acceptable for financial transactions. Nor is floating point acceptable in any other situation that expects correct fractions.
BigDecimal
The two purposes of BigDecimal are:
Handle arbitrarily large/small numbers.
Not use floating point technology.
So, what does your app need? Slow but accurate? Or, fast but slightly inaccurate? Those are your choices. Computers are not magic, computers are not infinitely fast nor infinitely accurate. Programming is like engineering in that it is all about choosing between trade-offs according to the needs of your particular application.
BigDecimal is one of the biggest sleeper features in Java. Brilliant work by IBM and others. I don't know if any other development platform has such an excellent facility for accurately handling decimal numbers. See some JavaOne presentations from years ago if you want to appreciate the technical issues.
Do not initialize a BigDecimal object by passing a float or double:
new BigDecimal( 1234.4321 ) // BAD - Do not do this.
That argument creates a double value which carries the inaccuracies of floating-point technology into the BigDecimal. Use the other constructors.
new BigDecimal( "1234.4321" ) // Good
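You can see what the double constructor actually captures (outputs in the comments):

System.out.println(new BigDecimal(0.1));
// 0.1000000000000000055511151231257827021181583404541015625
System.out.println(new BigDecimal("0.1"));   // 0.1
System.out.println(BigDecimal.valueOf(0.1)); // 0.1 (routes through Double.toString)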
You can try Moneta, the JSR 354 reference implementation (JavaMoney RI). It has a FastMoney implementation:
FastMoney represents a numeric representation that was optimized for speed. It represents a monetary amount only as an integral number of type long, thereby using a number scale of 100,000 (10^5).
e.g.
operator fun MonetaryAmount.times(multiplicand: Double): MonetaryAmount {
    return multiply(multiplicand)
}

operator fun MonetaryAmount.div(divisor: Double): MonetaryAmount {
    return divide(divisor)
}

fun tryFastMoney() {
    val currency = Monetary.getCurrency("USD")
    val arr: MutableList<MonetaryAmount> = arrayListOf()
    for (i in 1..10000000) {
        arr.add(FastMoney.of(i, currency))
    }
    val pi = 3.14159
    for (i in 0..arr.size - 1) {
        arr[i] = arr[i] * pi / (pi * i + 1)
    }
}

fun main(args: Array<String>) {
    val fastMoneyTime = Stopwatch.elapse(::tryFastMoney)
    println("FastMoney took $fastMoneyTime ms")
    val doubleTime = Stopwatch.elapse(::tryDouble)
    println("Double took $doubleTime ms")
}
FastMoney took 7040 ms
Double took 4319 ms
The most common solution for finances is using Int or several Ints:
val pi = 314159 // the point is implicit. To get the real value multiply `pi * 0.00001`
That way you explicitly control everything about the numbers (e.g. the remainder after a division).
You may use Long, but writes to it are not atomic, so it is not safe for concurrent use; you have to synchronise on any shared Long you have.
A rule of thumb is to never use floating-point arithmetic (i.e. Double or Float) for finances, because, well, its point floats, guaranteeing absolutely nothing when the numbers get big.
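A minimal sketch of that idea in Java, with constants and the truncating division being my own choices (the 10^5 scale mirrors FastMoney's):

public class ScaledMoneyDemo {
    static final long SCALE = 100_000; /* 10^5 fractional units per whole unit */

    public static void main(String[] args) {
        long price = 123443210;   /* represents 1234.43210 */
        long qty = 3;
        long total = price * qty; /* exact: represents 3703.29630 */
        long rate = 105000;       /* represents 1.05000 */
        /* multiplying two scaled values needs one rescale; pick an explicit
         * rounding rule (plain truncation here) and watch for long overflow */
        long adjusted = total * rate / SCALE; /* represents 3888.46111 */
        System.out.println(total / (double) SCALE);    /* 3703.2963, display only */
        System.out.println(adjusted / (double) SCALE); /* 3888.46111 */
    }
}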

Using cubic formula to calculate roots of cubic equation not working

I'm trying to make a program that outputs the roots of a given cubic equation. I therefore decided to make a version using the cubic formula (http://www.math.vanderbilt.edu/~schectex/courses/cubic/). This formula should be able to output the result of one of the roots.
However, it doesn't seem to work, and I'm not sure whether it's the code or the idea that is flawed. Here the coefficients 1, -6, 11 and -6 should produce an output of 1, 2 or 3. Instead, NaN is output. The same has applied to the other coefficients I have tried. Thanks for all your help!
public class CubicFormula {
    public static void main(String[] args) {
        System.out.println(new CubicFormula().findRoots(1.0, -6.0, 11.0, -6.0));
    }

    public double findRoots(double a, double b, double c, double d) {
        double p = -(b) / (3 * a);
        double q = Math.pow(p, 3) + (b * c - 3 * a * d) / (6 * Math.pow(a, 2));
        double r = c / (3 * a);
        return Math.cbrt(q + Math.sqrt(Math.pow(q, 2.0) + Math.pow((r - Math.pow(p, 2.0)), 3)))
             + Math.cbrt(q - Math.sqrt(Math.pow(q, 2.0) + Math.pow((r - Math.pow(p, 2.0)), 3))) + p;
    }
}
From the very link you have mentioned:
One reason is that we're trying to avoid teaching them about complex numbers. Complex numbers (i.e., treating points on the plane as numbers) are a more advanced topic, best left for a more advanced course. But then the only numbers we're allowed to use in calculus are real numbers (i.e., the points on the line). That imposes some restrictions on us --- for instance, we can't take the square root of a negative number. Now, Cardan's formula has the drawback that it may bring such square roots into play in intermediate steps of computation, even when those numbers do not appear in the problem or its answer.
This part
Math.sqrt(Math.pow(q, 2.0) + Math.pow((r - Math.pow(p, 2.0)), 3))
will end up being the square root of a negative number, which is imaginary, but in Java's world of doubles ends up being NaN.
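For what it's worth, the usual fix when that expression goes negative (the three-distinct-real-roots case, the so-called casus irreducibilis) is to switch to the trigonometric form of the formula, which stays in real arithmetic throughout. A sketch, not a library API; names are mine:

public class CubicTrig {
    /* Roots of a*x^3 + b*x^2 + c*x + d = 0 in the all-real-roots case,
     * i.e. when (q/2)^2 + (p/3)^3 < 0 for the depressed cubic. */
    static double[] threeRealRoots(double a, double b, double c, double d) {
        /* depress the cubic: x = t - b/(3a) gives t^3 + p*t + q = 0 */
        double p = (3*a*c - b*b) / (3*a*a);
        double q = (2*b*b*b - 9*a*b*c + 27*a*a*d) / (27*a*a*a);
        double shift = -b / (3*a);
        double m = 2 * Math.sqrt(-p / 3);          /* requires p < 0, true here */
        double theta = Math.acos(3*q / (p*m)) / 3; /* argument lies in [-1, 1] */
        double[] roots = new double[3];
        for (int k = 0; k < 3; k++) {
            roots[k] = m * Math.cos(theta - 2*Math.PI*k/3) + shift;
        }
        return roots;
    }

    public static void main(String[] args) {
        for (double root : threeRealRoots(1, -6, 11, -6)) {
            System.out.println(root); /* 3.0, 2.0, 1.0 up to rounding */
        }
    }
}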

How Were These Coefficients in a Polynomial Approximation for Sine Determined?

Background: I'm writing some geometry software in Java. I need the precision offered by Java's BigDecimal class. Since BigDecimal doesn't have support for trig functions, I thought I'd take a look at how Java implements the standard Math library methods and write my own version with BigDecimal support.
Reading this JavaDoc, I learned that Java uses algorithms "from the well-known network library netlib as the package "Freely Distributable Math Library," fdlibm. These algorithms, which are written in the C programming language, are then to be understood as executed with all floating-point operations following the rules of Java floating-point arithmetic."
My Question: I looked up fdlibm's sin function, k_sin.c, and it looks like they use a Taylor series of order 13 to approximate sine (edit: njuffa commented that fdlibm uses a minimax polynomial approximation). The code defines the coefficients of the polynomial as S1 through S6. I decided to check the values of these coefficients, and found that S6 is only correct to one significant digit! I would expect it to be 1/(13!), which Windows Calculator and Google Calc tell me is 1.6059044...e-10, not 1.58969099521155010221e-10 (the value for S6 in the code). Even S5 differs in the fifth digit from 1/(11!). Can someone explain this discrepancy? Specifically, how are those coefficients (S1 through S6) determined?
/* #(#)k_sin.c 1.3 95/01/18 */
/*
* ====================================================
* Copyright (C) 1993 by Sun Microsystems, Inc. All rights reserved.
*
* Developed at SunSoft, a Sun Microsystems, Inc. business.
* Permission to use, copy, modify, and distribute this
* software is freely granted, provided that this notice
* is preserved.
* ====================================================
*/
/* __kernel_sin( x, y, iy)
* kernel sin function on [-pi/4, pi/4], pi/4 ~ 0.7854
* Input x is assumed to be bounded by ~pi/4 in magnitude.
* Input y is the tail of x.
* Input iy indicates whether y is 0. (if iy=0, y assume to be 0).
*
* Algorithm
* 1. Since sin(-x) = -sin(x), we need only to consider positive x.
* 2. if x < 2^-27 (hx<0x3e400000 0), return x with inexact if x!=0.
* 3. sin(x) is approximated by a polynomial of degree 13 on [0,pi/4]:
*
*        sin(x) ~ x + S1*x^3 + ... + S6*x^13
*
*    where
*
*        |sin(x)/x - (1 + S1*x^2 + S2*x^4 + S3*x^6 + S4*x^8 + S5*x^10 + S6*x^12)| <= 2^-58
*
* 4. sin(x+y) = sin(x) + sin'(x')*y
*             ~ sin(x) + (1 - x*x/2)*y
*    For better accuracy, let
*
*        r = x^3*(S2 + x^2*(S3 + x^2*(S4 + x^2*(S5 + x^2*S6))))
*
*    then
*
*        sin(x) = x + (S1*x^3 + (x^3*(r - y/2) + y))
*/
#include "fdlibm.h"

#ifdef __STDC__
static const double
#else
static double
#endif
half =  5.00000000000000000000e-01, /* 0x3FE00000, 0x00000000 */
S1   = -1.66666666666666324348e-01, /* 0xBFC55555, 0x55555549 */
S2   =  8.33333333332248946124e-03, /* 0x3F811111, 0x1110F8A6 */
S3   = -1.98412698298579493134e-04, /* 0xBF2A01A0, 0x19C161D5 */
S4   =  2.75573137070700676789e-06, /* 0x3EC71DE3, 0x57B1FE7D */
S5   = -2.50507602534068634195e-08, /* 0xBE5AE5E6, 0x8A2B9CEB */
S6   =  1.58969099521155010221e-10; /* 0x3DE5D93A, 0x5ACFD57C */

#ifdef __STDC__
double __kernel_sin(double x, double y, int iy)
#else
double __kernel_sin(x, y, iy)
double x, y; int iy; /* iy=0 if y is zero */
#endif
{
    double z, r, v;
    int ix;
    ix = __HI(x) & 0x7fffffff;          /* high word of x */
    if (ix < 0x3e400000)                /* |x| < 2**-27 */
        { if ((int)x == 0) return x; }  /* generate inexact */
    z = x*x;
    v = z*x;
    r = S2 + z*(S3 + z*(S4 + z*(S5 + z*S6)));
    if (iy == 0) return x + v*(S1 + z*r);
    else         return x - ((z*(half*y - v*r) - y) - v*S1);
}
We can use trig identities to get everything down to 0 ≤ x ≤ π/4, and then need a way to approximate sin x on that interval. On 0 ≤ x ≤ 2^-27, we can just stick with sin x ≈ x (which the Taylor polynomial would also give, within the tolerance of a double).
The reason for not using a Taylor polynomial is in step 3 of the algorithm's comment. The Taylor polynomial gives (provable) accuracy near zero at the expense of less accuracy as you get away from zero. By the time you get to π/4, the 13th-order Taylor polynomial (divided by x) differs from (sin x)/x by 3e-14. This is far worse than fdlibm's error of 2^-58. To get that accurate with a Taylor polynomial, you'd need to go until (π/4)^(n-1)/n! < 2^-58, which takes another 2 or 3 terms.
So why does fdlibm settle for an accuracy of 2^-58? Because that's past the tolerance of a double (which only has 52 bits in its mantissa).
In your case, though, you want arbitrarily many bits of sin x. To use fdlibm's approach, you'd need to recalculate the coefficients whenever your desired accuracy changes. Your best approach seems to be to stick with the Taylor polynomial at 0, since it's very easily computable, and take terms until (π/4)^(n-1)/n! meets your desired accuracy.
njuffa had a useful idea of using identities to further restrict your domain. For example, sin(x) = 3*sin(x/3) - 4*sin^3(x/3). Using this would let you restrict your domain to 0 ≤ x ≤ π/12, and you could use it twice to restrict your domain to 0 ≤ x ≤ π/36. This would make your Taylor expansion reach your desired accuracy much more quickly. And instead of trying to get an arbitrarily accurate value of π for (π/4)^(n-1)/n!, I'd recommend rounding π up to 4 and going until 1/n! meets your desired accuracy (or 3^-n/n! or 9^-n/n! if you've used the trig identity once or twice).
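To make that concrete, here's a minimal sketch of the Taylor approach with BigDecimal, assuming the argument has already been range-reduced to [-π/4, π/4]; the termination rule and names are mine:

import java.math.BigDecimal;
import java.math.MathContext;

public class BigSin {
    /* sin(x) = x - x^3/3! + x^5/5! - ..., summed until the latest term
     * drops below the requested precision */
    static BigDecimal sin(BigDecimal x, MathContext mc) {
        BigDecimal eps = BigDecimal.ONE.movePointLeft(mc.getPrecision() + 2);
        BigDecimal x2 = x.multiply(x, mc);
        BigDecimal term = x; /* current term, starting at x^1/1! */
        BigDecimal sum = x;
        for (int k = 1; term.abs().compareTo(eps) > 0; k++) {
            /* next term = -previous * x^2 / ((2k) * (2k+1)) */
            term = term.multiply(x2, mc)
                       .divide(BigDecimal.valueOf(2L * k * (2L * k + 1)), mc)
                       .negate();
            sum = sum.add(term, mc);
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sin(new BigDecimal("0.5"), new MathContext(30)));
        /* Math.sin(0.5) = 0.479425538604203 for comparison */
    }
}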
