new BigInteger(String) performance / complexity - java

I'm wondering about the performance/complexity of constructing BigInteger objects with the new BigInteger(String) constructor.
Consider the following method:
public static void testBigIntegerConstruction()
{
    for (int exp = 1; exp < 10; exp++)
    {
        StringBuffer bigNumber = new StringBuffer((int) Math.pow(10.0, exp));
        for (int i = 0; i < Math.pow(10.0, exp - 1); i++)
        {
            bigNumber.append("1234567890");
        }
        String val = bigNumber.toString();
        long time = System.currentTimeMillis();
        BigInteger bigOne = new BigInteger(val);
        System.out.println("time for constructing a 10^" + exp
                + " digits BigInteger : " + (System.currentTimeMillis() - time)
                + " ms");
    }
}
This method creates BigInteger objects from Strings with 10^x digits, where x = 1 at the beginning and is increased with every iteration. It measures and outputs the time required to construct the corresponding BigInteger object.
On my machine (Intel Core i5 660, JDK 6 Update 25 32 bit) the output is:
time for constructing a 10^1 digits BigInteger : 0 ms
time for constructing a 10^2 digits BigInteger : 0 ms
time for constructing a 10^3 digits BigInteger : 0 ms
time for constructing a 10^4 digits BigInteger : 16 ms
time for constructing a 10^5 digits BigInteger : 656 ms
time for constructing a 10^6 digits BigInteger : 59936 ms
time for constructing a 10^7 digits BigInteger : 6227975 ms
Ignoring the lines up to 10^5 (because of possible distortions introduced by processor caching effects, JIT compilation, etc.), we can clearly see O(n^2) complexity here.
Keeping in mind that every operation on a BigInteger creates a new one due to immutability, this is a major performance penalty for huge numbers.
Questions:
Did I miss something?
Why is this the case?
Is this fixed in more recent JDKs?
Are there any alternatives?
UPDATE:
I did further measurements and I can confirm the statement from some of the answers: it seems that BigInteger is optimized for subsequent numerical operations at the expense of higher construction costs for huge numbers, which seems reasonable to me.

Simplifying from the source somewhat, it's the case because in the "traditional" String parsing loop
for each digit y from left to right:
    x = 10 * x + y
you have the issue that 10 * x takes time linear in the length of x, unavoidably, and that length grows by more-or-less a constant factor for each digit, also unavoidably.
(The actual implementation is somewhat smarter than this -- it tries to parse an int's worth of decimal digits at a time, so the actual multiplier in the loop is more like a billion -- but yeah, it's still quadratic overall.)
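For illustration, here is a minimal sketch of the naive schoolbook parse described by that loop (not the JDK's actual code, which works in larger chunks):

import java.math.BigInteger;

// Each multiply costs time linear in the current length of x, and x grows
// by roughly a constant amount per digit, so n digits cost O(n^2) overall.
static BigInteger naiveParse(String s) {
    BigInteger x = BigInteger.ZERO;
    for (int i = 0; i < s.length(); i++) {
        int y = s.charAt(i) - '0';
        x = x.multiply(BigInteger.TEN).add(BigInteger.valueOf(y));
    }
    return x;
}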
That said, a number with 10^6 digits is at least a googol, and that's bigger than any number I've heard of being used even for cryptographic purposes. You're parsing a string that takes two megabytes of memory. Yes, it'll take a while, but I suspect the JDK authors didn't see the point of optimizing for such a rare use case.

The O(n^2) effort is caused by the decimal to binary conversion if the BigInteger is specified as decimal digits.
Also, 10^7 digits is a really huge number. For typical cryptographic algorithms like RSA you would deal with 10^3 to 10^4 digits. Most of the BigInteger operations are not optimized for such a large number of digits.

You're actually measuring the time it takes to parse a string and create the BigInteger. Numeric operations involving BigIntegers would be a lot more efficient than this.
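If construction cost matters and the input doesn't have to be decimal text, one way to sidestep the decimal-to-binary conversion entirely is to build the value from raw bytes, which is linear in the size of the number. A hedged sketch:

import java.math.BigInteger;
import java.util.Random;

// Constructing from the binary magnitude is O(n): no radix conversion needed.
byte[] magnitude = new byte[1_000_000];         // about 2.4 million decimal digits
new Random().nextBytes(magnitude);
BigInteger big = new BigInteger(1, magnitude);  // signum 1 = positive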

Related

The results differ with parallel summation in Java

I made a class Sum that extends RecursiveAction. The task is to calculate the sum of 1 / a[i].
import java.util.concurrent.RecursiveAction;

public class Sum extends RecursiveAction {
    int[] items;
    double result;
    int min = 100000;
    int from, to;

    Sum(int[] items, int from, int to) {
        this.items = items;
        this.from = from;
        this.to = to;
    }

    @Override
    protected void compute() {
        if (to - from <= min) {
            for (int i = from; i < to; i++) {
                result += 1d / items[i];
            }
        } else {
            var mid = (from + to) / 2;
            var left = new Sum(items, from, mid);
            var right = new Sum(items, mid, to);
            invokeAll(left, right);
            result += left.result + right.result;
        }
    }
}
Results:
Single: 1.3180710500108106E8
Total time: 0.612
Parallel: 1.3180710501986596E8
Total time: 0.18
The numbers are very close but differ slightly. What could this be related to? I noticed that if you remove the 1 / a[i] division, the sum is calculated identically.
I'm guessing that you might be trying to sum a list of millions of numbers. You wrote a multi-threaded, divide-and-conquer routine that stops dividing when the list gets to be less than 100,000 long (int min = 100000;), so if it's worth splitting the list into chunks of that size, there must be at least a few of those chunks, right?
So, here's the issue: let's say you want to add up a million numbers that are around the same order of magnitude. Maybe they're all readings from the same sensor. Let's say that the arithmetic mean of the whole list is X. If you were to simply run down that list, from beginning to end, accumulating numbers,...
...The expected value of the first sum is X+X,
...of the next sum, X+2X
...
...of the last sum, X+999999X
OK, but 999999X is six orders of magnitude greater than X. In binary floating point, the exponent of 999999X is going to be greater than the exponent of X by about 20. That is to say, the binary value of 999999X is approximately the value of X shifted left by 20 bits.
In order to do the addition both numbers must have the same exponent, and the way that is accomplished is to denormalize X. If you shift the mantissa of X to the right by 20 bits, and then you add 20 to its exponent, it should, in theory, still represent the same number. Only problem is, you've just shifted away the 20 least-significant bits.
If you're using double, the original X had 53 bits of precision, but the denormalized X that you can use in the addition only has 33 bits. If you're using float,* the original X had 24 bits, and the denormalized X has only four bits of precision.
Your goal in writing your "divide-and-conquer" algorithm was to break the problem into tasks that could be given to different threads. But a side-effect was, you also got a more accurate answer. More accurate because, for each chunk, the last step is to compute X + 99999X. The exponent mismatch there is only 16 or 17 bits instead of 19 or 20 bits. You threw away three fewer bits of precision.
To get the best possible precision, start by sorting the list, smallest numbers first. (NOTE: smallest means closest to zero, i.e., least absolute value.) Then remove the first two numbers from the list, add them, and insert the sum back into the list in the correct place to keep the list sorted. (Those inserts go a lot faster if you use a linked list.) Finally, repeat those steps until the list contains only one number, and that's the most accurate sum you can get without using a wider data type.
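Here is a sketch of that procedure, using a PriorityQueue ordered by absolute value in place of the hand-maintained sorted linked list:

import java.util.Comparator;
import java.util.PriorityQueue;

// Repeatedly add the two values closest to zero and reinsert the sum;
// the last remaining value is the most accurate plain-double sum.
static double sortedSum(double[] values) {
    PriorityQueue<Double> queue =
            new PriorityQueue<>(Comparator.comparingDouble(Math::abs));
    for (double v : values) queue.add(v);
    while (queue.size() > 1) {
        queue.add(queue.poll() + queue.poll());
    }
    return queue.isEmpty() ? 0.0 : queue.poll();
}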
Wider data type!! That's what you really want. If you can accumulate your sum in an IEEE quad float, you've got 112 bits of precision to work with. Adding up a million numbers? Lost 20 bits of precision? No problem! The 92 bits you've got at the end is still more than the 53 bits in the doubles that you started with. You could literally add lists of a trillion numbers before you started to lose precision as compared to double floats.
Using a wider data type, if you've got one, will give you far better performance than the crazy sorted-list algorithm that I gave you above.
* Don't do math on float values. The only use for float is to save space in huge binary files and huge arrays. If you've got an array of float and you want to do math on them, convert to double, do the math, and convert back to float when you're done.
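Following that advice, a small sketch of accumulating float data in a double:

// Accumulate in double; each float widens exactly, and we round once at the end.
static float sumFloats(float[] data) {
    double sum = 0.0;
    for (float f : data) {
        sum += f;
    }
    return (float) sum;
}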

Does optimization of the initial guess make the Babylonian method of finding square roots fast?

The Babylonian (aka Heron's) method seems to be one of the faster algorithms for finding the square root of a number n. How fast it converges depends on how far off your initial guess is.
Now, as the number n increases, its root x decreases as a percentage of n:
root(10) : 10 ≈ 31%
root(100) : 100 = 10%
root(1000) : 1000 ≈ 3%
root(10000) : 10000 = 1%
So basically, for each digit in the number beyond the first, multiply by around 0.3 (i.e., divide by roughly 3), then use that as your initial guess. Such as:
public static double root(double n) {
    // If number is 0 or negative, return the number
    if (n <= 0)
        return n;
    // Call to a method to find the number of digits
    int num = numDigits(n);
    double guess = n;
    // Multiply by 0.3 for every digit from the second digit onwards
    for (int i = 0; i < num - 1; ++i)
        guess = guess * 0.3;
    // Repeat until it converges to within the margin of error
    while (!(n - (guess * guess) <= 0.000001 && n - (guess * guess) >= 0)) {
        double divide = n / guess;
        guess = Math.abs(0.5 * (divide + guess));
    }
    return Math.abs(guess);
}
Does this help optimize the algorithm? And is it O(n)?
Yes. What works even better is to exploit the floating point representation, by dividing the binary exponent approximately by two, because operating on the floating-point bits is very fast. See Optimized low-accuracy approximation to `rootn(x, n)`.
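A sketch of that bit trick for doubles (assuming a positive, normal input; the constant below is the IEEE 754 double exponent bias shifted into position):

// Rough sqrt estimate: halve the biased binary exponent of x.
static double roughSqrt(double x) {
    long bits = Double.doubleToLongBits(x);
    long bias = 1023L << 52;                    // exponent bias, in place
    long halved = ((bits - bias) >> 1) + bias;  // halves exponent (and mantissa)
    return Double.longBitsToDouble(halved);
}

The result is only a rough first approximation, but it lands in the right order of magnitude in a couple of machine instructions, which makes it an excellent starting guess for Heron's iteration.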
My belief is that the complexity of an algorithm is independent of the input provided (complexity is a general characteristic of an algorithm; we cannot say that algorithm X has complexity O1 for input I1 and complexity O2 for input I2). Thus, no matter what initial value you provide, it should not improve complexity. It may improve the number of iterations for that particular case, but that's a different thing. Reducing the number of iterations by half still means the same complexity. Keep in mind that n, 2*n, and n/3 all fit into the O(n) class.
Now, with regard to the actual complexity, I read on Wikipedia (https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Babylonian_method) that
This is a quadratically convergent algorithm, which means that the number of correct digits of the approximation roughly doubles with each iteration.
This means you need as many iterations as the number of exact decimal digits you expect, which is constant. If you need 10 exact decimals, 10 is a constant, totally independent of n.
But in Wikipedia's example, they chose from the very beginning a candidate with the same order of magnitude as the correct answer (600 compared to 354). However, if your initial guess is way off (by orders of magnitude), you will need some extra iterations to reach the necessary number of digits, which adds complexity. Suppose the correct answer is 10000, while your initial guess is 10. The difference is 4 orders of magnitude, and I think in this case the extra work needed to reach the correct magnitude is proportional to the difference between the number of digits of your guess and the number of digits of the correct answer. Since the number of digits is approximately log(n), in this case the extra complexity is |log(correct_answer) - log(initial_guess)|.
To avoid this, pick a starting number that has the right number of digits, which is generally half the number of digits of the input. My best choice would be picking the first half of the number as a candidate (from 123456, keep 123; from 1234567, either 123 or 1234). In Java, you could use byte operations to keep the first half of a number/string/whatever is kept in memory. Then you need no extra iterations, just a byte operation with constant complexity.
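As a hedged sketch of that digit-halving idea (a hypothetical helper, using string slicing rather than raw byte operations):

// Take the first half of the decimal digits as the initial guess.
static double firstHalfGuess(long n) {
    String digits = Long.toString(n);
    String half = digits.substring(0, (digits.length() + 1) / 2);
    return Double.parseDouble(half);
}

For 123456 this yields 123, which has the right number of digits even though it is not yet close to sqrt(123456) ≈ 351.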
For n ≥ 4, sqrt(n) = 2 sqrt(n / 4). For n < 1, sqrt(n) = 1/2 sqrt(n × 4). So you can always multiply or divide by 4 to normalize n on the range [1, 4).
Once you do that, take sqrt(4) = 2 as the starting point for Heron's algorithm, since that is the geometric mean and will yield the greatest possible improvement per iteration, and unroll the loop to perform the needed number of iterations for the desired accuracy.
Finally, multiply or divide by all the factors of 2 that you removed at the beginning. Note that multiplying and dividing by 2 or 4 is easy and fast for binary computers.
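A hedged sketch of that scheme, assuming a fixed iteration count is acceptable for double precision:

// Normalize n into [1, 4) by factors of 4, iterate from the guess 2,
// then undo the scaling. Assumes n > 0.
static double sqrtByReduction(double n) {
    if (n <= 0.0) return n;
    double scale = 1.0;
    while (n >= 4.0) { n /= 4.0; scale *= 2.0; }
    while (n < 1.0)  { n *= 4.0; scale /= 2.0; }
    double x = 2.0;                 // sqrt(4), the starting point described above
    for (int i = 0; i < 6; i++) {   // quadratic convergence: 6 iterations suffice
        x = 0.5 * (x + n / x);
    }
    return scale * x;
}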
I discuss this algorithm at my blog.

Biggest possible rounding error when computing floating-point numbers

I'm developing a time critical algorithm in Java and therefore am not using BigDecimal. To handle the rounding errors, I set an upper error bound instead, below which different floating point numbers are considered to be exactly the same. Now the problem is what should that bound be? Or in other words, what's the biggest possible rounding error that can occur, when performing computational operations with floating-point numbers (floating-point addition, subtraction, multiplication and division)?
With an experiment I've done, it seems that a bound of 1e-11 is enough.
PS: This problem is language independent.
EDIT: I'm using double data type. The numbers are generated with Random's nextDouble() method.
EDIT 2: It seems I need to calculate the error based on how the floating-point numbers I'm using are generated. The nextDouble() method looks like this:
public double nextDouble() {
    return (((long) next(26) << 27) + next(27))
            / (double) (1L << 53);
}
Based on the constants in this method, I should be able to calculate the biggest possible error that can occur for a floating-point number generated with this method specifically (its machine epsilon?). I'd be glad if someone could post the calculation.
The worst case rounding error on a single simple operation is half the gap between the pair of doubles that bracket the real number result of the operation. Results from Random's nextDouble method are "from the range 0.0d (inclusive) to 1.0d (exclusive)". For those numbers, the largest gap is about 1e-16 and the worst case rounding error is about 5e-17.
Here is a program that prints the gap for some sample numbers, including the largest result of Random's nextDouble:
public class Test {
    public static void main(String[] args) {
        System.out.println("Max random result gap: "
                + Math.ulp(Math.nextAfter(1.0, Double.NEGATIVE_INFINITY)));
        System.out.println("1e6 gap: "
                + Math.ulp(1e6));
        System.out.println("1e30 gap: "
                + Math.ulp(1e30));
    }
}
Output:
Max random result gap: 1.1102230246251565E-16
1e6 gap: 1.1641532182693481E-10
1e30 gap: 1.40737488355328E14
Depending on the calculation you are doing, errors can accumulate across multiple operations, giving bigger total rounding error than you would predict from this simplistic single-operation approach. As Mark Dickinson said in a comment, "Numerical analysis is a bit more complicated than that."
This depends on:
your algorithm
the magnitude of the involved numbers
For example, consider the function f(x) = a * (b - (c + d)).
No big deal, or is it?
It turns out it is when d << c, b = c, and a is whatever; let's just say it's big.
Let's say:
a = 10^200
b = c = 5
d = 10^-90
This is totally made up, but you get the point. The point is, the difference in magnitude between c and d means that
c + d = c (small rounding error because d << c)
b - (c + d) = 0 (should be -10^-90)
a * (b - (c + d)) = 0 (where it really should be about -10^110)
Long story short, some operations (notably subtractions) can kill you. Also, it is not so much the generating function that you need to look at; it is the operations that you do with the numbers (your algorithm).
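A quick demonstration of that cancellation in Java (10^200 is written 1e200 here):

double a = 1e200, b = 5, c = 5, d = 1e-90;
// c + d rounds to exactly c, so the subtraction yields 0 and the
// final product is 0.0 instead of roughly -1e110.
System.out.println(a * (b - (c + d)));  // prints 0.0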

What variable type can I use to hold huge numbers (30+ digits) in java?

Is there a really large variable type I can use in Java to store huge numbers (up to around forty digits)?
long's maximum value is 9223372036854775807, which is 19 digits -- not nearly large enough.
I'm trying to create a calculator that can handle large numbers, because most nowadays can only hold an insufficient 10 digits or so, and I want accurate calculations with numbers of a much larger magnitude.
EDIT
Thanks for the answers. I can use BigInteger for big integers, the only limit being the computer's memory (should be sufficient). For decimals, I'll use float ^e, as @WebDaldo suggested, or BigDecimal (similar to BigInteger), as @kocko suggested.
You can use the BigInteger class.
BigInteger bi1 = new BigInteger("637824629384623845238423545642384");
BigInteger bi2 = new BigInteger("3039768898793547264523745379249934");
BigInteger bigSum = bi1.add(bi2);
BigInteger bigProduct = bi1.multiply(bi2);
System.out.println("Sum : " + bigSum);
System.out.println("Product : " + bigProduct);
Output:
Sum : 3677593528178171109762168924892318
Product : 1938839471287900434078965247064711159607977007048190357000119602656
I should mention BigDecimal, which is excellent for monetary calculations compared to double.
BigDecimal bd = new BigDecimal("123234545.4767");
BigDecimal displayVal = bd.setScale(2, RoundingMode.HALF_EVEN);
NumberFormat usdFormat = NumberFormat.getCurrencyInstance(Locale.US);
System.out.println(usdFormat.format(displayVal.doubleValue()));
Output:
$123,234,545.48
You can try using the BigInteger class for operations with really huge integer numbers.
For operations with floating-point numbers, Java provides the BigDecimal class, which can be useful as well.
For calculations with exponents, like you would use in a calculator, you should use BigDecimal. The problem with BigInteger is that it only handles integers (no fractional numbers) and that for really big numbers like 10^100 it stores all the zeros, using a lot of memory, instead of using a format based on scientific notation.
You could alternatively use the floating point number type double, which gives you a large range of values, low memory usage and fast operations. But because of rounding issues and limited precision (around 16 decimal digits), I wouldn't recommend using it unless you really know what you're doing.
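A tiny illustration of that precision limit:

// At magnitude 1e16 the gap between adjacent doubles exceeds 1,
// so adding 1 has no effect at all.
double big = 1e16;
System.out.println(big + 1 == big);  // true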
You can use float ^e
so you could have
0.55342663552772737682136182736127836782163 * 10^e
Most calculators use that, too.
This is for all bigger inputs, above 15 or so, since using int overflows. You may want to find the factorial of 50 or 100 or 500.
// Recursive version of the factorial for bigger numbers, e.g. the factorial of 500
BigInteger fatFactorial(int b) {
    if (b == 0 || b == 1) {
        return BigInteger.ONE;
    } else {
        return BigInteger.valueOf(b).multiply(fatFactorial(b - 1));
    }
}

How to assign the largest n bit unsigned integer to a BigInteger in Java

I have a scenario where I'm working with large integers (e.g. 160 bit), and am trying to create the biggest possible unsigned integer that can be represented with an n bit number at run time. The exact value of n isn't known until the program has begun executing and read the value from a configuration file. So for example, n might be 160, or 128, or 192, etcetera...
Initially what I was thinking was something like:
BigInteger.valueOf((long)Math.pow(2, n));
but then I realized the conversion to long that takes place sort of defeats the purpose, given that long does not have enough bits in the first place to store the result. Any suggestions?
On the largest n-bit unsigned number
Let's first take a look at what this number is, mathematically.
In an unsigned binary representation, the largest n-bit number would have all bits set to 1. Let's take a look at some examples:
1(2) = 1 = 2^1 - 1
11(2) = 3 = 2^2 - 1
111(2) = 7 = 2^3 - 1
:
1…1(2) (n ones) = 2^n - 1
Note that this is analogous in decimal too. The largest 3 digit number is:
10^3 - 1 = 1000 - 1 = 999
Thus, a subproblem of finding the largest n-bit unsigned number is computing 2^n.
On computing powers of 2
Modern digital computers can compute powers of two efficiently, due to the following pattern:
2^0 = 1(2)
2^1 = 10(2)
2^2 = 100(2)
2^3 = 1000(2)
:
2^n = 10…0(2) (n zeros)
That is, 2^n is simply a number having bit n set to 1, and everything else set to 0 (remember that bits are numbered with zero-based indexing).
Solution
Putting the above together, we get this simple solution using BigInteger for our problem:
final int N = 5;
BigInteger twoToN = BigInteger.ZERO.setBit(N);
BigInteger maxNbits = twoToN.subtract(BigInteger.ONE);
System.out.println(maxNbits); // 31
If we were using long instead, we could write something like this:
// for a 64-bit signed long, N < 64
System.out.println((1L << N) - 1); // 31
There is no "set bit n" operation defined for long, so traditionally bit shifting is used instead. In fact, a BigInteger analog of this shifting technique is also possible:
System.out.println(BigInteger.ONE.shiftLeft(N).subtract(BigInteger.ONE)); // 31
See also
Wikipedia/Binary numeral system
Bit Twiddling Hacks
Additional BigInteger tips
BigInteger does have a pow method to compute non-negative powers of any arbitrary number. If you're working in a modular ring, there are also modPow and modInverse.
You can individually setBit, flipBit or just testBit. You can get the overall bitCount, perform bitwise and with another BigInteger, and shiftLeft/shiftRight, etc.
As a bonus, you can also compute the gcd or check if the number isProbablePrime.
ALWAYS remember that BigInteger, like String, is immutable. You can't invoke a method on an instance, and expect that instance to be modified. Instead, always assign the result returned by the method to your variables.
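For example:

BigInteger x = BigInteger.ONE;
x.shiftLeft(10);               // result discarded: x is NOT modified
System.out.println(x);         // still 1
x = x.shiftLeft(10);           // assign the returned instance instead
System.out.println(x);         // 1024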
Just to clarify: you want the largest n-bit number (i.e., the one with all n bits set). If so, the following will do that for you:
BigInteger largestNBitInteger = BigInteger.ZERO.setBit(n).subtract(BigInteger.ONE);
This is mathematically equivalent to 2^n - 1. Your question shows how to compute 2^n, which is actually the smallest (n+1)-bit number. You can of course do that with:
BigInteger smallestNPlusOneBitInteger = BigInteger.ZERO.setBit(n);
I think there is a pow method directly on BigInteger. You can use it for your purpose.
The quickest way I can think of doing this is by using the constructor for BigInteger that takes a byte[].
BigInteger(byte[] val) constructs a BigInteger object from an array of bytes. You are, however, dealing with bits, and so creating a byte[] that might consist of the unsigned values {127, 255, 255, 255, 255} for a 39-bit integer representing 2^39 - 1 might be a little tedious.
You could also use the constructor BigInteger(String val, int radix), which might make what's going on in your code more readily apparent, if you don't mind a performance hit for parsing a String. Then you could generate a string like val = "111111111111111111111111111111111111111" and call BigInteger myInt = new BigInteger(val, 2); - resulting in the same 39-bit integer.
The first option will require some thinking about how to represent your number. That particular constructor expects a two's-complement, big-endian representation of the number. The second will likely be marginally slower, but much clearer.
EDIT: Corrected numbers. I thought you meant representing 2^n, and didn't correctly read that you want the largest value n bits can store.
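For reference, a sketch of both constructions for the 39-bit example (note the casts, since Java's byte is signed):

import java.math.BigInteger;
import java.util.Arrays;

// Both produce the largest 39-bit value, 2^39 - 1.
byte[] bytes = {0x7F, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF};
BigInteger fromBytes = new BigInteger(bytes);

char[] ones = new char[39];
Arrays.fill(ones, '1');
BigInteger fromString = new BigInteger(new String(ones), 2);

System.out.println(fromBytes.equals(fromString));  // true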
