Java's BigInteger class

What algorithms does Java's BigInteger class employ for multiplication and division, and what are their respective time complexities?
What primitive type does Java's BigInteger class use - byte, short, int, etc. - and why?
How does Java's BigInteger class handle the fact that its primitive type is signed? If the answer is "it just does, and it's really messy", that's all I need/want to know. What I'm really getting at is: does it cheat in the same way some Python libraries cheat, in that they're not actually written in Python?

I looked at the source code to BigInteger here. Here's what I found.
BigInteger does not "cheat". In Java "cheating" is accomplished through the use of what are known as "native" functions. See java.lang.Math for a rather extensive list of these.
BigInteger uses int to represent its data.
final int[] mag;
And yes, it is pretty messy. Lots of bit crunching and the like.

Oracle's java.math.BigInteger class has undergone some extensive improvements from Java 7 to Java 8. See for yourself by examining the source on grepcode.com. It doesn't cheat; it's all pure Java.
Internally, it uses a sign-magnitude representation of the integer, using an array of int values to store the magnitude. Recall that a java int is a 32-bit value. All 32-bits are used without regard to the sign. This size is also convenient since the product of two ints fits into a java long.
Beginning in Java 8, the BigInteger class added some advanced algorithms, such as Karatsuba multiplication (roughly O(n^1.585)), Toom-Cook 3-way multiplication (roughly O(n^1.465)) and Burnikel-Ziegler division, to improve the performance for integers of thousands of bits; below the relevant thresholds it still uses the classic O(n^2) schoolbook algorithms.
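To make the storage concrete, here is a minimal sketch (not the actual JDK code; the class and method names are made up) of below-threshold schoolbook multiplication over little-endian int[] magnitudes, using the convenience noted above that the product of two 32-bit words fits in a long:

```java
public class MagMultiply {
    // Schoolbook O(n^2) multiplication of two little-endian int[]
    // magnitudes, each int treated as an unsigned 32-bit "digit".
    static int[] multiply(int[] a, int[] b) {
        int[] result = new int[a.length + b.length];
        for (int i = 0; i < a.length; i++) {
            long ai = a[i] & 0xFFFFFFFFL;   // unsigned value of a[i]
            long carry = 0;
            for (int j = 0; j < b.length; j++) {
                long sum = ai * (b[j] & 0xFFFFFFFFL)
                         + (result[i + j] & 0xFFFFFFFFL) + carry;
                result[i + j] = (int) sum;  // keep the low 32 bits
                carry = sum >>> 32;         // propagate the high 32 bits
            }
            result[i + b.length] = (int) carry;
        }
        return result;
    }
}
```

The `& 0xFFFFFFFFL` masks are exactly the "messy bit crunching" needed because Java's int is signed: they widen each word to a long as if it were unsigned.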

It uses int.
Reason: It's the fastest for most platforms.

Related

Java's BigInteger implementation

I'm new here so please excuse my noob mistakes. I'm currently working on a little project of mine that has me dealing with numbers forty thousand digits long and beyond.
I'm currently using BigInteger to handle these values, and I need something that performs faster. I've read that BigInteger uses an array of integers in its implementation, and what I need to know is whether BigInteger is using each index in this array to represent each decimal digit, as in 0 - 9, or whether it is using something more efficient.
I ask this because I already have an implementation in mind that uses bit operations, which makes it more efficient, memory- and processing-wise.
So the final question is - is BigInteger already efficient enough, and should I just rely on that? It would be better to know this rather than put it to the test unnecessarily, which would take a lot of time.
Thank you.
At least with Oracle's Java 8 and OpenJDK 8, it doesn't store one decimal digit per int. It stores full 32-bit portions per 32-bit int in the int[], as can be seen in its source code.
Bit operations are fast for it, since it's a sign-magnitude value and the magnitude is stored packed just as you'd expect; just make sure that you use the relevant BigInteger bitwise methods rather than implementing your own.
If you still need more speed, try something like GMP, though be aware that it is dual-licensed under the LGPL and GPL, and that calling it from Java means going through JNI, so it is easier to use outside of Java.
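For example, the built-in bitwise methods operate directly on the packed magnitude, so they stay fast even at forty thousand bits:

```java
import java.math.BigInteger;

public class BitOps {
    public static void main(String[] args) {
        // 2^40000: a ~40k-bit value built with a single shift.
        BigInteger n = BigInteger.ONE.shiftLeft(40000);
        System.out.println(n.bitLength());   // 40001
        System.out.println(n.testBit(40000)); // true: only the top bit is set
        System.out.println(n.testBit(0));     // false
        // Set the low bit without touching the rest of the magnitude.
        BigInteger m = n.or(BigInteger.ONE);
        System.out.println(m.testBit(0));     // true
    }
}
```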

Java - operations on native short always return int, why? and how? [duplicate]

Why does the Java API use int, when short or even byte would be sufficient?
Example: The DAY_OF_WEEK field in class Calendar uses int.
If the difference is too minimal, then why do those datatypes (short, int) exist at all?
Some of the reasons have already been pointed out. For example, the fact that "...(Almost) All operations on byte, short will promote these primitives to int". However, the obvious next question would be: WHY are these types promoted to int?
So to go one level deeper: The answer may simply be related to the Java Virtual Machine Instruction Set. As summarized in the Table in the Java Virtual Machine Specification, all integral arithmetic operations, like adding, dividing and others, are only available for the type int and the type long, and not for the smaller types.
(An aside: The smaller types (byte and short) are basically only intended for arrays. An array like new byte[1000] will take 1000 bytes, and an array like new int[1000] will take 4000 bytes)
Now, of course, one could say that "...the obvious next question would be: WHY are these instructions only offered for int (and long)?".
One reason is mentioned in the JVM Spec mentioned above:
If each typed instruction supported all of the Java Virtual Machine's run-time data types, there would be more instructions than could be represented in a byte
Additionally, the Java Virtual Machine can be considered an abstraction of a real processor. Introducing dedicated arithmetic logic units for smaller types would not be worth the effort: they would need additional transistors, but could still only execute one addition per clock cycle. The dominant architecture when the JVM was designed was 32 bits, just right for a 32-bit int. (The operations that involve a 64-bit long value are implemented as a special case.)
(Note: The last paragraph is a bit oversimplified, considering possible vectorization etc., but should give the basic idea without diving too deep into processor design topics)
EDIT: A short addendum, focusing on the example from the question, but in a more general sense: One could also ask whether it would not be beneficial to store fields using the smaller types. For example, one might think that memory could be saved by storing Calendar.DAY_OF_WEEK as a byte. But here, the Java class file format comes into play: All the fields in a class file occupy at least one "slot", which has the size of one int (32 bits). (The "wide" fields, double and long, occupy two slots.) So explicitly declaring a field as short or byte would not save any memory either.
(Almost) All operations on byte and short promote them to int; for example, you cannot write:
short x = 1;
short y = 2;
short z = x + y;           // error: x + y is an int
short w = (short) (x + y); // OK, with an explicit cast
Arithmetic is easier and more straightforward when using int; there is no need to cast.
In terms of space, it makes very little difference. byte and short would complicate things, and I don't think this micro-optimization is worth it, since we are talking about a fixed number of variables.
byte is relevant and useful when you program for embedded devices or deal with files/networks. Also, these primitives are limited; what if the calculations exceed their limits in the future? Try to think about an extension of the Calendar class that might need bigger numbers.
Also note that on 64-bit processors, locals will be kept in registers and won't use any extra resources, so using int, short and other primitives won't make any difference at all. Moreover, many Java implementations align variables* (and objects).
* byte and short occupy the same space as int if they are local variables, class variables or even instance variables. Why? Because in (most) computer systems, variable addresses are aligned, so for example if you use a single byte, you'll actually end up with two bytes - one for the variable itself and another for the padding.
On the other hand, in arrays, a byte takes 1 byte, a short takes 2 bytes and an int takes 4 bytes, because in arrays only the start (and maybe the end) has to be aligned. This will make a difference if you want to use, for example, System.arraycopy(); then you'll really notice a performance difference.
Because arithmetic operations are easier when using integers compared to shorts. Assume that the constants were indeed modeled by short values. Then you would have to use the API in this manner:
short month = Calendar.JUNE;
month = (short) (month + 1); // is July
Notice the explicit casting. short values are implicitly promoted to int values when they are used in arithmetic operations. (On the operand stack, shorts are even expressed as ints.) This would be quite cumbersome to use, which is why int values are often preferred for constants.
Compared to that, the gain in storage efficiency is minimal because there only exists a fixed number of such constants. We are talking about 40 constants. Changing their storage from int to short would save you 40 * 16 bits = 80 bytes. See this answer for further reference.
The design complexity of a virtual machine is a function of how many kinds of operations it can perform. It's easier to have four implementations of an instruction like "multiply" - one each for 32-bit integer, 64-bit integer, 32-bit floating-point, and 64-bit floating-point - than to have, in addition to the above, versions for the smaller numerical types as well. A more interesting design question is why there should be four types, rather than fewer (performing all integer computations with 64-bit integers and/or doing all floating-point computations with 64-bit floating-point values). The reason for using 32-bit integers is that Java was expected to run on many platforms where 32-bit types could be acted upon just as quickly as 16-bit or 8-bit types, but operations on 64-bit types would be noticeably slower. Even on platforms where 16-bit types would be faster to work with, the extra cost of working with 32-bit quantities would be offset by the simplicity afforded by only having 32-bit types.
As for performing floating-point computations on 32-bit values, the advantages are a bit less clear. There are some platforms where a computation like float a=b+c+d; could be performed most quickly by converting all operands to a higher-precision type, adding them, and then converting the result back to a 32-bit floating-point number for storage. There are other platforms where it would be more efficient to perform all computations using 32-bit floating-point values. The creators of Java decided that all platforms should be required to do things the same way, and that they should favor the hardware platforms for which 32-bit floating-point computations are faster than longer ones, even though this severely degraded both the speed and the precision of floating-point math on a typical PC, as well as on many machines without floating-point units. Note, btw, that depending upon the values of b, c, and d, using higher-precision intermediate computations when computing expressions like the aforementioned float a=b+c+d; will sometimes yield results which are significantly more accurate than would be achieved if all intermediate operands were computed at float precision, but will sometimes yield a value which is a tiny bit less accurate. In any case, Sun decided everything should be done the same way, and they opted for using minimal-precision float values.
Note that the primary advantages of smaller data types become apparent when large numbers of them are stored together in an array; even if there were no advantage to having individual variables of types smaller than 64 bits, it's worthwhile to have arrays which can store smaller values more compactly; having a local variable be a byte rather than a long saves seven bytes, while having an array of 1,000,000 numbers hold each number as a byte rather than a long saves 7,000,000 bytes. Since each array type only needs to support a few operations (most notably read one item, store one item, copy a range of items within an array, or copy a range of items from one array to another), the added complexity of having more array types is not as severe as the complexity of having more types of directly-usable discrete numerical values.
If you used the philosophy where integral constants are stored in the smallest type that they fit in, then Java would have a serious problem: whenever programmers write code using integral constants, they have to pay careful attention to their code to check if the type of the constants matter, and if so look up the type in the documentation and/or do whatever type conversions are needed.
So now that we've outlined a serious problem, what benefits could you hope to achieve with that philosophy? I would be unsurprised if the only runtime-observable effect of that change would be what type you get when you look the constant up via reflection. (and, of course, whatever errors are introduced by lazy/unwitting programmers not correctly accounting for the types of the constants)
Weighing the pros and the cons is very easy: it's a bad philosophy.
Actually, there'd be a small advantage. If you have a
class MyTimeAndDayOfWeek {
    byte dayOfWeek;
    byte hour;
    byte minute;
    byte second;
}
then on a typical JVM it needs as much space as a class containing a single int. The memory consumption gets rounded to a next multiple of 8 or 16 bytes (IIRC, that's configurable), so the cases when there are real saving are rather rare.
This class would be slightly easier to use if the corresponding Calendar methods returned a byte. But there are no such Calendar methods, only get(int), which must return an int because of the other fields. Each operation on smaller types promotes to int, so you need a lot of casting.
Most probably, you'll either give up and switch to an int or write setters like
void setDayOfWeek(int dayOfWeek) {
    this.dayOfWeek = checkedCastToByte(dayOfWeek);
}
Then the type of DAY_OF_WEEK doesn't matter, anyway.
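For completeness, a possible body for the hypothetical checkedCastToByte helper used in the setter above (the name comes from the answer; the JDK has no such method, so this is just a sketch):

```java
public class SafeCasts {
    // Narrows an int to a byte, rejecting values outside [-128, 127]
    // instead of silently truncating as a plain (byte) cast would.
    static byte checkedCastToByte(int value) {
        if (value < Byte.MIN_VALUE || value > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("not a byte: " + value);
        }
        return (byte) value;
    }
}
```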
Using variables smaller than the bus size of the CPU means more cycles are necessary. For example, when updating a single byte in memory, a 64-bit CPU needs to read a whole 64-bit word, modify only the changed part, then write back the result.
Also, using a smaller data type incurs overhead when the variable is stored in a register, since the behavior of the smaller data type has to be accounted for explicitly. Since the whole register is used anyway, there is nothing to be gained by using a smaller data type for method parameters and local variables.
Nevertheless, these data types might be useful for representing data structures that require specific widths, such as network packets, or for saving space in large arrays, sacrificing speed.

need a larger primitive type for integers

I have a Java program that multiplies large numbers. The result is declared as a long, but it's too small: the computation overflows and the output is 0. I don't think there is a larger primitive type.
The way I could do it is to use the BigInteger class, but then I'd have to use its multiply() and divide() methods rather than the regular * and /, which would be very inconvenient.
Here is the line of code:
System.out.printf("c) In how many of the arrangements in part (a) are all the vowels adjacent?%n "
        + "(7! / (2!2!))(6! / 3!2!) = "
        + (new Factorial(7).r / new Power(new Factorial(2).r, 2).r)
          * (new Factorial(6).r / (new Factorial(6).r * new Factorial(2).r))
        + "%n");
It's using my Factorial and Power classes, and it's too large.
Is there a longer number class that can still use * and /? Or is there another number class that'll be easier to use?
thanks.
Primitive types (byte, short, int, long) are classic types that are usually - in other languages (think C, C++) - defined by the processor architecture. So a long would be either 16 bits, 32 bits or 64 bits depending on your CPU type.
Now Java changed this and fixed these types to certain lengths regardless of processor architecture. See more info here.
If you need to work with larger numbers you're only left with BigInteger, BigDecimal and similar. Mind you: these types do not map to single CPU instructions but do the mathematical operations "by hand", which means they are quite a bit (think 1000x) slower.
On the other hand, what you are talking about -- using * and / with these types is called operator overloading. Some programming languages support it, but in Java it's a big no-no. There are some precompilers (see JFront) but last I heard it doesn't work with BigInteger.
You could write your program in Groovy or Scala -- these both run on JVM (Groovy's syntax is quite similar to Java -- mostly just renaming .java to .groovy works) and both support operator overloading for BigInteger.
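If you stay in plain Java, the expression from the question can still be written fairly readably with multiply() and divide(); here is a sketch in which the asker's Factorial and Power classes are replaced by a small BigInteger-based helper:

```java
import java.math.BigInteger;

public class Arrangements {
    // Factorial via BigInteger, so intermediate products never overflow.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BigInteger two = factorial(2);
        // (7! / (2!2!)) * (6! / (3!2!)) with multiply()/divide()
        // instead of * and /.
        BigInteger part = factorial(7).divide(two.multiply(two));          // 1260
        BigInteger rest = factorial(6).divide(factorial(3).multiply(two)); // 60
        System.out.println(part.multiply(rest)); // 75600
    }
}
```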
You could try using double, whose range is bigger than a long's, and use only the integer part - but note that a double only has 53 bits of integer precision, so beyond 2^53 it can no longer represent every integer exactly.
Otherwise you could create your own integer class, which would contain two or three longs, and then use shift and bitwise operators to handle the mathematical operations, but it's not easy.
The last solution would be to write C++ code using a very big integer type and call the generated executable from Java, but that would break code portability.

Higher precision doubles and trigonometric functions in Java

According to IEEE the following doubles exist:
              Mantissa   Exponent
double 64bit:   52 bit     11 bit
double 80bit:   64 bit     15 bit
In Java only the 64-bit double can be directly stored in an instance variable. I would like, for whatever reason, to work with 80-bit floats as defined above in Java. I am interested in the full set of arithmetic functions, I/O and trigonometric functions. How could I do that?
One could of course do something along the following lines:
public class DoubleExt {
    private long mantissa;
    private short exponent;
}
And then make a package that interfaces with some of the known C libs for 80-bit floats. But would this be considered best practice? What about supporting a couple of platforms and architectures?
Bye
I'm pretty sure primitives won't get you there, but the BigDecimal class is as good as it gets (for everything except trigonometry).
For trigonometric functions, however, you will have to resort to an external library, like APFloat (see this previous question).
Perhaps BigDecimal is an adequate way for you. But I believe it doesn't provide the full set of mathematical functions.
http://download.oracle.com/javase/1.5.0/docs/api/java/math/BigDecimal.html
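As a quick sketch of what BigDecimal does cover: you can request far more significant digits than an 80-bit double's 64-bit mantissa provides, but only for the basic operations (no trigonometry; even sqrt only arrived with Java 9's BigDecimal.sqrt):

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;

public class HighPrecision {
    public static void main(String[] args) {
        // 50 significant decimal digits - well beyond any hardware float.
        MathContext mc = new MathContext(50, RoundingMode.HALF_EVEN);
        BigDecimal third = BigDecimal.ONE.divide(BigDecimal.valueOf(3), mc);
        System.out.println(third);
        // 0.33333333333333333333333333333333333333333333333333
    }
}
```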
The question is already 5 years old. Time to look around whether there are some new candidates, maybe drawing inspiration from other languages.
In Racket we find a BigFloat data type:
https://docs.racket-lang.org/math/bigfloat.html
The underlying library is GNU MPFR, a Java interface was started:
https://github.com/kframework/mpfr-java
Is this the only interface so far?

"Simulating" a 64-bit integer with two 32-bit integers

I'm writing a very computationally intense procedure for a mobile device and I'm limited to 32-bit CPUs. In essence, I'm performing dot products of huge sets of data (>12k signed 16-bit integers). Floating point operations are just too slow, so I've been looking for a way to perform the same computation with integer types. I stumbled upon something called Block Floating Point arithmetic (page 17 in the linked paper). It does a pretty good job, but now I'm faced with a problem of 32 bits just not being enough to store the output of my calculation with enough precision.
Just to clarify, the reason it's not enough precision is that I would have to drastically reduce precision of each of my arrays' elements to get a number fitting into a 32-bit integer in the end. It's the summation of ~16000 things that makes my result so huge.
Is there a way (I'd love a reference to an article or a tutorial) to use two 32-bit integers as most significant word and least significant word and define arithmetic on them (+, -, *, /) to process data efficiently? Also, are there perhaps better ways of doing such things? Is there a problem with this approach? I'm fairly flexible on programming language I use. I would prefer C/C++ but java works as well. I'm sure someone has done this before.
I'm pretty sure that the JVM must support arithmetic on the 64-bit long type, and if the platform doesn't support it natively, then the VM must emulate it. However, if you can't afford to use float for performance reasons, then a JVM will probably destroy you.
Most C and C++ implementations will provide emulated 64-bit arithmetic for 32-bit targets; I know that MSVC and GCC do. However, you should be aware that you can be trading many integer instructions to save a single floating-point instruction. You should consider whether the specifications for this program are unreasonable, or whether you could free up performance somewhere else.
Yes, just use 64-bit integers:
long val;        // Java

#include <stdint.h>
int64_t val;     // C
There is a list of libraries on the wikipedia page about Arbitrary Precision Arithmetic. Perhaps something on there would work for you?
If you can use Java, the short answer is: use Java longs. The Java standard defines a long as 64 bits. Any JVM must implement this or it is not compliant with the standard. Nothing requires the CPU to support 64-bit arithmetic; if it's not natively supported, a JVM must implement it in software.
If you really have some crippled Java that does not support longs, use BigInteger. It handles integers of any arbitrarily large size.
Talking about C/C++:
Any normal compiler supports the "long long" type as a 64-bit integer with all the normal arithmetic.
Combined with -O3, it has a very good chance of producing the best possible code for 64-bit arithmetic on your platform.
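For illustration, here is a minimal Java sketch of the addition half of such an emulation, splitting a 64-bit value into two 32-bit words (essentially what a 32-bit JVM or compiler generates for long; the class and method names are made up):

```java
public class Int64Emu {
    // Adds two 64-bit values, each represented as a (hi, lo) pair of
    // 32-bit ints, returning {hi, lo}. The low halves are treated as
    // unsigned by masking with 0xFFFFFFFFL before adding.
    static int[] add(int aHi, int aLo, int bHi, int bLo) {
        long lo = (aLo & 0xFFFFFFFFL) + (bLo & 0xFFFFFFFFL);
        int carry = (int) (lo >>> 32);             // 0 or 1
        return new int[] { aHi + bHi + carry, (int) lo };
    }
}
```

Subtraction and multiplication follow the same pattern with a borrow, or with partial products as in long multiplication; division is considerably more work, which is why letting the compiler or VM do the emulation is usually the right call.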
