I'm working on a project (in Scala) where I need to manipulate some very large numbers, far too big to be represented by the integral types. Java provides the BigInteger and BigDecimal classes (and Scala provides a nice thin wrapper around them). However, I noticed that these libraries are substantially slower than other arbitrary-precision libraries I've used in the past (i.e. http://www.ginac.de/CLN/), and the speed difference seems larger than what can be attributed to the language alone.
I did some profiling of my program, and 44% of the execution time is being spent in the BigInteger multiply method. I'd like to speed up my program a bit, so I'm looking for a faster and more efficient option than the BigInteger class (and its Scala wrapper). I've looked at LargeInteger (from JScience) and Apint (from Apfloat). However, both seem to perform more slowly than the standard BigInteger class.
Does anyone know of an arbitrary-precision math library for Java (or otherwise available on the JVM) with a focus on high-performance integer multiplication and addition?
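For reference, a rough sketch of the kind of multiplication I'm timing (the operand sizes and iteration count are arbitrary, just for illustration):

import java.math.BigInteger;
import java.util.Random;

public class BigIntMultiplyBench {
    public static void main(String[] args) {
        Random rnd = new Random(42);
        // Roughly 1500-decimal-digit operands (about 5000 bits each).
        BigInteger a = new BigInteger(5000, rnd);
        BigInteger b = new BigInteger(5000, rnd);

        BigInteger product = BigInteger.ONE;
        long start = System.nanoTime();
        for (int i = 0; i < 100000; i++) {
            // The hot spot in my profile: BigInteger.multiply
            product = a.multiply(b);
        }
        long elapsedMs = (System.nanoTime() - start) / 1000000;
        System.out.println(product.bitLength() + " bits, " + elapsedMs + " ms");
    }
}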
I'm a bit late... well, I only know of the apfloat library, which is available in both C++ and Java.
Apfloat-Library:
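For what it's worth, a minimal sketch of what using it might look like; I'm going from memory of the org.apfloat.Apint class here, so treat the exact API as an assumption:

import org.apfloat.Apint;

public class ApintExample {
    public static void main(String[] args) {
        // Apint is apfloat's arbitrary-precision integer type; the library is
        // designed to use FFT-based multiplication for very large operands.
        Apint a = new Apint("123456789012345678901234567890");
        Apint b = new Apint("987654321098765432109876543210");
        System.out.println(a.multiply(b));
    }
}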
Unfortunately, I think you are out of luck for a native Java library; I have not found one. I recommend wrapping GMP, which has excellent arbitrary-precision performance, using JNI. There is JNI overhead, but if you're in the 1500-digit range, that should be small compared to the difference in algorithmic complexity. You can find various wrappings of GMP for Java (I believe the most popular one is here).
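The shape of such a wrapper is roughly the following. This is a hypothetical sketch: the class, method, and native-library names are made up for illustration and are not the API of any particular existing GMP binding.

import java.math.BigInteger;

public final class GmpBigInt {
    static {
        System.loadLibrary("gmpjni"); // assumed name of the native glue library
    }

    // The native side would convert the two's-complement byte arrays to mpz_t
    // values, call mpz_mul, and return the product in the same encoding.
    private static native byte[] nativeMultiply(byte[] a, byte[] b);

    public static BigInteger multiply(BigInteger a, BigInteger b) {
        return new BigInteger(nativeMultiply(a.toByteArray(), b.toByteArray()));
    }
}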
Related
I have a program that works with snippets of Java code that do math on doubles using the standard mathematical operators, like this:
double someVal = 25.03;
return (someVal * 3) - 50;
For Reasons (mostly rounding errors) I would like to change all these snippets to use BigDecimal instead of double, modifying the math functions along the way, like this:
MathContext mc = MathContext.DECIMAL32;
BigDecimal someVal = new BigDecimal("25.03", mc);
return someVal.multiply(BigDecimal.valueOf(3), mc).subtract(BigDecimal.valueOf(50), mc);
The snippets are mostly pretty simple, but I would prefer to avoid a fragile solution (e.g., regex) if I can. Is there a relatively straightforward way to do this?
Note: I want to have a program or code perform these modifications (metaprogramming). Clearly I'm capable of making the changes by hand, but life is too short.
You could try Google's "Refaster", which, according to the paper, is "a tool that uses normal, compilable before-and-after examples of Java code to specify a Java refactoring."
The code lives under core/src/main/java/com/google/errorprone/refaster in Google's error-prone github project. (It used to live in its own github project.)
This is more of a hint than an answer, though, since I've never worked directly with Refaster and don't know how well it does on primitive expressions like the ones in your example. I also don't know how well-suited it is for toolchain use like yours, vs. one-time refactors (which is how Google tends to use it). But it's actively maintained, and I've seen it used really effectively in the past.
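If it does fit, a template for one of the operators might look roughly like this. This is a rough sketch, assuming the @BeforeTemplate/@AfterTemplate annotations from the error-prone project; note that a template like this only swaps the arithmetic expression, it won't change the declared types of your variables for you.

import com.google.errorprone.refaster.annotation.AfterTemplate;
import com.google.errorprone.refaster.annotation.BeforeTemplate;
import java.math.BigDecimal;
import java.math.MathContext;

// Rewrite double multiplication into BigDecimal arithmetic.
class DoubleMultiplyToBigDecimal {
    @BeforeTemplate
    double before(double a, double b) {
        return a * b;
    }

    @AfterTemplate
    double after(double a, double b) {
        return BigDecimal.valueOf(a)
                .multiply(BigDecimal.valueOf(b), MathContext.DECIMAL32)
                .doubleValue();
    }
}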
We use BigDecimal for financial calculations. As other comments note, you are going to see some performance degradation, and the code will be much harder to read. The performance impact depends on how many operations you perform. You usually run into rounding issues with doubles when the calculation chain is long: you won't have many problems doing c = a + b once, but you will if you do c += a + b a million times. And after thousands of operations you will notice how much slower BigDecimal is than double, so do performance testing.
Be careful when changing your code, especially with division: you will have to specify the rounding mode and the scale of the result. People usually forget this, and it leads to errors.
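For example, a plain divide() can blow up where the double version silently rounded; a minimal illustration:

BigDecimal one = new BigDecimal("1");
BigDecimal three = new BigDecimal("3");
// one.divide(three) throws ArithmeticException, because the exact result
// 0.333... has a non-terminating decimal expansion.
// Giving an explicit scale and rounding mode makes the result well-defined:
BigDecimal third = one.divide(three, 4, RoundingMode.HALF_UP); // 0.3333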
I assume this is not only about replacing the calculation logic; you will also need to change your domain model, so I doubt you can come up with a script that does it in a reasonable time. Do it by hand; a good IDE will help you a lot.
No matter how you convert your code, I suggest you first make sure that all of your calculation logic is covered by unit tests, and convert the tests before changing the logic, i.e. replace the value assertions by wrapping the expected values in BigDecimals. That way you will avoid silly typing/algorithm mistakes.
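A quick sketch of what that test conversion might look like (JUnit 4 here; the calc method and the numbers are just placeholders):

import static org.junit.Assert.assertEquals;

import java.math.BigDecimal;
import org.junit.Test;

public class CalcTest {
    // Placeholder for a method that has already been converted to BigDecimal.
    private BigDecimal calc(BigDecimal someVal) {
        return someVal.multiply(BigDecimal.valueOf(3)).subtract(BigDecimal.valueOf(50));
    }

    @Test
    public void keepsTheOldExpectedValue() {
        // Old assertion: assertEquals(25.09, calc(25.03), 1e-9);
        // New assertion: compare by value with compareTo, because
        // BigDecimal.equals() also compares scale ("25.1" is not equal to "25.10").
        BigDecimal actual = calc(new BigDecimal("25.03"));
        assertEquals(0, new BigDecimal("25.09").compareTo(actual));
    }
}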
I'm not going to answer your question of how to convert from double to BigDecimal; I just wanted to share some notes regarding the conversion itself.
Don't do this.
It's a huge readability hit. Your example turned "25.03 * 3 - 50" into 4 lines of code.
Financial code usually uses double, or long for cents. It's precise enough that avoiding rounding errors is just a question of proper programming; see: What to do with Java BigDecimal performance?
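To illustrate the "long for cents" approach with the numbers from the question (a minimal sketch; formatting is left to the edges of the system):

// All money is kept as whole cents; only format when displaying.
long someValCents = 2503;                    // 25.03
long resultCents = someValCents * 3 - 5000;  // (25.03 * 3) - 50.00
// Prints 25.09 (sign handling omitted for brevity):
System.out.printf("%d.%02d%n", resultCents / 100, resultCents % 100);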
It's likely a huge performance hit, in particular from erratic garbage collection, which is not acceptable for HFT and the like: https://stackoverflow.com/a/1378156/1339987
I don't know how much code you are talking about, but I do expect you will have to make lots of small decisions. This reduces the chance that there is an openly available tool (which I'm all but sure does not exist) and increases the amount of configuration work you would have to do should you find one.
This will introduce bugs for the same reason you expect it to be an improvement, unless you have extremely good test coverage.
Programmatically manipulating Java source is covered here: Automatically generating Java source code
You don't need to accept this answer, but I believe my advice is correct, and I think other readers of this question need to see up front the case for not making this transformation, which is my rationale for posting what is somewhat of a non-answer.
So I'm somewhat new to programming, and I've been curious about the many data types in Java. To start, I've been focusing mostly on the ones to do with numbers.
Specifically, I've been looking at int and long. I've noticed that longs can hold a much larger range of values than ints. Because of this, I've been wondering why we don't just use longs all the time instead of using ints most often.
Yes, more bits take up more memory... also, some data types are faster for computers to do math on than others (e.g. integer math is faster than floating-point math).
Yes, that's essentially it; there are several flavors of "integer" type and several flavors of "decimal" (floating-point) type. It's hard for people just starting out in programming to believe it, with hardware being so cheap, but there was a time when the difference between these types was the difference between fitting in the computer's memory or not. Nowadays the only time you still have those sorts of constraints is in enterprise-level systems or in something small, like an embedded system or minimal computer (though even the Raspberry Pi is outgrowing this classification).
Still, it is good practice to limit yourself to the smallest reasonable variant of the data type you're using. Memory is still an issue at the scale of, say, an older mobile device, or one running lots of apps. long is super-crazy-long for most common contexts, and the extra room is just wasted resources if you're not going to be dealing with numbers at that scale.
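To put rough numbers on the memory point (exact heap usage varies slightly by JVM because of object headers and alignment):

// An int is 32 bits and a long is 64 bits, so in bulk the difference is real:
int[] ints = new int[1000000];    // roughly 4 MB of heap
long[] longs = new long[1000000]; // roughly 8 MB of heap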
So I'm not sure whether this is a problem with Eclipse, Java, or my computer. What I'm trying to do is basically compute 2^57885161-1. But, sadly, all that Eclipse outputs is "Infinity". My deduction is that either Java or Eclipse sets a limit on the maximum value of a computed expression, or that my computer cannot handle the amount of computation it would require.
If it is Java or Eclipse, is there a way that I can remedy the situation?
Thank you.
Use the java.math.BigDecimal (or java.math.BigInteger) class for extremely large numbers.
What's probably happening is that you're using an int or a double, and that number is MUCH too large for those data types in Java. Using BigDecimal, which can be arbitrarily large, will solve your problem, given enough time and memory.
edit - previously I had written "java.util.BigDecimal", which is the wrong package.
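A minimal sketch of both the failure mode and the fix; note that building the full number takes a while and a fair amount of memory, since it has roughly 17 million decimal digits:

// Math.pow works on doubles, which overflow to Infinity long before 2^57885161:
System.out.println(Math.pow(2, 57885161)); // Infinity

// BigInteger has no fixed size limit (only available memory), so this works:
BigInteger m = BigInteger.valueOf(2).pow(57885161).subtract(BigInteger.ONE);
System.out.println(m.bitLength()); // 57885161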
I want to ask a complex question.
I have to code a heuristic for my thesis. I need the following:
Evaluate some integral functions
Minimize functions over an interval
Do this thousands and thousands of times.
So I need a faster programming language to do these jobs. Which language do you suggest? I started with Java, but taking integrals became a problem, and I'm not sure about its speed.
Connecting Java with other software like MATLAB may be a good idea. Since I'm not sure, I'd like to hear your opinions.
Thanks!
C, Java, ... are all Turing-complete languages. They can compute the same functions with the same precision.
If you want to achieve your performance goals, use C, which is a compiled, high-performance language. It can decrease your computation time by avoiding the method calls and high-level features present in an interpreted language like Java.
Anyway, remember that your implementation may impact performance more than the language you choose, because as the input size grows it is the computational complexity that matters (http://en.wikipedia.org/wiki/Computational_complexity_theory).
It's not the programming language, it's probably your algorithm. Determine the big-O complexity of your algorithm. If you use nested loops where you could use a hash lookup in a Map instead, your algorithm can be made n times faster.
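A sketch of that kind of rewrite (the Customer/Order classes here are made up for illustration):

import java.util.HashMap;
import java.util.List;
import java.util.Map;

class Customer { int id; }
class Order { int customerId; }

class Matcher {
    // O(n^2) version: for each order, scan the whole customer list.
    // for (Order o : orders)
    //     for (Customer c : customers)
    //         if (c.id == o.customerId) { /* ... */ }

    // O(n) version: index one side in a Map first, then do constant-time lookups.
    static void match(List<Customer> customers, List<Order> orders) {
        Map<Integer, Customer> byId = new HashMap<>();
        for (Customer c : customers) {
            byId.put(c.id, c);
        }
        for (Order o : orders) {
            Customer c = byId.get(o.customerId);
            // ... process the matching (order, customer) pair ...
        }
    }
}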
Note: Modern JVMs (JDK 1.5 or 1.6) compile just-in-time to native code (as in, not interpreted) for a specific OS, OS version, and hardware architecture. You could try the -server flag to JIT even more aggressively (at the cost of an even longer initialization time).
Do this thousands and thousands of times.
Are you sure it's not more, something like 10^1000 instead? Try accurately calculating how many times you need to run that loop; it might surprise you. The types of problems on which heuristics are used tend to have a really big search space.
Before you start switching languages, I'd first try to do the following things:
Find the best available algorithms.
Find available implementations of those algorithms usable from your language.
There are, for example, scientific libraries for Java. Try to use these libraries.
If they are not fast enough, investigate whether there is anything to be done about it. Is your problem more specific than what the library assumes? Can you improve the algorithm based on that knowledge?
What is it that takes so much time/memory? Is this really related to your language? Try to avoid measuring JVM startup time instead of the time it actually spent on the calculation.
Then I'd consider switching languages. But don't expect it to be easy to beat optimized third-party Java libraries in C.
Order of the algorithm
Typically, switching languages only reduces the time required by a constant factor. Let's say you can double the speed using C; but if your algorithm is O(n^2), it will take four times as long when you double the data, no matter the language.
And the JVM can optimize a lot of things and get good results.
Some possible optimizations in Java
If you have methods that are called many times, make them final, and do the same for entire classes. The compiler will then know it can inline the method code, avoiding the creation of method-call stack frames for those calls.
My Question is regarding the performance of Java versus compiled code, for example, C++/fortran/assembly in high-performance numerical applications.
I know this is a contentious topic, but I am looking for specific answers/examples. Also community wiki. I have asked similar questions before, but I think I put it broadly and did not get answers I was looking for.
Double-precision matrix-matrix multiplication, commonly known as dgemm in the BLAS library, is able to achieve nearly 100 percent of peak CPU performance (in terms of floating-point operations per second).
There are several factors which allow achieving that performance (a rough Java sketch of the cache-blocking idea follows this list):
cache blocking, to achieve maximum memory locality
loop unrolling to minimize control overhead
vector instructions, such as SSE
memory prefetching
guarantee no memory aliasing
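To make the cache-blocking point concrete, here is a rough sketch of the kind of blocked loop structure I mean, in Java; the block size is arbitrary, and unrolling/vectorization are left to the JIT (a real dgemm does considerably more):

// C += A * B for N x N matrices stored row-major in 1-D arrays.
// BLOCK is chosen so that a few BLOCK x BLOCK tiles fit in cache at once.
static void blockedMultiply(double[] a, double[] b, double[] c, int n) {
    final int BLOCK = 64;
    for (int ii = 0; ii < n; ii += BLOCK) {
        for (int kk = 0; kk < n; kk += BLOCK) {
            for (int jj = 0; jj < n; jj += BLOCK) {
                int iMax = Math.min(ii + BLOCK, n);
                int kMax = Math.min(kk + BLOCK, n);
                int jMax = Math.min(jj + BLOCK, n);
                for (int i = ii; i < iMax; i++) {
                    for (int k = kk; k < kMax; k++) {
                        double aik = a[i * n + k];
                        // The innermost loop walks both B and C contiguously,
                        // which is what the blocking is buying us.
                        for (int j = jj; j < jMax; j++) {
                            c[i * n + j] += aik * b[k * n + j];
                        }
                    }
                }
            }
        }
    }
}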
I have seen lots of benchmarks using assembly, C++, Fortran, ATLAS, and vendor BLAS (typical cases are matrices of dimension 512 and above).
On the other hand, I have heard that in principle byte-compiled languages/implementations such as Java can be as fast, or nearly as fast, as machine-compiled languages. However, I have not seen definite benchmarks showing that this is so. On the contrary, it seems (from my own research) that byte-compiled languages are much slower.
Do you have good matrix-matrix multiplication benchmarks for Java/C#?
Is a just-in-time compiler (an actual implementation, not a hypothetical one) able to produce instructions which satisfy the points I have listed?
Thanks
With regard to performance:
Every CPU has a peak performance, depending on the number of instructions the processor can execute per second. For example, a modern 2 GHz Intel CPU can achieve 8 billion double-precision adds/multiplies per second, resulting in 8 GFLOPS peak performance. Matrix-matrix multiplication is one of the algorithms which is able to achieve nearly full performance in terms of operations per second, the main reason being the high ratio of compute to memory operations (N^3 vs. N^2). The sizes I am interested in are on the order of N > 500.
With regard to implementation: higher-level details such as blocking are handled at the source-code level. Lower-level optimization is handled by the compiler, perhaps with compiler hints regarding alignment/aliasing. A byte-compiled implementation can be written with a blocked approach as well, so in principle the source-code details of a decent implementation will be very similar.
A comparison of VC++/.NET 3.5/Mono 2.2 in a pure matrix multiplication scenario:
Source
Mono with Mono.Simd goes a long way towards closing the performance gap with the hand-optimized C++ here, but the C++ version is still clearly the fastest. Mono is at 2.6 now and might be closer, and I would expect that if .NET ever gets something like Mono.Simd it could be very competitive, as there's not much difference between .NET and the sequential C++ here.
All the factors you specify are probably achieved by manual memory/code optimization for your specific task. A JIT compiler doesn't have enough information about your domain to make the code as optimal as you would make it by hand, and can only apply general optimization rules. As a result it will be slower than C/C++ matrix-manipulation code (but it can still utilize 100% of the CPU, if you want it to :)
Addressing the SSE issue: Java has used SSE instructions since J2SE 1.4.2.
In a pure math scenario (calculating the 3D coordinates of 25 types of algebraic surfaces), C++ beats Java by a factor of 2.5.
Java cannot compete with C in matrix multiplication; one reason is that it checks on each array access whether the array bounds are exceeded. Furthermore, Java's math is slow; it does not use the processor's sin() and cos() instructions.