Implement Matrix operations in Java or in C++ through JNI? - java

I am basically trying to accomplish MNA(Modified Nodal Analysis) in java for circuit solvers . They basically involve solving a ton of linear equations and so I ended up with matrix algebra . MTJ and couple of other Java libraries are great but i'm tasked with implementing it on my own and did it as well in Java since my entire project is in Java. I was wondering if I should go ahead with the Java implementation or would doing it in C++ through JNI give a better enough performance to warrant its implementation? I'm just concerned about the bottleneck that JNI would incur when passing matrices the order of ten thousand and above or would that not be a problem?

My strongest recommendation to you is to not attempt to optimise your code for performance as you develop it. It is almost impossible to know ahead of time what code needs optimising. Generally you would end up with less readable, maintainable code that doesn't perform any better.
Develop your library in Java aiming for maximum clarity and correctness. Ignore performance.
Benchmark performance against realistic loads. For example if you will need to process millions of matrices then test how long that will take.
Decide if you have a problem. Modern hardware coupled with all the performance elements of the JRE mean that there are far fewer situations in which anything needs to be done at this stage. If something needs to be done, consider running on a more powerful machine instead of optimising your code. That's often a cheaper option.
If you need to optimise your code, use a profiler to find bottlenecks. Generally there are only a few small areas that consume the majority of the resources. You can waste a lot of time optimising code that has very little impact.
Optimise the code in those bottlenecks. There are a ton of good resources out there to help you with this. Rerun the benchmarks regularly to make sure you a making a difference. Unwind optimisations that turn out to make no difference.

The fastest matrix operations will come from an optimized BLAS and LAPACK tailored to your platform. You could link to those via JNI if you need the speed, but don't try to do your own matrix package if you need speed. These standards are heavily optimized by people familiar with the quirks of individual HW systems. If you're not sure that you need the speed, use Jama or some other Java-based package first.
Also note that if you go down to BLAS / LAPACK, you're probably getting Fortran code at the bottom, not C, C++, or Java.

Related

Why Python is not better in multiprocessing or multithreading applications than Java?

Since Python has some issues with GIL, Java is better for developing multiprocessing applications. Could you please justify the exact reasoning of java's effective processing than python in your way?
The biggest problem in multithreading in CPython is the Global Interpreter Lock (GIL) (note that other Python implementations don't necessarily share this problem!)
The GIL is an implementation detail that effectively prevents parallel (simultaneous) execution of separate threads in Python. The problem is that whenever Python byte code is to be executed, then the current thread must have acquired the GIL and only a single thread can have the GIL at any given moment.
So if 5 threads are trying to execute some Python byte code, then they will effectively run interleaved, because each one will have to wait for the GIL to become available. This is not usually a problem with single-core computers, as the physical constraints have the same effect: only a single thread can run at a time.
In multi-core/SMP computers, however this becomes a bottleneck. These days almost everthing is running on multiple cores, including effectively all smartphones and even many embedded systems.
Java has no such restrictions, so multiple threads can execute at the exact same time.
I would disagree that Python is not better than Java for Multi-Processing application.
First, I am assuming that the OP is using 'better' to mean 'faster code execution' as far as I can tell.
I suffer from 'speed-freak' syndrome, probably from having come from a C/ASM background, so I have spent considerable time getting to the bottom of the "is Python slow?" issue.
The simple answer to that? "It can be." Here's some important points:
1) With a multi-threadded application, Python is going to have a disadvantage to any language that doesn't have something similar to the GIL. The GIL is an artifact of the Python VM in CPython, not the Python language itself. Some Python VM's like Jython, IronPython, etc do not have a GIL.
2) In a Multi-Process application, the GIL doesn't really apply, and thus you can now start to harness faster execution of your Python code unmolested for the most part by the GIL. I strongly suggest if you want to write large Python code that needs both speed and concurrency, that you learn about Multi-Processing, and possibly ZMQ/0MQ for message passing.
3) Regardless of the GIL, Java displays faster code execution than Python in many areas. This is due to native differences in how Python handles objects in memory:
A number of Python functions create copies of objects in memory rather than modifying them ( see http://www.skymind.com/~ocrow/python_string/ for examples)
Python uses Dict to store attributes for objects, etc. I don't want to distract and delve into these areas, but I can generally say that some of the 'neat' things that Python can do come at a speed cost. It's also important to know that there are ways around the default behaviour if that is causing too high of a speed penalty for you.
4) Some of Java's speed advantage is due to more optimization in the Java VM over Python as far as I can tell. Once you eliminate the differences in how much behind-the-scenes memory/object work is done, Java can often still beat Python. Is it because Java has had more attention than Python? I'm not sure, with enough funding I feel that CPython could be faster.
Check http://c2.com/cgi/wiki?PythonProblems for more discussion on some of these issues.
I will say that I have decided to embrace Python nearly 100% going forward with new code.
Don't fall into the premature optimization trap, and remember you can always call C code in a pinch. Make your code work well, make it maintainable, then start to optimize once the speed of the application isn't fast enough for your needs.
Interesting Benchmarks:
http://benchmarksgame.alioth.debian.org/u64/python.php
Further information about Python speed issues can be found here:
http://www.infoworld.com/d/application-development/van-rossum-python-not-too-slow-188715

Making a loop efficient in Matlab code: use C or will Java do?

I want to speedup some matlab code involving a loop. A common solution is to code the loop in C and call it from matlab. However, I was wondering if I can get similar benefits from implementing the loop in Java instead - perhaps the just-in-time compilation makes it faster?
Before you start working with outside code: Have you pre-allocated your variables? Can you vectorize your loop? While the Matlab just-in-time compiler has become a lot better over the years, there are still cases where vectorization brings significant improvement. Also, note that quite a few Matlab functions (those for which you don't see code when you open them in the editor) are implemented in C or Fortran, so you may not observe a dramatic speed gain.
If you can't speed up your Matlab code by better writing it in Matlab, and if it does look likely that reimplementing might actually bring you any benefit, then C might be fastest, though Java might be not too far behind (again it depends on the code you want to speed up - it might be a good idea if you posted it here). If you're much more familiar with Java than C, I suggest trying to go the Java route.
People on SO are always eager to help on optimizing code. Once you've spotted the time consuming code section by profiling, you might publish a code extract here.
One phenomenal MATLAB feature is its ability to do JAVA scripting. Write your 'optimized' code in JAVA and instantiate the class within MATLAB. Using C your forced to write a wrapper, which is not as seamless.

Which programming language for compute-intensive trading portfolio simulation?

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).
I plan on employing Amazon web services to take on the entire load of the application.
I have four choices that I am considering as language.
Java
C++
C#
Python
Here is the scope of the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:
Weekly simulation of 10,000,000 trading systems.
(Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
Real-time production of portfolio w/ 100,000 trading strategies
Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)
Speed is a concern, but I believe that Java can handle the load.
I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.
The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.
Python - I've read somethings on PyPy and pyscho that claim python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides that fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.
To sum up:
real time production
weekly simulations of a large number of systems
weekly/monthly optimizations of portfolios
large numbers of connections to collect data from
There is no dealing with millisecond or even second based trades. The only consideration is if Java can possibly deal with this kind of load when spread out of a necessary amount of EC2 servers.
Thank you guys so much for your wisdom.
Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.
While I am a huge fan of Python and personaly I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.
For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.
For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise grade frameworks for parallell processing and cross-server deployment and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as Javaspaces.
I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable inthat if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.
C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.
I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each Object has an overhead of 8 Bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.
In C++ you can have collections of primitive types, which Java hasn't. You would have to use external libraries to get them.
If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that your program can make that garbage collection pause whenever you don't expect it.
Why only one language for your system? If I were you, I will build the entire system in Python, but C or C++ will be used for performance-critical components. In this way, you will have a very flexible and extendable system with fast-enough performance. You can find even tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing each other; they are complementing.
Write it in your preferred language. To me that sounds like python. When you start running the system you can profile it and see where the bottlenecks are. Once you do some basic optimisations if it's still not acceptable you can rewrite portions in C.
A consideration could be writing this in iron python to take advantage of the clr and dlr in .net. Then you can leverage .net 4 and parallel extensions. If anything will give you performance increases it'll be some flavour of threading which .net does extremely well.
Edit:
Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.
It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.
If the inner loop is a matrix operation, then I suggest python and scipy, but of the inner loop if not a matrix operation, then I would worry about python being slow. (Or maybe I would wrap c++ in python using swig or boost::python)
The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.
I would go with pypy. If not, http://lolcode.com/.

Java performance in numerical algorithms

I am curious about performance of Java numerical algorithms, say for example matrix matrix double precision multiplication, using the latest JIT machines as compared for example to hand tuned SSE C++/assembler or Fortran counterparts.
I have looked on the web but most of the results come from almost 10 years ago and I understand Java progressed quite a lot since then.
If you have experience using Java for numerically intensive applications can you share your experience. Also how well does Java perform in kernels where the loops are relatively short and the memory access is not very uniform but still within the limits of L1 cache? If such kernel is executed multiple times in succession, can JVM optimize it during runtime?
Thanks
I have written some reasonably large and performance sensitive numerical code in Java (crunching large arrays of doubles usually).
I've found Java to be "good enough" for fast numerical calculations. Especially when you consider that you are usually not CPU-bound anyway - memory latency and cache awareness will probably be your biggest problem for large datasets.
However, you can still beat Java with hand-optimized C/C++ code that takes advantage of specific vectorised instructions etc. or highly customised memory layouts. So for the very fastest code, you could consider writing the core algorithm in C/C++ and calling it from Java using JNI.
Personally, I find that creating a native code dependency is usually more trouble than it is worth so I tend to stick to the pure Java approach.
This is coming from a .NET side of things, but I'm 90% sure that it's the case for Java too. While the JIT will make some use of SSE instructions where it can, it currently does not auto-vectorize your code when dealing with, for example, matrix multiplications. Hand vectorized C++ using compiler intrinsics/inline assembly will definitely be faster here.
One of the weakest points in java is (native) matrix operations. This is due to the nature of Java matrices:
You can not declare a matrix to be rectangular, ie. each row can have a different number of columns.
A matrix is technically not a "matrix of doubles (or ints, ...)", but an array of arrays of ... . The big difference is that since arrays are Java objects you can assign the same array object to more than 1 row.
These two properties make a lot of standard matrix optimizations impossible for the compiler.
You might get better performance by using a Java library which emulates matrices on a single long array. However you have the overhead of method calls for all access.
C++ will definitely be faster. You can even have some hand-optimized libraries for your purposes that contain assembly codes for each of the major CPUs out there. You can't get better than that.
Afterwards, you can use JNI to call to it from Java, if needed.
Java is not meant for high performance arithmetic calculations like this. If you are depending on these, I'd recommend picking a proper, low-level language to implement that. Or, alternatively, you can write the performance-specific part in a low level language, and then connect it to a Java front-end using JNI or some other IPC method.
This is a link to the programming language shootout page for java vs c++, which will give you a comparison of java's speed on several compute intensive algorithms. It will also show you what highest performance java code looks like. For the most part, for these few specific benchmarks, java took more time (but not more than 2 or 3 times) to run.
Seconding that your best bet is to test it for yourself, as performance will vary somewhat depending on what you're doing exactly. I find it difficult to believe Shane C. Mason's answer that Java performance will be the same as C++ or Fortran performance, as even C++ and Fortran are not really comparable for some scientific computing algorithms.
I have a computational fluid dynamics code that I wrote using C++ and the same code essentially translated into Fortran. I'm not really sure why yet, but the Fortran version is about twice as fast as the C++ version. I would guess that with features like bounds-checking and garbage collection, Java would be slower than both, but I would not know until I tested.
This can be so dependent on what you are doing in the C++ code.
For example, are you using the GPU? Edit I forgot about jogl, so Java can compete here.
Are you parallelized using STM or shared-memory, then Java can't compete.
For a link on analysis of parallel matrix multiplication: http://www.cs.utexas.edu/users/plapack/papers/ipps98/ipps98.html
Do you have enough memory to do the calculations in memory, so the garbage collector won't be needed, and have you fine-tuned the garbage collector for optimal performance? Then, Java can be competitive, perhaps.
Are you using multicores, and is the C++ optimized to take advantage of this architecture? Then Java won't be able to compete.
Are you using several computers tied together, then Java won't be able to compete.
Are you using any combination of these, then it will depend on the particular implementation.
Java is not designed to compete with a hand-tuned C++ program, but, the time it takes to do the tuning, are you doing enough calculations where it will matter? Java will be able to give some reasonable speed but with less work than hand-tuning, but not much of an improvement over just doing C++ code.
You may want to see if there is an improvement over Haskell or Erlang, for example, over your C++, as these languages are better designed for this type of work.
Are these kind of computations you're interested in - Fast Fourier Transform, Jacobi Successive Over Relaxation, Monte Carlo Integration, Sparse Matrix Mult, Dense LU Matrix Factorisation?
They make up the SciMark 2.0 composite benchmark which you can launch as an applet on your machine.
There are also ANSI C versions of the programs, and an Intel document (pdf) on optimizing and recompiling SciMark for C++.
Similarly you could use The Java Grande Forum Benchmark Suite and the comparison C programs.
Java uses a Just in Time (JIT) compiler to convert the bytecode to native machine language - so the first time it runs through a code block it will be slower but once the segment is 'warmed up' the performance will be equivalent. In short - the numerical performance is pretty good.

C++/Java Performance for Neural Networks?

I was discussing neural networks (NN) with a friend over lunch the other day and he claimed the the performance of a NN written in Java would be similar to one written in C++. I know that with 'just in time' compiler techniques Java can do very well, but somehow I just don't buy it. Does anyone have any experience that would shed light on this issue? This page is the extent of my reading on the subject.
The Hotspot JIT can now produce code faster than C++. The reason is run-time empirical optimization.
For example, it can see that a certain loop takes the "false" branch 99% of the time and reorder the machine code instructions accordingly.
There's lots of articles about this. If you want all the details, read Sun's excellent whitepaper. For more informal info, try this one.
I'd be interested in a comparison between Hotspot JIT and profile-guided optimization optimized C++.
The problem I see with the Hotspot JIT (and any runtime-profile-optimized JIT compiler) is that statistics must be kept and code modified. While there are isolated cases this will result in faster-running code, I doubt that profile-optimized JIT compilers will run faster than well optimized C or C++ code in most circumstances. (Of course I could be wrong.)
Anyway, usually you're going to be at the mercy of the larger project, using the same language it is written in. Or you'll be at the mercy of the knowledge base of your co-workers. Or you'll be at the mercy of the platform you are targetting (is a JVM available on the architecture you're targetting?). In the rare case you have complete freedom and you're familiar with both languages, do some comparisons with the tools you have at your disposal. That is really the only way to determine what's best.
The only possible answer is: make a prototype and measure for yourself. If my experience is of any interest, Java and C# were always much slower than C++ for the kind of work I was doing - I believe mostly because of the high memory consumption. Of course, you can come to a completely different conclusion.
This is not strictly about C++ vs Java performance but nonetheless interesting in that regard: A paper about the performance of programs running in a garbage collected environment.
If excessive garbage collection is a concern, you can always reuse unused high-churn objects.
Create a factory that keeps a queue of SoftReferences to recycled objects, using those before creating new objects. Then in code that uses these objects, explicitly return these objects to the factory for recycling.
Probably C++, although I believe you'll hardly notice the difference besides a slow startup time. Java however makes development faster and maintenance easier.
In the grand scheme of things, you're debating maybe a 5% performance difference where you'd get several orders of magnitude increase by moving to CUDA or dedicated hardware.

Categories

Resources