Java vs Python on Hadoop

Java vs Python on Hadoop - java

I am working on a project using Hadoop and it seems to natively incorporate Java and provide streaming support for Python. Is there is a significant performance impact to choosing one over the other? I am early enough in the process where I can go either way if there is a significant performance difference one way or the other.

With Python you'll probably develop faster and with Java will definitely run faster.
Google "benchmarksgame" if you want to see some very accurate speed comparisons between all popular languages, but if I recall correctly you're talking about 3-5x faster.
That said, few things are processor bound these days, so if you feel like you'd develop better with Python, have at it!
In reply to comment (how can java be faster than Python):
All languages are processed differently. Java is about the fastest after C & C++ (which can be as fast or up to 5x faster than java, but seems to average around 2x faster). The rest are from 2-5+ times slower. Python is one of the faster ones after Java. I'm guessing that C# is about as fast as Java or maybe faster, but the benchmarksgame only had Mono (which was a tad slower) because they don't run it on windows.
Most of these claims are based on the computer language benchmarks game which tends to be pretty fair because advocates of/experts in each language tweak the test written in their specific language to ensure the code is well-targeted.
For example, this shows all tests with Java vs c++ and you can see the speed ranges from about equal to java being 3x slower (first column is between 1 and 3), and java uses much more memory!
Now this page shows java vs python (from the point of view of Python). So the speeds range from python being 2x slower than Java to 174x slower, python generally beats java in code size and memory usage though.
Another interesting point here--tests that allocated a lot of memory, Java actually performed significantly better than Python in memory size as well. I'm pretty sure java usually loses memory because of the overhead of the VM, but once that factors out, java is probably more efficient than most (again, except the C's).
This is Python 3 by the way, the other python platform tested (Just called Python) faired much worse.
If you really wanted to know how it is faster, the VM is amazingly intelligent. It compiles to machine language AFTER running the code, so it knows what the most likely code paths are and optimizes for them. Memory allocation is an art--really useful in an OO language. It can perform some amazing run-time optimizations which no non-VM language can do. It can run in a pretty small memory footprint when forced to, and is a language of choice for embedded devices along with C/C++.
I worked on a Signal Analyzer for Agilent (think expensive o-scope) where nearly the entire thing (aside from the sampling) was done in Java. This includes drawing the screen including the trace (AWT) and interacting with the controls.
Currently I'm working on a project for all future cable boxes. The Guide along with most other apps will be written in Java.
Why wouldn't it be faster than Python?

Java is less dynamic than Python and more effort has been put into its VM, making it a faster language. Python is also held back by its Global Interpreter Lock, meaning it cannot push threads of a single process onto different core.
Whether this makes any significant difference depends on what you intend to do. I suspect both languages will work for you.

You can write Hadoop mapreduce transformations either as "streaming" or as a "custom jar". If you use streaming, you can write your code in any language you like, including Python or C++. Your code will just read from STDIN and output to STDOUT. However, on hadoop versions before 0.21, hadoop streaming used to only stream text - not binary - to your processes. Therefore your files needed to be text files, unless you do some funky encoding transformations yourself. But now it appears a patch has been added that now allows the use of binary formats with hadoop streaming.
If you use a "custom jar" (i.e. you wrote your mapreduce code in Java or Scala using the hadoop libraries), then you will have access to functions that allow you to input and output binary (serialize in binary) from your streaming processes (and save the results to disk). So future runs will be much faster (depending on how much your binary format is smaller than your text format).
So if your hadoop job is going to be I/O bound, then the "custom jar" approach will be faster (since both Java is faster as previous posters have shown and reading from disk will also be faster).
But you have to ask yourself how valuable is your time. I find myself far more productive with python, and writing map-reduce that reads STDIN and writes to STDOUT is really straightforward. So I personally would recommend going the python route - even if you have to figure the binary encoding stuff out yourself. Since hadoop 0.21 handles non-utf8 byte arrays, and since there is a binary (byte array) alternative to use for python (http://dumbotics.com/2009/02/24/hadoop-1722-and-typed-bytes/), which shows the python code only being about 25% slower than the "custom jar" java code, I would definitely go the python route.

Related

OpenCV (JavaCV) vs OpenCV (C/C++ interfaces)

I am just wondering whether there would be a significant speed performance advantage relatively on a given set of machines when using JavaCV as opposed to the C/C++ implementation of OpenCV.
Please correct me if I am wrong, but my understanding is that the c/c++ implementation of opencv is closer to the machine where as the Java implementation of OpenCV, JavaC, would have a slight speed performance disadvantage (in milliseconds) as there would be a virtual machine converting your source code to bytecode which then gets converted to machine code. Whereas, with c/c++, it gets converted straight to machine code and thus doesn't carry that intermediary step of the virtual machine overhead.
Please don't kill me here if I made mistakes; I am just learning and would welcome constructive criticism.
Thank you

I'd like to add a couple of things to #ejbs's answer.
First of all, you concerned 2 separate issues:
Java vs. C++ performance
OpenCV vs JavaCV
Java vs. C++ performance is a long, long story. On one hand, C++ programs are compiled to a highly optimized native code. They start quickly and run fast all the time without pausing for garbage collection or other VM duties (as Java do). On other hand, once compiled, program in C++ can't change, no matter on what machine they are run, while Java bytecode is compiled "just-in-time" and is always optimized for processor architecture they run on. In modern world, with so many different devices (and processor architectures) this may be really significant. Moreover, some JVMs (e.g. Oracle Hotspot) can optimize even the code that is already compiled to native code! VM collect data about program execution and from time to time tries to rewrite code in such a way that it is optimized for this specific execution. So in such complicated circumstances the only real way to compare performance of implementations in different programming languages is to just run them and see the result.
OpenCV vs. JavaCV is another story. First you need to understand stack of technologies behind these libraries.
OpenCV was originally created in 1999 in Intel research labs and was written in C. Since that time, it changed the maintainer several times, became open source and reached 3rd version (upcoming release). At the moment, core of the library is written in C++ with popular interface in Python and a number of wrappers in other programming languages.
JavaCV is one of such wrappers. So in most cases when you run program with JavaCV you actually use OpenCV too, just call it via another interface. But JavaCV provides more than just one-to-one wrapper around OpenCV. In fact, it bundles the whole number of image processing libraries, including FFmpeg, OpenKinect and others. (Note, that in C++ you can bind these libraries too).
So, in general it doesn't matter what you are using - OpenCV or JavaCV, you will get just about same performance. It more depends on your main task - is it Java or C++ which is better suited for your needs.
There's one more important point about performance. Using OpenCV (directly or via wrapper) you will sometimes find that OpenCV functions overcome other implementations by several orders. This is because of heavy use of low-level optimizations in its core. For example, OpenCV's filter2D function is SIMD-accelerated and thus can process several sets of data in parallel. And when it comes to computer vision, such optimizations of common functions may easily lead to significant speedup.

JavaCV interfaces to OpenCV, so when you call something OpenCV related there would be some overhead but in general most of the heavy work will still be on the C++ side and therefore there won't be a very large performance penalty.
You would have to do performance benchmarks to find out more.
PS. I'm pretty new here but I'm rather sure that this is not a suitable question for StackOverflow.

i would like to add a few more insights on java as an interface to c++ libraries...
A)developing:
1)while java may be easier to manage large scale projects and compiles extremely fast, it is very very hard, and next to impossible to debug native code from java...
when code crush on native side...or memory leaks( something that happens a lot... ) you feel kind of helpless...
2)unless you build the bindings yourself( not an easy task even with using swig or whatever... ) you are dependent on the good will/health/time of the bindings builder....
so in this case i would prefer the official "desktop java " bindings over javacv...
B)performance.
1) while bindings may be optimized( memory transfer using neobuffer ) as in the javacv case there is still a very very small jni overhead for each native function call -
this is meaningless in our case since most opencv functions consume X100000++ cpu cycles compared to this jni overhead...
2) The BIG-PROBLEM ---- stop the world GARBAGE COLLECTIOR( GC )
java uses a garbage collector that halts all cpu's threads making it UNSUITABLE for REAL-TIME application's , there are work-around's iv'e heard of like redesigning your app not to produce garbage, use a spaciel gc, or use realtime java( cost money... ) they all seem to be extra-work( and all you wanted is a nice easy path to opencv.... )
conclusion - if you want to create a professional real time app - then go with c++
unless you have a huge modular project to manage - just stick with c++ and precompiled headers( make things compile faster... )
while java is a pleasure to work with , when it comes to native binding's HELL breaks loose...i know iv'e been there....

Why is Erlang slower than Java on all these small math benchmarks?

While considering alternatives for Java for a distributed/concurrent/failover/scalable backend environment I discovered Erlang. I've spent some time on books and articles where nearly all of them (even Java addicted guys) says that Erlang is a better choice in such environments, as many useful things are out of the box in a less error prone way.
I was sure that Erlang is faster in most cases mainly because of a different garbage collection strategy (per process), absence of shared state (b/w threads and processes) and more compact data types. But I was very surprised when I found comparisons of Erlang vs Java math samples where Erlang is slower by several orders, e.g. from x10 to x100.
Even on concurrent tasks, both on several cores and a single one.
What's the reasons for that? These answers came to mind:
Usage of Java primitives (=> no heap/gc) on most of the tasks
Same number of threads in Java code and Erlang processes so the actor model has no advantage here
Or just that Java is statically typed, while Erlang is not
Something else?
If that's because these are very specific math algorithms, can anybody show more real/practice performance tests?
UPDATE: I've got the answers so far summarizing that Erlang is not the right tool for such specific "fast Java case", but the thing that is unclear to me - what's the main reason for such Erlang inefficiency here: dynamic typing, GC or poor native compiling?

Erlang was not built for math. It was built with communication, parallel processing and scalability in mind, so testing it for math tasks is a bit like testing if your jackhammer gives you refreshing massage experience.
That said, let's offtop a little:
If you want Erlang-style programming in JVM, take a look at Scala Actors or Akka framework or Vert.x.

Benchmarks are never good for saying anything else than what they are really testing. If you feel that a benchmark is only testing primitives and a classic threading model, that is what you get knowledge about. You can now with some confidence say that Java is faster than Erlang on mathematics on primitives as well as the classic threading model for those types of problems. You don't know anything about the performance with large number of threads or for more involved problems because the benchmark didn't test that.
If you are doing the types of math that the benchmark tested, go with Java because it is obviously the right tool for that job. If you want to do something heavily scalable with little to no shared state, find a benchmark for that or at least re-evaluate Erlang.
If you really need to do heavy math in Erlang, consider using HiPE (consider it anyway for that matter).

I took interest to this as some of the benchmarks are a perfect fit for erlang, such as gene sequencing. So on http://benchmarksgame.alioth.debian.org/ the first thing I did was look at reverse-complement implementations, for both C and Erlang, as well as the testing details. I found that the test is biased because it does not discount the time it takes erlang to start the VM /w the schedulers, natively compiled C is started much faster. The way those benchmarks measure is basically:
time erl -noshell -s revcomp5 main < revcomp-input.txt
Now the benchmark says Java took 1.4 seconds and erlang /w HiPE took 11. Running the (Single threaded) Erlang code took me 0.15 seconds, and if you discount the time it took to start the vm, the actual workload took only 3000 microseconds (0.003 seconds).
So I have no idea how that is benchmarked, if its done 100 times then it makes no sense as the cost of starting the erlang VM will be x100. If the input is a lot longer than given, it would make sense, but I see no details on the webpage of that. To make the benchmarks more fair for Managed languages, have the code (Erlang/Java) send a Unix signal to the python (that is doing the benchmarking) that it hit the startup function.
Now benchmark aside, the erlang VM essentially just executes machine code at the end, as well as the Java VM. So there is no way a math operation would take longer in Erlang than in Java.
What Erlang is bad at is data that needs to mutate often. Such as a chained block cypher. Say you have the chars "0123456789", now your encryption xors the first 2 chars by 7, then xors the next two chars by the result of the first two added, then xors the previous 2 chars by the results of the current 2 subtracted, then xors the next 4 chars.. etc
Because objects in Erlang are immutable this means that the entire char array needs to be copied each time you mutate it. That is why erlang has support for things called NIFS which is C code you can call into to solve this exact problem. In fact all the encryption (ssl,aes,blowfish..) and compression (zlib,..) that ship with Erlang are implemented in C, also there is near 0 cost associated with calling C from Erlang.
So using Erlang you get the best of both worlds, you get the speed of C with the parallelism of Erlang.
If I were to implement the reverse-complement in the FASTEST way possible, I would write the mutating code using C but the parallel code using Erlang. Assuming infinite input, I would have Erlang split on the >
<<Line/binary, ">", Rest/binary>> = read_stream
Dispatch the block to the first available scheduler via round robin, consisting of infinite EC2 private networked hidden nodes, being added in real time to the cluster every millisecond.
Those nodes then call out to C via NIFS for processing (C was the fastest implementation for reverse-compliment on alioth website), then send the output back to the node master to send out to the inputer.
To implement all this in Erlang I would have to write code as if I was writing a single threaded program, it would take me under a day to create this code.
To implement this in Java, I would have to write the single threaded code, I would have to take the performance hit of calling from Managed to Unmanaged (as we will be using the C implementation for the grunt work obviously), then rewrite to support 64 cores. Then rewrite it to support multiple CPUS. Then rewrite it again to support clustering. Then rewrite it again to fix memory issues.
And that is Erlang in a nutshell.

As pointed in other answers - Erlang is designed to solve effectively real life problems, which are bit opposite to benchmark problems.
But I'd like to enlighten one more aspect - pithiness of erlang code (in some cases means rapidness of development), which could be easily concluded, after comparing benchmarks implementations.
For example, k-nucleotide benchmark:
Erlang version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=hipe&id=3
Java version: http://benchmarksgame.alioth.debian.org/u64q/program.php?test=knucleotide&lang=java&id=3
If you want more real-life benchmarks, I'd suggest you Comparing C++ And Erlang For Motorola Telecoms Software

The Erlang solution uses ETS, Erlang Term Storage, which is like an in-memory database running in a separate process. Consequent to it being in a separate process, all messages to and from that process must be serialized/deserialized. This would account for a lot of the slowness, I should think. For example, if you look at the "regex-dna" benchmark, Erlang is only slightly slower than Java there, and it doesn't use ETS.

The fact that erlang has to allocate memory for every value whereas in java you will typically reuse variables if you want it to be fast, means it will always be faster for 'tight loop' bench marks.
It would be interesting to benchmark a java version using the -client flag and boxed primitives and compare that to erlang.
I believe using hipe is unfair since it is not an active project. I would be interested to know if any mission critical software is running on this.

I don't know anything about Erlang, but this seems to be a compare apples to oranges approach anyways. You must be aware that considerable effort was spent over more than a decade to improve java preformance to the point where it is today.
Its not surprising (to me) that a language implementation done by volunteers or a small company can not outmatch that effort.

Why Python is not better in multiprocessing or multithreading applications than Java?

Since Python has some issues with GIL, Java is better for developing multiprocessing applications. Could you please justify the exact reasoning of java's effective processing than python in your way?

The biggest problem in multithreading in CPython is the Global Interpreter Lock (GIL) (note that other Python implementations don't necessarily share this problem!)
The GIL is an implementation detail that effectively prevents parallel (simultaneous) execution of separate threads in Python. The problem is that whenever Python byte code is to be executed, then the current thread must have acquired the GIL and only a single thread can have the GIL at any given moment.
So if 5 threads are trying to execute some Python byte code, then they will effectively run interleaved, because each one will have to wait for the GIL to become available. This is not usually a problem with single-core computers, as the physical constraints have the same effect: only a single thread can run at a time.
In multi-core/SMP computers, however this becomes a bottleneck. These days almost everthing is running on multiple cores, including effectively all smartphones and even many embedded systems.
Java has no such restrictions, so multiple threads can execute at the exact same time.

I would disagree that Python is not better than Java for Multi-Processing application.
First, I am assuming that the OP is using 'better' to mean 'faster code execution' as far as I can tell.
I suffer from 'speed-freak' syndrome, probably from having come from a C/ASM background, so I have spent considerable time getting to the bottom of the "is Python slow?" issue.
The simple answer to that? "It can be." Here's some important points:
1) With a multi-threadded application, Python is going to have a disadvantage to any language that doesn't have something similar to the GIL. The GIL is an artifact of the Python VM in CPython, not the Python language itself. Some Python VM's like Jython, IronPython, etc do not have a GIL.
2) In a Multi-Process application, the GIL doesn't really apply, and thus you can now start to harness faster execution of your Python code unmolested for the most part by the GIL. I strongly suggest if you want to write large Python code that needs both speed and concurrency, that you learn about Multi-Processing, and possibly ZMQ/0MQ for message passing.
3) Regardless of the GIL, Java displays faster code execution than Python in many areas. This is due to native differences in how Python handles objects in memory:
A number of Python functions create copies of objects in memory rather than modifying them ( see http://www.skymind.com/~ocrow/python_string/ for examples)
Python uses Dict to store attributes for objects, etc. I don't want to distract and delve into these areas, but I can generally say that some of the 'neat' things that Python can do come at a speed cost. It's also important to know that there are ways around the default behaviour if that is causing too high of a speed penalty for you.
4) Some of Java's speed advantage is due to more optimization in the Java VM over Python as far as I can tell. Once you eliminate the differences in how much behind-the-scenes memory/object work is done, Java can often still beat Python. Is it because Java has had more attention than Python? I'm not sure, with enough funding I feel that CPython could be faster.
Check http://c2.com/cgi/wiki?PythonProblems for more discussion on some of these issues.
I will say that I have decided to embrace Python nearly 100% going forward with new code.
Don't fall into the premature optimization trap, and remember you can always call C code in a pinch. Make your code work well, make it maintainable, then start to optimize once the speed of the application isn't fast enough for your needs.
Interesting Benchmarks:
http://benchmarksgame.alioth.debian.org/u64/python.php
Further information about Python speed issues can be found here:
http://www.infoworld.com/d/application-development/van-rossum-python-not-too-slow-188715

Which programming language for compute-intensive trading portfolio simulation?

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).
I plan on employing Amazon web services to take on the entire load of the application.
I have four choices that I am considering as language.
Java
C++
C#
Python
Here is the scope of the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:
Weekly simulation of 10,000,000 trading systems.
(Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
Real-time production of portfolio w/ 100,000 trading strategies
Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)
Speed is a concern, but I believe that Java can handle the load.
I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.
The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.
Python - I've read somethings on PyPy and pyscho that claim python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides that fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.
To sum up:
real time production
weekly simulations of a large number of systems
weekly/monthly optimizations of portfolios
large numbers of connections to collect data from
There is no dealing with millisecond or even second based trades. The only consideration is if Java can possibly deal with this kind of load when spread out of a necessary amount of EC2 servers.
Thank you guys so much for your wisdom.

Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.

While I am a huge fan of Python and personaly I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.
For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.
For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise grade frameworks for parallell processing and cross-server deployment and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as Javaspaces.
I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable inthat if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.
C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.

I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each Object has an overhead of 8 Bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.
In C++ you can have collections of primitive types, which Java hasn't. You would have to use external libraries to get them.
If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that your program can make that garbage collection pause whenever you don't expect it.

Why only one language for your system? If I were you, I will build the entire system in Python, but C or C++ will be used for performance-critical components. In this way, you will have a very flexible and extendable system with fast-enough performance. You can find even tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing each other; they are complementing.

Write it in your preferred language. To me that sounds like python. When you start running the system you can profile it and see where the bottlenecks are. Once you do some basic optimisations if it's still not acceptable you can rewrite portions in C.
A consideration could be writing this in iron python to take advantage of the clr and dlr in .net. Then you can leverage .net 4 and parallel extensions. If anything will give you performance increases it'll be some flavour of threading which .net does extremely well.
Edit:
Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.

It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.
If the inner loop is a matrix operation, then I suggest python and scipy, but of the inner loop if not a matrix operation, then I would worry about python being slow. (Or maybe I would wrap c++ in python using swig or boost::python)
The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.

I would go with pypy. If not, http://lolcode.com/.

Java performance in numerical algorithms

I am curious about performance of Java numerical algorithms, say for example matrix matrix double precision multiplication, using the latest JIT machines as compared for example to hand tuned SSE C++/assembler or Fortran counterparts.
I have looked on the web but most of the results come from almost 10 years ago and I understand Java progressed quite a lot since then.
If you have experience using Java for numerically intensive applications can you share your experience. Also how well does Java perform in kernels where the loops are relatively short and the memory access is not very uniform but still within the limits of L1 cache? If such kernel is executed multiple times in succession, can JVM optimize it during runtime?
Thanks

I have written some reasonably large and performance sensitive numerical code in Java (crunching large arrays of doubles usually).
I've found Java to be "good enough" for fast numerical calculations. Especially when you consider that you are usually not CPU-bound anyway - memory latency and cache awareness will probably be your biggest problem for large datasets.
However, you can still beat Java with hand-optimized C/C++ code that takes advantage of specific vectorised instructions etc. or highly customised memory layouts. So for the very fastest code, you could consider writing the core algorithm in C/C++ and calling it from Java using JNI.
Personally, I find that creating a native code dependency is usually more trouble than it is worth so I tend to stick to the pure Java approach.

This is coming from a .NET side of things, but I'm 90% sure that it's the case for Java too. While the JIT will make some use of SSE instructions where it can, it currently does not auto-vectorize your code when dealing with, for example, matrix multiplications. Hand vectorized C++ using compiler intrinsics/inline assembly will definitely be faster here.

One of the weakest points in java is (native) matrix operations. This is due to the nature of Java matrices:
You can not declare a matrix to be rectangular, ie. each row can have a different number of columns.
A matrix is technically not a "matrix of doubles (or ints, ...)", but an array of arrays of ... . The big difference is that since arrays are Java objects you can assign the same array object to more than 1 row.
These two properties make a lot of standard matrix optimizations impossible for the compiler.
You might get better performance by using a Java library which emulates matrices on a single long array. However you have the overhead of method calls for all access.

C++ will definitely be faster. You can even have some hand-optimized libraries for your purposes that contain assembly codes for each of the major CPUs out there. You can't get better than that.
Afterwards, you can use JNI to call to it from Java, if needed.
Java is not meant for high performance arithmetic calculations like this. If you are depending on these, I'd recommend picking a proper, low-level language to implement that. Or, alternatively, you can write the performance-specific part in a low level language, and then connect it to a Java front-end using JNI or some other IPC method.

This is a link to the programming language shootout page for java vs c++, which will give you a comparison of java's speed on several compute intensive algorithms. It will also show you what highest performance java code looks like. For the most part, for these few specific benchmarks, java took more time (but not more than 2 or 3 times) to run.

Seconding that your best bet is to test it for yourself, as performance will vary somewhat depending on what you're doing exactly. I find it difficult to believe Shane C. Mason's answer that Java performance will be the same as C++ or Fortran performance, as even C++ and Fortran are not really comparable for some scientific computing algorithms.
I have a computational fluid dynamics code that I wrote using C++ and the same code essentially translated into Fortran. I'm not really sure why yet, but the Fortran version is about twice as fast as the C++ version. I would guess that with features like bounds-checking and garbage collection, Java would be slower than both, but I would not know until I tested.

This can be so dependent on what you are doing in the C++ code.
For example, are you using the GPU? Edit I forgot about jogl, so Java can compete here.
Are you parallelized using STM or shared-memory, then Java can't compete.
For a link on analysis of parallel matrix multiplication: http://www.cs.utexas.edu/users/plapack/papers/ipps98/ipps98.html
Do you have enough memory to do the calculations in memory, so the garbage collector won't be needed, and have you fine-tuned the garbage collector for optimal performance? Then, Java can be competitive, perhaps.
Are you using multicores, and is the C++ optimized to take advantage of this architecture? Then Java won't be able to compete.
Are you using several computers tied together, then Java won't be able to compete.
Are you using any combination of these, then it will depend on the particular implementation.
Java is not designed to compete with a hand-tuned C++ program, but, the time it takes to do the tuning, are you doing enough calculations where it will matter? Java will be able to give some reasonable speed but with less work than hand-tuning, but not much of an improvement over just doing C++ code.
You may want to see if there is an improvement over Haskell or Erlang, for example, over your C++, as these languages are better designed for this type of work.

Are these kind of computations you're interested in - Fast Fourier Transform, Jacobi Successive Over Relaxation, Monte Carlo Integration, Sparse Matrix Mult, Dense LU Matrix Factorisation?
They make up the SciMark 2.0 composite benchmark which you can launch as an applet on your machine.
There are also ANSI C versions of the programs, and an Intel document (pdf) on optimizing and recompiling SciMark for C++.
Similarly you could use The Java Grande Forum Benchmark Suite and the comparison C programs.

Java uses a Just in Time (JIT) compiler to convert the bytecode to native machine language - so the first time it runs through a code block it will be slower but once the segment is 'warmed up' the performance will be equivalent. In short - the numerical performance is pretty good.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.