OpenCV (JavaCV) vs OpenCV (C/C++ interfaces) - java

I am just wondering whether there would be a significant speed performance advantage relatively on a given set of machines when using JavaCV as opposed to the C/C++ implementation of OpenCV.
Please correct me if I am wrong, but my understanding is that the c/c++ implementation of opencv is closer to the machine where as the Java implementation of OpenCV, JavaC, would have a slight speed performance disadvantage (in milliseconds) as there would be a virtual machine converting your source code to bytecode which then gets converted to machine code. Whereas, with c/c++, it gets converted straight to machine code and thus doesn't carry that intermediary step of the virtual machine overhead.
Please don't kill me here if I made mistakes; I am just learning and would welcome constructive criticism.
Thank you

I'd like to add a couple of things to #ejbs's answer.
First of all, you concerned 2 separate issues:
Java vs. C++ performance
OpenCV vs JavaCV
Java vs. C++ performance is a long, long story. On one hand, C++ programs are compiled to a highly optimized native code. They start quickly and run fast all the time without pausing for garbage collection or other VM duties (as Java do). On other hand, once compiled, program in C++ can't change, no matter on what machine they are run, while Java bytecode is compiled "just-in-time" and is always optimized for processor architecture they run on. In modern world, with so many different devices (and processor architectures) this may be really significant. Moreover, some JVMs (e.g. Oracle Hotspot) can optimize even the code that is already compiled to native code! VM collect data about program execution and from time to time tries to rewrite code in such a way that it is optimized for this specific execution. So in such complicated circumstances the only real way to compare performance of implementations in different programming languages is to just run them and see the result.
OpenCV vs. JavaCV is another story. First you need to understand stack of technologies behind these libraries.
OpenCV was originally created in 1999 in Intel research labs and was written in C. Since that time, it changed the maintainer several times, became open source and reached 3rd version (upcoming release). At the moment, core of the library is written in C++ with popular interface in Python and a number of wrappers in other programming languages.
JavaCV is one of such wrappers. So in most cases when you run program with JavaCV you actually use OpenCV too, just call it via another interface. But JavaCV provides more than just one-to-one wrapper around OpenCV. In fact, it bundles the whole number of image processing libraries, including FFmpeg, OpenKinect and others. (Note, that in C++ you can bind these libraries too).
So, in general it doesn't matter what you are using - OpenCV or JavaCV, you will get just about same performance. It more depends on your main task - is it Java or C++ which is better suited for your needs.
There's one more important point about performance. Using OpenCV (directly or via wrapper) you will sometimes find that OpenCV functions overcome other implementations by several orders. This is because of heavy use of low-level optimizations in its core. For example, OpenCV's filter2D function is SIMD-accelerated and thus can process several sets of data in parallel. And when it comes to computer vision, such optimizations of common functions may easily lead to significant speedup.

JavaCV interfaces to OpenCV, so when you call something OpenCV related there would be some overhead but in general most of the heavy work will still be on the C++ side and therefore there won't be a very large performance penalty.
You would have to do performance benchmarks to find out more.
PS. I'm pretty new here but I'm rather sure that this is not a suitable question for StackOverflow.

i would like to add a few more insights on java as an interface to c++ libraries...
A)developing:
1)while java may be easier to manage large scale projects and compiles extremely fast, it is very very hard, and next to impossible to debug native code from java...
when code crush on native side...or memory leaks( something that happens a lot... ) you feel kind of helpless...
2)unless you build the bindings yourself( not an easy task even with using swig or whatever... ) you are dependent on the good will/health/time of the bindings builder....
so in this case i would prefer the official "desktop java " bindings over javacv...
B)performance.
1) while bindings may be optimized( memory transfer using neobuffer ) as in the javacv case there is still a very very small jni overhead for each native function call -
this is meaningless in our case since most opencv functions consume X100000++ cpu cycles compared to this jni overhead...
2) The BIG-PROBLEM ---- stop the world GARBAGE COLLECTIOR( GC )
java uses a garbage collector that halts all cpu's threads making it UNSUITABLE for REAL-TIME application's , there are work-around's iv'e heard of like redesigning your app not to produce garbage, use a spaciel gc, or use realtime java( cost money... ) they all seem to be extra-work( and all you wanted is a nice easy path to opencv.... )
conclusion - if you want to create a professional real time app - then go with c++
unless you have a huge modular project to manage - just stick with c++ and precompiled headers( make things compile faster... )
while java is a pleasure to work with , when it comes to native binding's HELL breaks loose...i know iv'e been there....

Related

Decoding Airplay Packets in Java or C/C++ on Android

I'm currently working on an AirPlay receiver for a subpart of an android application. I am using the following framework:
https://github.com/pentateu/DroidAirPlay
While this works great on some mid range devices such as the miPad, we need to get this running on a low spec custom device. The custom device is decoding the airplay packets at 10x to 20x slower than the miPad. As a result, the audio packets lose time synchronisation and due to the time taken to decode the packets, the audio can never re-sync.
I was looking into some other airplay receiver apps on the Play Store and from what I can see they tend to be based on Shairport (https://github.com/abrasive/shairport) for the airplay receiver side of things.
**Note: ** the Shairport based frameworks do not seem to suffer the synchronisation issue on the low end device.
The framework I am using is heavily based on the Shairport framework apart from that it is written in Java.
For decoding data, is C/C++ far superior than Java?
If so, would directing the decoding part of the DroidAirPlay framework via a C or C++ implementation using the NDK give me a large boost to performance?
Thanks in advance
Matt
While it is true that Java compiles to bytecode that runs in a virtual machine, it may not necessarily be slower (or faster) than a natively compiled executable, whether C/C++ or not. It all depends on the program!
There's a number of reasons why in this case Java may be slower:
The decoding implementation might just be poorly coded/optimized? (which isn't really Java's fault)
The Java compiler may generate sub-optimal code for the JVM.
Some of Java's language constructs are just too slow for the speed/resource demands placed on it here.
The JVM just is another abstraction layer and the culprit
The garbage collection is in on it?!
(I have to note here that I'm not an expert on Java!)
However, I still wouldn't go so far as to call Java intrinsically slower than C or C++. I'm sure you can find many-a benchmarks and tests on the internet comparing one language to another, and some make claims to a certain degree (out of pride and ego?). But those tests are only specific cases, usually testing specific aspects of a larger language (hash-map lookup performance for instance!).
LLVM had a three part blog post on C and why undefined behavior allows the compiler to generate still correct but more efficient code at the cost of inferring run-time safety checking or deciding that i+1 always comes after i, totally ignoring the existence of integer overflow. If the programmer is not careful, this can have disastrous consequences.
In the words of Bjarne himself in Abstraction and the C++ machine model:
C++ was designed to be a systems programming language and has been used for
embedded systems programming and other resource-constrained types of
programming since the earliest days.
As such, I believe C and C++ can be pushed further than Java because of this undefined behavior and less restrictions placed on it. (And there's also the inline assembly bit, but that's not strictly C!)

Why operating systems are not written in java?

All the operating systems till date have been written in C/C++ while there is none in Java. There are tonnes of Java applications but not an OS. Why?
Because we have operating systems already, mainly. Java isn't designed to run on bare metal, but that's not as big of a hurdle as it might seem at first. As C compilers provide intrinsic functions that compile to specific instructions, a Java compiler (or JIT, the distinction isn't meaningful in this context) could do the same thing. Handling the interaction of GC and the memory manager would be somewhat tricky also. But it could be done. The result is a kernel that's 95% Java and ready to run jars. What's next?
Now it's time to write an operating system. Device drivers, a filesystem, a network stack, all the other components that make it possible to do things with a computer. The Java standard library normally leans heavily on system calls to do the heavy lifting, both because it has to and because running a computer is a pain in the ass. Writing a file, for example, involves the following layers (at least, I'm not an OS guy so I've surely missed stuff):
The filesystem, which has to find space for the file, update its directory structure, handle journaling, and finally decide what disk blocks need to be written and in what order.
The block layer, which has to schedule concurrent writes and reads to maximize throughput while maximizing fairness.
The device driver, which has to keep the device happy and poke it in the right places to make things happen. And of course every device is broken in its own special way, requiring its own driver.
And all this has to work fine and remain performant with a dozen threads accessing the disk, because a disk is essentially an enormous pile of shared mutable state.
At the end, you've got Linux, except it doesn't work as well because it doesn't have near as much effort invested into functionality and performance, and it only runs Java. Possibly you gain performance from having a single address space and no kernel/userspace distinction, but the gain isn't worth the effort involved.
There is one place where a language-specific OS makes sense: VMs. Let the underlying OS handle the hard parts of running a computer, and the tenant OS handles turning a VM into an execution environment. BareMetal and MirageOS follow this model. Why would you bother doing this instead of using Docker? That's a good question.
Indeed there is a JavaOS http://en.wikipedia.org/wiki/JavaOS
And here is discuss about why there is not many OS written in java Is it possible to make an operating system using java?
In short, Java need to run on JVM. JVM need to run on an OS. writing an OS using Java is not a good choice.
OS needs to deal with hardware which is not doable using java (except using JNI). And that is because JVM only provided limited commands which can be used in Java. These command including add, call a method and so on. But deal with hardware need command to operate reg, memory, CPU, hardware drivers directly. These are not supported directly in JVM so JNI is needed. That is back to the start - it is still needed to write an OS using C/assembly.
Hope this helps.
One of the main benefits of using Java is that abstracts away a lot of low level details that you usually don't really need to care about. It's those details which are required when you build an OS. So while you could work around this to write an OS in Java, it would have a lot of limitations, and you'd spend a lot of time fighting with the language and its initial design principles.
For operating systems you need to work really low-level. And that is a pain in Java. You do need e.g. unsigned data types, and Java only has signed data types. You need struct objects that have exactly the memory alignment the driver expects (and no object header like Java adds to every object).
Even key components of Java itself are no longer written in Java.
And this is -by no means- a temporary thing. More and more does get rewritten in native code to get better performance. The HotSpot VM adds "intrinsics" for performance critical native code, and there is work underway to reduce the overall cost of native calls.
For example JavaFX: The reason why it is much faster than AWT/Swing ever were is because it contains/uses a huge amount of native code. It relies on native code for rendering, and e.g. if you add the "webview" browser component it is actually using the webkit C library to provide the browser.
There is a number of things Java does really well. It is a nicely structured language with a fantastic toolchain. Python is much more compact to write, but its toolchain is a mess, e.g. refactoring tools are disappointing. And where Java shines is at optimizing polymorphism at run-time. Where C++ compilers would need to do expensive virtual calls - because at compile time it is not known which implementation will be used - there Hotspot can aggressively inline code to get better performance. But for operating systems, you do not need this much. You can afford to manually optimize call sites and inlining.
This answer does not mean to be exhaustive in any way, but I'd like to share my thoughts on the (very vast) topic.
Although it is theoretically possible to write some OS in pure java, there are practical matters that make this task really difficult. The main problem is that there is no (currently up to date and reliable) java compiler able to compile java to byte code. So there is no existing tool to make writing a whole OS from the ground up feasible in java, at least as far as my knowledge goes.
Java was designed to run in some implementation of the java virtual machine. There exist implementations for Windows, Mac, Linux, Android, etc. The design of the language is strongly based on the assumption that the JVM exists and will do some magic for you at runtime (think garbage collection, JIT compiler, reflection, etc.). This is most likely part of the reason why such a compiler does not exist: where would all these functionality go? Compiled down to byte code? It's possible but at this point I believe it would be difficult to do. Even Android, whose SDK is purely java based, runs Dalvik (a version of the JVM that supports a subset of the language) on a Linux Kernel.

MonoTouch performance vs. Objective-C/Java

I'm in the process of developing a multiplatform game engine and am using MonoTouch to cover Android and iPhone. I'm really interested in the performance aspect of using MonoTouch for iOS and Android development, does anyone know what, if any, performance impact MonoTouch will have over developing using Java or Objective-C for their relevant platforms? I ask this specificlaly from a game developers perspective, so things like drawing code and such really worry me. From what I've seen mono apps run fine, but say you made a game in the level of Angry Birds (art work, sound, physics processing), would that run well enough through mono that you wouldn't be put at a significant disadvantage over using the native language of the platform?
First, a clarification: on Android, the code is not executed by Java runtime, but by Dalvik (written from scratch by Google). Thus, the Java VM performance is of no relevance to this question.
With this in mind: most programs on Android don't execute native code, but run on Dalvik VM (which runs the translated Java bytecode). The Mono JIT has been benchmarked against it before and was consistently been found faster (check for example http://www.koushikdutta.com/2009/01/dalvik-vs-mono.html ).
On iOS, MonoTouch has to pre-compile the code into a native application before it can be installed on an Apple device (because of license restrictions, which are enforced by the Operating System). That said, both Objective C compiler and Mono's Ahead Of Time Compilation use the same LLVM backend for generating and optimizing the binary code, so the results you will get should be almost identical (with some memory overhead for Mono).
Please remember one important quote from Donald Knuth: "Premature optimization is the root of all evil." Write your code with performance in mind, but remember that maintainability is more important. Optimization should be done only when it's necessary (because usually the compiler will do a much better job than you can).

Which programming language for compute-intensive trading portfolio simulation?

I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data).
I plan on employing Amazon web services to take on the entire load of the application.
I have four choices that I am considering as language.
Java
C++
C#
Python
Here is the scope of the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements:
Weekly simulation of 10,000,000 trading systems.
(Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration)
Real-time production of portfolio w/ 100,000 trading strategies
Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000)
Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm)
Speed is a concern, but I believe that Java can handle the load.
I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required.
The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same.
Python - I've read somethings on PyPy and pyscho that claim python can be optimized with JIT compiling to run at near C-like speeds... That's pretty much the only reason it is on this list, besides that fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk.
To sum up:
real time production
weekly simulations of a large number of systems
weekly/monthly optimizations of portfolios
large numbers of connections to collect data from
There is no dealing with millisecond or even second based trades. The only consideration is if Java can possibly deal with this kind of load when spread out of a necessary amount of EC2 servers.
Thank you guys so much for your wisdom.
Pick the language you are most familiar with. If you know them all equally and speed is a real concern, pick C.
While I am a huge fan of Python and personaly I'm not a great lover of Java, in this case I have to concede that Java is the right way to go.
For many projects Python's performance just isn't a problem, but in your case even minor performance penalties will add up extremely quickly. I know this isn't a real-time simulation, but even for batch processing it's still a factor to take into consideration. If it turns out the load is too big for one virtual server, an implementation that's twice as fast will halve your virtual server costs.
For many projects I'd also argue that Python will allow you to develop a solution faster, but here I'm not sure that would be the case. Java has world-class development tools and top-drawer enterprise grade frameworks for parallell processing and cross-server deployment and while Python has solutions in this area, Java clearly has the edge. You also have architectural options with Java that Python can't match, such as Javaspaces.
I would argue that C and C++ impose too much of a development overhead for a project like this. They're viable inthat if you are very familiar with those languages I'm sure it would be doable, but other than the potential for higher performance, they have nothing else to bring to the table.
C# is just a rewrite of Java. That's not a bad thing if you're a Windows developer and if you prefer Windows I'd use C# rather than Java, but if you don't care about Windows there's no reason to care about C#.
I would pick Java for this task. In terms of RAM, the difference between Java and C++ is that in Java, each Object has an overhead of 8 Bytes (using the Sun 32-bit JVM or the Sun 64-bit JVM with compressed pointers). So if you have millions of objects flying around, this can make a difference. In terms of speed, Java and C++ are almost equal at that scale.
So the more important thing for me is the development time. If you make a mistake in C++, you get a segmentation fault (and sometimes you don't even get that), while in Java you get a nice Exception with a stack trace. I have always preferred this.
In C++ you can have collections of primitive types, which Java hasn't. You would have to use external libraries to get them.
If you have real-time requirements, the Java garbage collector may be a nuisance, since it takes some minutes to collect a 20 GB heap, even on machines with 24 cores. But if you don't create too many temporary objects during runtime, that should be fine, too. It's just that your program can make that garbage collection pause whenever you don't expect it.
Why only one language for your system? If I were you, I will build the entire system in Python, but C or C++ will be used for performance-critical components. In this way, you will have a very flexible and extendable system with fast-enough performance. You can find even tools to generate wrappers automatically (e.g. SWIG, Cython). Python and C/C++/Java/Fortran are not competing each other; they are complementing.
Write it in your preferred language. To me that sounds like python. When you start running the system you can profile it and see where the bottlenecks are. Once you do some basic optimisations if it's still not acceptable you can rewrite portions in C.
A consideration could be writing this in iron python to take advantage of the clr and dlr in .net. Then you can leverage .net 4 and parallel extensions. If anything will give you performance increases it'll be some flavour of threading which .net does extremely well.
Edit:
Just wanted to make this part clear. From the description, it sounds like parallel processing / multithreading is where the majority of the performance gains are going to come from.
It is useful to look at the inner loop of your numerical code. After all you will spend most of your CPU-time inside this loop.
If the inner loop is a matrix operation, then I suggest python and scipy, but of the inner loop if not a matrix operation, then I would worry about python being slow. (Or maybe I would wrap c++ in python using swig or boost::python)
The benefit of python is that it is easy to debug, and you save a lot of time by not having to compile all the time. This is especially useful for a project where you spend a lot of time programming deep internals.
I would go with pypy. If not, http://lolcode.com/.

Java vs Python on Hadoop

I am working on a project using Hadoop and it seems to natively incorporate Java and provide streaming support for Python. Is there is a significant performance impact to choosing one over the other? I am early enough in the process where I can go either way if there is a significant performance difference one way or the other.
With Python you'll probably develop faster and with Java will definitely run faster.
Google "benchmarksgame" if you want to see some very accurate speed comparisons between all popular languages, but if I recall correctly you're talking about 3-5x faster.
That said, few things are processor bound these days, so if you feel like you'd develop better with Python, have at it!
In reply to comment (how can java be faster than Python):
All languages are processed differently. Java is about the fastest after C & C++ (which can be as fast or up to 5x faster than java, but seems to average around 2x faster). The rest are from 2-5+ times slower. Python is one of the faster ones after Java. I'm guessing that C# is about as fast as Java or maybe faster, but the benchmarksgame only had Mono (which was a tad slower) because they don't run it on windows.
Most of these claims are based on the computer language benchmarks game which tends to be pretty fair because advocates of/experts in each language tweak the test written in their specific language to ensure the code is well-targeted.
For example, this shows all tests with Java vs c++ and you can see the speed ranges from about equal to java being 3x slower (first column is between 1 and 3), and java uses much more memory!
Now this page shows java vs python (from the point of view of Python). So the speeds range from python being 2x slower than Java to 174x slower, python generally beats java in code size and memory usage though.
Another interesting point here--tests that allocated a lot of memory, Java actually performed significantly better than Python in memory size as well. I'm pretty sure java usually loses memory because of the overhead of the VM, but once that factors out, java is probably more efficient than most (again, except the C's).
This is Python 3 by the way, the other python platform tested (Just called Python) faired much worse.
If you really wanted to know how it is faster, the VM is amazingly intelligent. It compiles to machine language AFTER running the code, so it knows what the most likely code paths are and optimizes for them. Memory allocation is an art--really useful in an OO language. It can perform some amazing run-time optimizations which no non-VM language can do. It can run in a pretty small memory footprint when forced to, and is a language of choice for embedded devices along with C/C++.
I worked on a Signal Analyzer for Agilent (think expensive o-scope) where nearly the entire thing (aside from the sampling) was done in Java. This includes drawing the screen including the trace (AWT) and interacting with the controls.
Currently I'm working on a project for all future cable boxes. The Guide along with most other apps will be written in Java.
Why wouldn't it be faster than Python?
Java is less dynamic than Python and more effort has been put into its VM, making it a faster language. Python is also held back by its Global Interpreter Lock, meaning it cannot push threads of a single process onto different core.
Whether this makes any significant difference depends on what you intend to do. I suspect both languages will work for you.
You can write Hadoop mapreduce transformations either as "streaming" or as a "custom jar". If you use streaming, you can write your code in any language you like, including Python or C++. Your code will just read from STDIN and output to STDOUT. However, on hadoop versions before 0.21, hadoop streaming used to only stream text - not binary - to your processes. Therefore your files needed to be text files, unless you do some funky encoding transformations yourself. But now it appears a patch has been added that now allows the use of binary formats with hadoop streaming.
If you use a "custom jar" (i.e. you wrote your mapreduce code in Java or Scala using the hadoop libraries), then you will have access to functions that allow you to input and output binary (serialize in binary) from your streaming processes (and save the results to disk). So future runs will be much faster (depending on how much your binary format is smaller than your text format).
So if your hadoop job is going to be I/O bound, then the "custom jar" approach will be faster (since both Java is faster as previous posters have shown and reading from disk will also be faster).
But you have to ask yourself how valuable is your time. I find myself far more productive with python, and writing map-reduce that reads STDIN and writes to STDOUT is really straightforward. So I personally would recommend going the python route - even if you have to figure the binary encoding stuff out yourself. Since hadoop 0.21 handles non-utf8 byte arrays, and since there is a binary (byte array) alternative to use for python (http://dumbotics.com/2009/02/24/hadoop-1722-and-typed-bytes/), which shows the python code only being about 25% slower than the "custom jar" java code, I would definitely go the python route.

Categories

Resources