I was looking for a fast Java linear algebra library and tried many of them (jblas, UJMP, EJML and others). In terms of performance, I found jeigen (a Java wrapper for the C++ Eigen library) to be the most reliable. However, the wrapper has no Cholesky decomposition, although the original Eigen library does. Is there a way to add the decomposition to the wrapper?
I do not know of a Java wrapper that performs a Cholesky Decomposition. Perhaps another user will post after me who is more knowledgeable specifically about Java and jeigen.
However, I can offer some help on your underlying problem, which is to perform a Cholesky decomposition. I recently wrote a C++ program that performs a Cholesky decomposition on a real, symmetric, positive-definite matrix. The source code is freely available on GitHub:
https://github.com/dcb2015/dpotrf_ak1/blob/master/dpotrf_ak1.cpp
You can:
1) use the complete C++ program as-is in your own C++ compiler, or use the C++ source in your own program, if you know C++.
2) translate the C++ routine into Java for use in your own program. The sub-routine that performs the decomposition is quite small. And Java syntax is very similar to C/C++ syntax, so the translation should not be difficult. (Actually, I usually try to avoid the "++" aspects of the language and code as close to simple "C" as possible.) If writing the Java program is part of your work or assignment, this might be your best option.
3) use a ready-for-use JavaScript version of this program which is available online (see my Profile for the site.) If you do not have to write this program yourself, but only need the results of a decomposition, try it out. It may help you out of a tough spot.
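For option 2, a minimal Java sketch of the textbook row-by-row Cholesky algorithm may help as a starting point. It is not a translation of the dpotrf_ak1 code linked above; the class name CholeskySketch and the method name cholesky are made up for illustration, and the input is assumed to be a square, symmetric, positive-definite matrix stored as a double[][].

```java
import java.util.Arrays;

/** Illustrative sketch only: factors a symmetric positive-definite A into L * L^T. */
final class CholeskySketch {

    /**
     * Returns the lower-triangular factor L such that A = L * L^T.
     * Only the lower triangle of a is read; a itself is not modified.
     */
    static double[][] cholesky(double[][] a) {
        int n = a.length;
        double[][] l = new double[n][n];
        for (int i = 0; i < n; i++) {
            for (int j = 0; j <= i; j++) {
                // Subtract the contribution of the columns already computed.
                double sum = a[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    // A non-positive pivot means A is not positive definite.
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("Matrix is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    public static void main(String[] args) {
        double[][] a = {
            {4, 12, -16},
            {12, 37, -43},
            {-16, -43, 98}
        };
        for (double[] row : cholesky(a)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

For the matrix in main, the factor works out to the rows (2, 0, 0), (6, 1, 0) and (-8, 5, 3). This sketch does no pivoting or blocking, so it is meant for checking results rather than for performance.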
I've been trying to find a way to import Java-ML into my Python project. I have the jar file in the same path as my project.
I want to use it for k-means clustering, since it allows me to change the distance metric. With whichever approach you suggest, will I be able to pass a different Java class as a parameter to the function?
I tried using:
import sys
sys.path.append(r"C:\Users\X\Desktop\X\javaml-0.1.7\javaml-0.1.7.jar")
import net.sf.javaml as jml
test = jml.clustering.Kmeans()
I considered using Jython; however, I am unsure of how it works, and it is unclear whether I could continue using IDLE or whether I would have to reprogram my project.
Lastly I considered using PyJNIus, however it is simply not working.
In short, you can't run Java code natively in a CPython interpreter.
Firstly, Python is just the name of the specification for the language. If you are using the Python supplied by your operating system (or downloaded from the official Python website), then you are using CPython. CPython does not have the ability to interpret Java code.
However, as you mentioned, there is an implementation of Python for the JVM called Jython. Jython is an implementation of Python that operates on the JVM and therefore can interact with Java modules. However, very few people work with Jython and therefore you will be a bit on your own about making everything work properly. You would not need to re-write your vanilla Python code (since Jython can interpret Python 2.x) but not all libraries (such as numpy) will be supported.
Finally, I think you need to better understand the K-Means algorithm, as the algorithm is implicitly defined in terms of the Euclidean distance. Using any other distance metric would no longer be considered K-Means and may affect the convergence of the algorithm. See here for more information.
Again, you can't run Java code natively in a CPython interpreter. Of course there are various third party libraries that will handle marshalling of data between Java and Python. However, I stand by my statement that for this particular use case you are likely better off using a native Python library (something like K-Medoids, for example the implementation in scikit-learn-extra). Attempting to call through to Java, with all the associated overhead, is overkill for this problem, in my opinion.
To "answer" your question directly, Jython will be your best bet if you simply want to import Java classes. Jython strives very hard to be as compatible with Python 2.x as possible and does a good job. So you won't have to spend too much time rewriting code. Just simply run it with Jython and see what happens, then modify what breaks.
Now for the Python answer :D. You may want to use scikit-learn for a native implementation. It will certainly be faster than running anything in Jython.
Update
I think the Py4J module is what you're looking for. It works by running a server in your Java code, and the Python code communicates with that Java server. The only good thing about Py4J is that it provides the boilerplate code for you. You can very easily set up your own client/server with no extra modules. However, I still don't think it's a superior option compared to Python's native modules.
References
How to import Java class w/ Jython
Scikit - K-Means
I have several VB programs that I wrote a few years ago in school. Is there any way to convert those programs to Java? Or would it be easier to just rewrite them from scratch? My goal is to create an Android app that combines at least two of the programs into one functional app. This is purely a nonprofit endeavor; I'm a full-time firefighter and am looking to put a free tool in the hands of my guys and other firemen who might want to use it.
I've been unable to locate the source code for the programs and have searched for an answer but haven't been able to find a definitive answer as most answers cover the source, not the compiled result. I've downloaded a couple supposed VB decompilers to see the results, but, in order to see the 'full' results, all the ones I've used require purchasing a 'pro' version. I have no problem paying for such a version, but I'd like to know if it's going to work properly before I do.
It would definitely be faster to rewrite them than it would be to devise a way of converting a VB program into Java code. Not only are the languages quite dissimilar, but VB's UI model is nothing like Android's, so it would likely be impossible (or at least impractical) to translate the UI code automatically.
I currently use Matlab as a scientific computing language, but am interested in moving to a more open alternative. Python (+scipy +numpy +matplotlib) seems like the best way to go. My biggest worry about the switch is that Python won't interact as nicely/easily/seamlessly with Java as Matlab does and I often need to use Java APIs. In particular I like that in Matlab:
1) I can instantiate Java objects and access their member variables and methods
2) Java events become "Callbacks" in Matlab
3) Java types get automatically cast to Matlab types (boolean to logical, etc)
As far as I can tell there are 3 options in Python (below). My worry is that each is supported/developed by a very small community of developers (1-3 people in each case as I understand) and that support may not be there forever. Which of the below does the things Matlab does, as listed above? Which is most likely to continue to do that for the foreseeable future? It would be a bonus if I could use Java GUIs from Python as well. Did I miss any options?
1) Jython
2) Py4J
3) JPype
I don't think there's an easy way to do this since SciPy and friends don't currently run on Jython. You could run things as client-server or as a subprocess with redirected standard input/output/error.
I've also toyed with XML-RPC in one of my Python projects; I've got a blurb from the docs available that might be helpful. You won't get the best performance in the world going this route, but it does have the virtue of being fairly easy to get started with.
I have been a C++ developer for about 10 years. I need to pick up Java just for Hadoop. I doubt I will be doing anything else in Java. So, I would like a list of things I would need to pick up. Of course, I would need to learn the core language, but what else?
I did Google around for this, and this could be seen as a possible duplicate of "I want to learn Java. Show me how?", but it's not. Java is a huge programming language with lots of libraries, and what I need to learn will depend largely on what I am using Hadoop for. But I suppose it is possible to say something like "don't bother learning this". That would be quite useful too.
In my day job, I've just spent some time helping a C++ person to pick up enough Java to use some Java libraries via JNI (Java Native Interface) and then shared memory into their primarily C++ application. Here are some of the key things I noticed:
You cannot manage anything beyond a toy project without an IDE. The very first thing you should do is download a popular Java IDE (Eclipse is a fine choice, but there are also alternatives including NetBeans and IntelliJ). Do not be tempted to try and manage with vi / emacs and javac / make. You will be living in a cave and not realising it. Once you're up to speed with even basic IDE functions you will be literally dozens of times more productive than without an IDE.
Learn how to lay out a simple project structure and packages. There will be simple walkthroughs of how to do this on the Eclipse site or elsewhere. Never put anything into the default package.
Java has a type system whereby the reference and primitive types are relatively separate for historic / performance reasons.
Java's generics are not the same as C++ templates. Read up on "type erasure".
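To make type erasure concrete, here is a small sketch (the class and method names are invented for illustration). It shows that two differently parameterized lists share one runtime class, and that a raw-type reference can slip a wrong element in, with the failure only surfacing at the cast the compiler inserts on read.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of what type erasure means at run time. */
final class ErasureSketch {

    /** Both lists are plain ArrayList at run time; the type arguments are erased. */
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> ints = new ArrayList<>();
        return strings.getClass() == ints.getClass();
    }

    /** A raw reference bypasses the compile-time check; the read then fails. */
    @SuppressWarnings({"unchecked", "rawtypes"})
    static String heapPollution() {
        List<String> strings = new ArrayList<>();
        List raw = strings;   // raw type: element type is no longer checked
        raw.add(42);          // compiles (with a warning) and succeeds at run time
        try {
            String s = strings.get(0); // compiler-inserted cast to String fails here
            return s;
        } catch (ClassCastException e) {
            return "ClassCastException";
        }
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(heapPollution());
    }
}
```

Unlike a C++ template, no separate code is generated per type argument, which is why things like new T() or instanceof List<String> are not available.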
You may wish to understand how Java's GC works. Just google "mark and sweep" - at first, you can just settle for the naivest mental model and then learn the details of how a modern production GC would do it later.
The core of the Collections API should be learned without delay. Map / HashMap, List / ArrayList & LinkedList and Set should be enough to get going.
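A short sketch of those core types in use, assuming nothing beyond java.util (the class and method names are made up for the example): a HashMap for counting, an ArrayList for ordered data, and a HashSet for de-duplication.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Illustrative sketch of Map, List and Set from the Collections API. */
final class CollectionsSketch {

    /** Counts how often each word occurs. */
    static Map<String, Integer> countWords(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : words) {
            // merge() inserts 1 for a new key, otherwise adds 1 to the old value.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    /** Drops duplicates; a HashSet makes no promise about iteration order. */
    static Set<String> distinct(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("map", "reduce", "map"));
        System.out.println(countWords(words));
        System.out.println(distinct(words).size());
    }
}
```

Programming against the interfaces (Map, List, Set) and naming the concrete class only at construction is the usual convention, roughly comparable to choosing a container in C++ but with the choice hidden behind an interface.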
Learn modern Java concurrency. Thread is an assembly-language level primitive compared to some of the cool stuff in java.util.concurrent. Learn ConcurrentHashMap, Atomic*, Lock, Condition, CountDownLatch, BlockingQueue and the threadpools from Executors. Good books here are those by Brian Goetz and Doug Lea.
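As a rough illustration of a few of those classes working together, the sketch below runs several tasks on a pool from Executors, counts with an AtomicLong, and uses a CountDownLatch to wait for completion. The class name, method name and the pool size of 4 are arbitrary choices for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicLong;

/** Illustrative sketch: thread pool + atomic counter + latch. */
final class ConcurrencySketch {

    /** Runs `tasks` jobs that each increment a shared counter `perTask` times. */
    static long parallelCount(int tasks, int perTask) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        AtomicLong total = new AtomicLong();
        CountDownLatch done = new CountDownLatch(tasks);
        try {
            for (int t = 0; t < tasks; t++) {
                pool.execute(() -> {
                    try {
                        for (int i = 0; i < perTask; i++) {
                            total.incrementAndGet(); // lock-free, thread-safe increment
                        }
                    } finally {
                        done.countDown(); // signal completion even if the task fails
                    }
                });
            }
            done.await(); // block until every task has counted down
            return total.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("interrupted while waiting", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(parallelCount(8, 1000));
    }
}
```

With a plain long and ++ instead of AtomicLong, the result would generally be wrong under contention, which is the kind of mistake these higher-level classes help avoid.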
As soon as you want to use 3rd party libraries, you'll need to learn how the classpath works. It's not rocket science, but it is a bit verbose.
If you're a low-level C++ guy, then you may find some of this interesting also:
Java has virtual dispatch by default. The keyword static on a Java method is used to indicate a class method. private Java methods use invokespecial dispatch, which is a dispatch onto the exact type in use.
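A small sketch of the three cases, with invented class names: an ordinary instance method is dispatched on the runtime type, a static method is resolved from the class named at the call site, and a private method is not overridden even if a subclass declares a method with the same name.

```java
/** Illustrative sketch of virtual, static and private method dispatch. */
final class DispatchSketch {

    static class Base {
        String name() { return "Base"; }               // virtual by default
        static String kind() { return "Base.kind"; }   // class method
        private String tag() { return "Base.tag"; }    // never overridden

        // name() is dispatched on the runtime type; tag() always means Base.tag().
        String describe() { return name() + "/" + tag(); }
    }

    static class Derived extends Base {
        @Override String name() { return "Derived"; }    // overrides Base.name()
        static String kind() { return "Derived.kind"; }  // hides, does not override
        String tag() { return "Derived.tag"; }           // unrelated to Base.tag()
    }

    static String viaBaseReference() {
        Base b = new Derived();
        return b.describe();
    }

    static String staticViaBase() {
        return Base.kind();
    }

    public static void main(String[] args) {
        System.out.println(viaBaseReference()); // Derived/Base.tag
        System.out.println(staticViaBase());    // Base.kind
    }
}
```

Coming from C++, the main adjustment is that there is no virtual keyword to add; final on a method is what rules out overriding.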
On an Oracle VM at least, objects comprise two machine words of header (the mark word and the class word). The mark word is a bunch of flags the VM uses - notably for thread synchronization. The class word you can think of as a pointer to the VM's representation of the Class object (which is where the vtables for methods live). Following the class word are the member fields of the instance of the object.
Java .class files are an intermediate language, and not really that similar to x86 object code. In particular, there are lots more useful tools for .class files (including the javap disassembler, which ships with the JDK).
The Java equivalent of the symbol table is called the Constant Pool. It's typed and it has a lot of information in it - arguably more than the x86 object code equivalent.
Java virtual method dispatch consists of looking up the correct method to be called in the Constant Pool, converting that to an offset into a vtable, and then walking up the class hierarchy until a non-null value is found at that vtable offset.
Java starts off interpreted and then goes compiled (for Oracle and some other VMs anyway). The switch to compiled mode is done method-by-method on an as-needed basis. When benchmarking and perf tuning, you need to make sure that you've warmed the system up before you start, and you should typically profile at the method level to start with. The optimizations that are made can be quite aggressive / optimistic (with a check and a fallback if the assumptions are violated), so perf tuning is a bit of an art.
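The warm-up idea can be sketched as follows. This is a deliberately naive harness with made-up names and an arbitrary workload; the number of warm-up rounds needed before the JIT compiles a method is VM-dependent, and dedicated tools such as JMH handle this far more carefully.

```java
/** Illustrative sketch: run a workload repeatedly before timing it. */
final class WarmupSketch {

    /** Arbitrary CPU-bound workload used only for the demonstration. */
    static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += ((long) i * i) % 7;
        }
        return sum;
    }

    /** Returns the elapsed nanoseconds of one call made after the warm-up rounds. */
    static long timeAfterWarmup(int warmupRounds, int n) {
        long sink = 0;
        for (int r = 0; r < warmupRounds; r++) {
            sink += work(n); // give the VM a chance to compile work()
        }
        long start = System.nanoTime();
        sink += work(n);
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // keeps the results observable so they are not optimized away
        }
        return elapsed;
    }

    public static void main(String[] args) {
        System.out.println("cold:   " + timeAfterWarmup(0, 1_000_000) + " ns");
        System.out.println("warmed: " + timeAfterWarmup(200, 1_000_000) + " ns");
    }
}
```

The absolute numbers are not meaningful on their own; the point is only that a measurement taken before warm-up may be timing the interpreter rather than compiled code.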
Hopefully there's some useful stuff in there to be going on with - please comment / ask followup questions.
Learning "just enough" Java is learning Java. Either you learn all the core principles and language design decisions, or you suffer along making easily avoidable mistakes. Considering that you already know how to program, a lot of the information can be skimmed (with an eye for where it differs from other languages you are intimately familiar with).
So you need to learn:
How to get started
The language itself
The core, essential classes
The major Collections
And if you don't have a build framework in place, how to package your compiled code.
Beyond that, nearly every other item you might need to learn depends heavily on what you intend to do. Don't discount the on-line tutorials from Oracle/Sun, they are quite good (compared to other online tutorials).
Hadoop can also be used from C++: see the WordCount example in C++.
You can't really use Java without knowing these packages in the standard API:
java.lang
java.util
java.io
And, to a lesser degree:
java.text
java.math
java.net
java.lang.reflect
java.util.concurrent
They contain a lot of classes you'll need to use constantly for pretty much any application, and it's a good idea to look through them until you know which classes they contain and what those are good for, lest you end up reinventing wheels.
Take it easy, learning Java could be pleasant and fast if you already know C++.
Buy these two books:
The Java™ Programming Language (4th Edition), Ken Arnold, James Gosling, David Holmes
Effective Java (2nd Edition), Joshua Bloch
You will soon be mastering Java, and you will not regret it. Good luck.
Since C++ and Java share common roots, the core language shouldn't give you too much trouble. You will need to become familiar with the Java SDK, particularly java.lang and the Collections framework (java.util).
But perhaps learning Java is overkill if you don't see yourself using it elsewhere. Hadoop also has bindings to Python, so perhaps learning Python would be a better alternative? See Java vs Python on Hadoop.
Here is the quickstart for all you will need
I suggest Eclipse (Java) to start working; see this for that.
Maybe you don't even need to know Java to use Hadoop.
Pig is enough for anything from simple to advanced usage of Hadoop.
I don't know how familiar you are with other higher-level programming languages. Garbage collection is an important function in Java. It would be important to read a bit about the GC in your VM of choice.
Besides the obvious packages, check out the java.util packages for the collection framework. You might want to check out the source of some classes. I suggest HashMap to get the idea of the computing/memory cost of these operations.
Java likes to use streams instead of buffers when processing large amounts of data. That may take some time getting used to.
Java has no unsigned types. Depending on the packets of data you need to process at once, you can either use larger variables and straight arithmetic (if we're talking about relatively small packets), or you have to apply (b[i] & 0xff) every time you read, for example, unsigned bytes. Also note that Java uses network byte order (most significant byte first) when serializing multibyte numbers.
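Both points can be shown in a few lines. The sketch below uses invented class and method names; it masks a signed byte to recover its unsigned value and uses DataOutputStream to show the most-significant-byte-first layout of a serialized int.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Illustrative sketch: unsigned byte handling and big-endian serialization. */
final class UnsignedSketch {

    /** Widens a signed byte (-128..127) to its unsigned value (0..255). */
    static int toUnsigned(byte b) {
        return b & 0xff;
    }

    /** Serializes an int with DataOutputStream, which writes the high byte first. */
    static byte[] serializeInt(int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte raw = (byte) 0xF0;
        System.out.println(raw);             // -16, because byte is signed
        System.out.println(toUnsigned(raw)); // 240
        for (byte b : serializeInt(0x01020304)) {
            System.out.print(toUnsigned(b) + " "); // 1 2 3 4
        }
        System.out.println();
    }
}
```

When little-endian data has to be read, java.nio.ByteBuffer with an explicit ByteOrder is one standard-library option.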
The design patterns most favored by the API are Singleton, Decorator and Factory. Check the source of JFC itself for best practices on how these patterns are achieved in the language.
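For a quick feel of how these patterns show up, here is a sketch using well-known standard library classes; picking these particular classes as examples is my own choice, and the wrapper class and method names are invented. java.io readers are stacked as decorators, Runtime.getRuntime() hands out one shared instance, and Calendar.getInstance() is a factory method that chooses the concrete class for you.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Calendar;

/** Illustrative sketch of Decorator, Singleton and Factory in the standard API. */
final class PatternsSketch {

    /** Decorator: each wrapper adds behaviour (decoding, buffering) to the stream below it. */
    static String firstLine(byte[] data) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Singleton: every call returns the same Runtime object. */
    static boolean runtimeIsShared() {
        return Runtime.getRuntime() == Runtime.getRuntime();
    }

    /** Factory method: the caller asks for a Calendar and gets some concrete subclass. */
    static Calendar newCalendar() {
        return Calendar.getInstance();
    }

    public static void main(String[] args) {
        System.out.println(firstLine("hello\nworld".getBytes(StandardCharsets.UTF_8)));
        System.out.println(runtimeIsShared());
        System.out.println(newCalendar().getClass().getName());
    }
}
```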
... and you can still post more concrete questions on SO :)
Answer 1:
It is very desirable to know Java. Hadoop is written in Java. Its popular Sequence File format is dependent on Java.
Even if you use Hive or Pig, you'll probably need to write your own UDF someday. Some people still try to write them in other languages, but I guess that Java has more robust and primary support for them.
Most Hadoop tools (like Sqoop, HCatalog and so on) are not mature enough, so you'll see many Java error stack traces, and you'll probably want to hack the source code someday.
Answer 2:
It is not required for you to know Java.
As the others said, it would be very helpful depending on how complex your processing may be. However, there is an incredible amount you can do with just Pig and say Hive.
I would agree that it is fairly likely you will eventually need to write a user defined function (UDF), however, I've written those in Python, and it is very easy to write UDFs in Python.
Granted, if you have very stringent performance requirements, then a Java based MapReduce program would be the way to go. However, great advancements in performance are being made all of the time in both Pig and Hive.
So, the short answer to your question is, "No", it is not required for you to know Java in order to perform Hadoop development.
Source:
http://www.linkedin.com/groups/Is-it-must-Hadoop-Developer-988957.S.141072851
Most of the stuff should be pretty familiar to you. I'd just download Eclipse and google a tutorial site. Familiarize yourself with classloading and the keywords. One tricky thing a lot of C++ guys run into is how to run a Java app so that it finds its library classes (sort of analogous to dynamic linking). Learn the difference between the JRE and the JDK. If you can get a few hello-world type apps working, you ought to be able to get a start on Hadoop if you follow the tutorials.
You don't need to learn Java to use Hadoop.
You need to know Linux to install and configure Hadoop.
Then you can write your MapReduce jobs using the streaming API in any language that understands standard input/output.
Further, you can do more complex MapReduce work using other tools like Hive.
Even other components of the Hadoop ecosystem, like HBase and Cassandra, have clients in most languages.
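To show what "understands standard input/output" amounts to, here is a sketch of a word-count style mapper that reads lines from stdin and writes one tab-separated "word, 1" pair per line to stdout. It is written in Java only to stay consistent with the rest of this thread; the same shape works in any language. The class and method names are made up, and the tab-separated key/value layout is an assumption about how the streaming job is configured.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Illustrative sketch of a stdin/stdout mapper in the streaming style. */
final class StreamingMapperSketch {

    /** Emits "word<TAB>1" for every whitespace-separated word read from `in`. */
    static void map(BufferedReader in, PrintWriter out) {
        try {
            String line;
            while ((line = in.readLine()) != null) {
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.print(word + "\t1\n");
                    }
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Convenience wrapper so the mapper can be tried on an in-memory string. */
    static String mapToString(String input) {
        StringWriter buffer = new StringWriter();
        map(new BufferedReader(new StringReader(input)), new PrintWriter(buffer));
        return buffer.toString();
    }

    public static void main(String[] args) {
        map(new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8)),
            new PrintWriter(new OutputStreamWriter(System.out, StandardCharsets.UTF_8)));
    }
}
```

A matching reducer would read the sorted pairs from its own stdin and sum the counts per word.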