Questions on Java and Scala and parallelization

Questions on Java and Scala and parallelization - java

So, I have a couple of questions about scala.
1) Would writing a new project in scala be speedier (in terms of performance) or should I just stick to regular java? The project that I am going to inherit is already written in java, but it can be massively parallelized. Also, this project is for academic purposes, so I'm a little worried that it's not such a good idea. I know I'd have to run it by my supervisor first, but it's just a thought.
2) Just to be sure, I can compile my scala code on my computer and execute the "binaries" (compiled byte code) on a cluster that has the JVM installed?
3) If I were to compile the inherited java program, should it work? And should it be faster?
(The cluster uses SGE)

This depends entirely on how much work you want to put into parallelizing your code (and how parallelizable the algorithms are). Scala makes it easier, but it does just run on the JVM, and Java can employ every threading trick known to the compiler. So if you want to put the effort into the Java, it will be at least as fast as Scala. (Then again, if you write your Scala like Java, there are very few instances where Java will be faster, so basically it's a tie.)
Yes, as long as you supply the scala-library.jar file (and any others that are needed e.g. if you use Swing, which would be a strange thing to do on a massively parallelized system).
Compiling java bytecode almost never is faster than running the bytecode on the JVM. The JVM is more clever than most static compilers.
You didn't ask, but Scala doesn't have built-in support for the Sun Grid Engine. You could potentially set up something with remote actors, for example, but you'd have significant work to do there.

1) No, you won't get superior performance. Your speed as a programmer will be superior, once you get over the learning curve, but this doesn't seem to be your question.
It might be easier to parallelize by using Scala actors & futures or upcoming 2.9's parallel collections. Also, if the code is heavy on generics but has frequent use of boxed primitives, then Scala's specialization feature is much easier than doing the equivalent by hand in Java.
On the other hand, there's Akka actors and futures that are arguably superior and can be used fine from Java, and Java's own parallel stuff isn't bad.
2) Yes.
3) Err, what?

Related

C/C++/Java interpreter

Why has no one ever made a C/C++/Java interpreter which requires no compilation to run? It can't be that hard to make something akin to IDLE for Python.
The same goes for other languages such as FORTRAN, Pascal and others. Is there some kind of code based reason for this or is it just that people prefer to compile their code.

C and C++ are languages designed to be compiled and generate fast and efficient code.
Interpreters, by the definition that they "interpret" the code, are generally slow (or at least slower than the same compiled language, assuming a good compiler).
However, there are combinations of clang and llvm-jit that could be (and has been) built into an "interpreted version of C or C++". I think there are some limitations, and I've never actually used it.
There is no great technical reason NOT to do this, it's just that it's about the same amount of work, if not more than, the work to write a compiler, and the end result of writing a compiler is "better".

Just some thoughts.
All interpreters have to have a lexer/parser stage (although there has been some academic work on incremental compilers).
There is no point in having an interpreter for Java when an IDE can constantly maintain the "all compiled" status.
I've known some interpreters in parallel to compilers, and one sad thing happening there was that the actual languages processed by the "cousins" were not really compatible, which made development on the interpreter risky.
A thing like the JVM makes it less and less desirable to "roll your own".
There used to be an interpreter for a Pascal subset IIRC: P-Pascal) from Wirth's own group. A good study object for students, but a toy for serious SW development.

There is only one sane reason to implement an interpreter for a language which is otherwise perfectly compilable: to have a formal definition of its operational semantics, which, in turn, can be used to derive static code analysis tools, code rewrite tools, various verification engines and so on.
Take a look at this C interpreter for example.

Jython - is it faster to use Python data structures or Java ones?

I'm trying to understand whether and under what circs one should use Python classes and/or Java ones.
If making a specialist dictionary/Map kind of class, should one subclass from Python's dict, or from Java's HashMap or TreeMap, etc.?
It is tempting to use the Python ones just because they are simpler and sexier. But one reason that Jython runs relatively slowly (so it appears to me to do) seems to have something to do with the dynamic typing. I'd better say I'm not that clear about all this, and haven't spent nocturnal hours poring over the Python/Jython interpreter code, to my shame.
Anyway it just occurs to me that the Java classes might possibly run faster because the code might have to do less work. OTOH maybe it has to do more. Or maybe there's nothing in it. Anyone know?

The point of using Jython is that you can write Python code and have it run on the JVM. Don't ruin that by making your Python into Java.
If -- if -- it turns out that your data structure is too slow, you can drop-in replace it with a Java version. But that's for the optimisation stage of programming, which comes later.
I guess I should try to answer your question. I would guess that using native Java structures will be faster (because the JVM can infer more about them than the Python interpreter can), but that might be counterbalanced by the extra processing needed to interface with Jython. Only tests will tell!

Generally, the decision shouldn't be one of speed - the Python classes will be implemented in terms of Java classes anyway, even if they don't inherit from them. So, the speed should be roughly comparable, and at most you would save a couple of method calls per operation.
The bigger question is what you plan on doing with your class. If you're using it with Python APIs, you'll want to use the Python types, or something that behaves like them so that you don't have to do the work of implementing the entire Mapping protocol (only the bits your class changes). If you're using Java APIs, you will certainly need to meet the static type checks - which means you'll need to inherit from Java's classes.
If this isn't easy to answer in your situation, start with the Python ones, since you (correctly ;-) find them "simpler and sexier". If your class doesn't pass outside the boundaries of your project, then this should be trivial to change later if the speed really becomes an issue - and at that point, you might also be thinking about questions like "could it help to implement it entirely at the Java level?" which you've hopefully recognised would be premature optimisation to think about now.

Call scala from java - performance?

I have parser written in Scala due to implementation simplicity.
I need to call it from my java application. I know I need to include scala library in the classpath, etc.
But what about performance? Could it be any strong performance decreases comparing the pure java parser calls?

Calling Scala from Java isn't going to incur any overhead: it's all just bytecode when it executes. It's not as if you were having to travel through some kind of bridge between Java and Scala, as you might if you were calling from Java into, I don't know, Python.
Whether the Scala implementation of this particular algorithm is going to run faster or slower will depend on the nature of the algorithm and the way you'd implement it. Given that you're not going to implement it the same way in the two languages, it's very hard to predict.

In theory there should not be any impact, as Scala uses the same JVM and bytecode as Java and all performance test I've seen give roughly the same speed to both languages.
That is, it should be the same (performance wise) to calling a Java library, or so similar that the difference shouldn't matter.

How is the performance of Groovy compared with Java?

What the performance of Groovy compared with Java?

It's obviously true that Groovy is compiled to JVM. This however has little to do with the performance.
The most important thing to note here is that Groovy is a dynamic language. This essentially means that most of the time Groovy compiler will have little to no knowledge about the type of an object it is calling a method on / retrieving a property from. This has a huge impact on the performance. There might be thousands of different classes implementing someFancyMethodName() not having a common base class. Yet a call to obj.someFancyMethodName() has to choose the right one. There isn't any better way of doing this than deciding it at runtime based on some kind of reflection. In fact, because of this every single call to a method gets dispatched through a call to invokeMethod() on the object metaclass. This is very much visible in stacktraces if your program ever throws some nasty exceptions. It's even worse. Any class in groovy may choose to provide implementations of methods of the given name dynamically, that is producing them at runtime. There is a fair amount of Grails magic that makes a heavy use of it. Another complication emerges when method overloading comes into play. As the knowledge of types is so limited, it's impossible to choose the right version of the method at compile time. The produced code has to look into the supplied objects and then by making a series of if-elses choose the implementation that best fits the provided call. This most of the time is a really non-trivial process, that was never intended to be performed at runtime. Yet, Groovy has to do it, in order to stay inter-operable with Java.
All that makes Groovy pretty slow. In fact much slower and, what is more painful, more memory consuming than most of the dynamic languages out there (Python for instance).
That said, I agree that the reason for using Groovy is certainly not performance. Most of the time, you will end up optimizing only a small fraction of your code. If performance is such an issue, you can always resort to rewriting those specific pieces in pure Java or give a try to Groovy++. Haven't tried it myself, however the results that I read about online seemed pretty promising.
Groovy 2.0 I have no experience in running the newer version. Quite frankly, I'm not an active Groovy user anymore. I would however expect that most of the issues described above, are fundamentally hard and require a major scientific breakthrough. I have some experience developing HHVM (a PHP virtual machine created by Facebook) and there are much simpler features that performed poorly.

So here we are in 2012 and Groovy 2.0 is ready to rock...
"With the #CompileStatic, the performance of Groovy is about 1-2 times slower than Java, and without Groovy, it's about 3-5 times slower. (...) This means to me that Groovy is ready for applications where performance has to be somewhat comparable to Java."
Performance Test: Groovy 2.0 vs. Java
http://java.dzone.com/articles/groovy-20-performance-compared
And besides the autor, I've used Groovy since 2008 with great success, not only for CV, just to make job done in time business need. Performance is ever relative to what you want to do.
For those who are complaining about numeric use cases, here goes a real use case using web frameworks: http://www.jtict.com/blog/rails-wicket-grails-play-lift-jsp/
"Groovy 1.8.x prototype for fib(42) takes about 3.8s (only 12% slower than Java, over a hundred times faster than Groovy 1.0) So we may no longer encourage people to write such 'hot spots' in Java."
Source: http://www.wiki.jvmlangsummit.com/images/0/04/Theodorou-Faster-Groovy-1.8.pdf
"I'm impressed on how much Groovy's performance has improved for numerical computing. Groovy 1.8 in my project jlab (http://code.google.com/p/jlabgroovy/) sometimes outperforms Scala's performance in my other project ScalaLab (http://code.google.com/p/scalalab) !!"
Source: http://groovy.329449.n5.nabble.com/Great-improvements-in-Groovy-s-performance-for-numerical-computing-td4334768.html

Groovy offers a lot more syntactic sugar over Java, but still runs on the JVM and therefore requires a bit more work by the JVM to provide that sugar. Nevertheless, the difference is extremely minor in the vast majority of normal usages.
In addition, if you do happen to write a function that runs too slowly in Groovy, you can write it in straight Java and call it from your Groovy code. That's the team's recommended solution, and I can vouch for it working well and simply.
It my opinion, for the programming most of us do, it's a non-issue.

A quick Google search yielded some old performance results (http://www.codecommit.com/blog/java/groovys-performance-is-not-subjective, http://www.christianschenk.org/blog/performance-comparison-between-groovy-and-java/).
Groovy++ looks interesting also (http://stronglytypedblog.blogspot.com/2010/02/java-vs-scala-vs-groovy-vs-groovy.html).
However, the reason to use Groovy should be because it improves your performance not the computers...

I think , you have to look at this scientific comparison of Groovy Vs Python Vs PHP vs Ruby.
http://blog.websitesframeworks.com/2013/11/comparison-of-programming-languages-ruby-groovy-python-and-php-353/
They have made one exercise and produce comparison on these programming languages on the below factors:
Comparison of time developing each exercise
Comparison of readability of the languages
Comparison of results in benchmarks and lines of code. From the project Computer Language Benchmarks Game
Conclusions
It is a great quick study to enable you which language is better.

Generally speaking, Groovy will be slower.
You can avoid that by switching to Groovy++ which offers most of the features of Groovy, but can be statically compiled and has performance comparable to Java.

While developing AWS Lambdas Java will be faster than groovy. Apart from that in all other scenarios you might not feel much difference.

Groovy is compiled to bytecode .class files but running Groovy classes requires ~5MB groovy library that make performance overhead.

How can I transition from Java to Clojure?

After discovering Clojure I have spent the last few days immersed in it.
What project types lend themselves to Java over Clojure, vice versa, and in combination?
What are examples of programs which you would have never attempted before Clojure?

Clojure lends itself well to concurrent programming. It provides such wonderful tools for dealing with threading as Software Transactional Memory and mutable references.
As a demo for the Western Mass Developer's Group, Rich Hickey made an ant colony simulation in which each ant was its own thread and all of the variables were immutable. Even with a very large number of threads things worked great. This is not only because Rich is an amazing programmer, it's also because he didn't have to worry about locking while writing his code. You can check out his presentation on the ant colony here.

If you are going to try concurrent programming, then I think clojure is much better than what you get from Java out of the box. Take a look at this presentation to see why:
http://blip.tv/file/812787
I documented my first 20 days with Clojure on my blog
http://loufranco.com/blog/files/category-20-days-of-clojure.html
I started with the SICP lectures and then built a parallel prime number sieve. I also played around with macros.

What project types lend themselves to using Java over Clojure, vice
versa, or in combination?
A project where a GUI-building tool
(such as Matisse in Netbeans) is
needed would be a case where Java may
still be required. Anything done in
Java can be done in Clojure quite
readily, with proxy and gen-class if
needed, or just accessing Java as
needed (., doto, new, etc.). This
allows Clojure projects to easily use
Java libraries or legacy Java code.
Which programs which you would have never attempted before Clojure ?
Before I found Clojure, I was
contemplating a project that required
JDBC, would run in a servlet
container, and I anticipated doing a
lot of iterative development because
it wasn't clear what methods would
work for the data I needed to analyze.
I put it on the back burner because I
didn't have the time or patience for
the compile-debug- deploy-validation
cycling that Java requires. I've now
written the application in Clojure,
and I'm very pleased at the ease of
making changes on the fly and being
able to examine the results
immediately. Not to mention the joy
of lock-free programming and being
liberated from having to develop (and
refactor) class hierarchies.
- "MikeM" via the clojure#googlegroups.com mailinglist

What project types lend themselves to Java over Clojure, vice versa, and in combination?
If you want to develop a framework that is to be consumed by Java and Clojure, I've found writing the main abstractions (interfaces ad base classes) in Java to be preferable over writing them in Clojure (I find Clojure's gen-class to be somewhat tedious and rather use proxy).
If you're a user of Hibernate or any other framework that makes heavy use of Java-annotations without offering a programmatic alternative, you'll have some trouble, since it's not trivial to emulate annotated POJOs with Clojure's data structures.
Apart from that, I've experienced no use cases for which Clojure is less appropriate than Java; you have to cope with the loss of static typing, of course, which feels somewhat disconcerting at first, but tends to go away.

Develop Reference

Java is a programming language and computing platform first released by Sun Microsystems in 1995.