I am planning on writing a library that will primarily be consumed from Java. However, I would like to write this in Scala. I have read most of the documentation on Java/Scala interop, but it is mostly focused on using existing libraries, rather than best practices to ensure seamless interop when writing libraries.
What are some ways of doing this effectively? Ideally, the consumers would not know the library was written in Scala at all.
Are there any other major libraries that do this?
My current plan is to have an API that exposes everything needed with Scala types/features, and then a smaller layer on top of this that converts to Java types. Are there any issues with this approach?
Is there any reason you would want to do this? It's one thing to write a Scala library that can also be used from Java, but if you want to target Java specifically it probably just makes sense to use Java to write the library.
Ideally, the consumers would not know the library was written in Scala at all.
The biggest issue with this is going to be transitive dependencies. Even if your library uses no third-party libraries, what about the Scala standard library? If your library is used in a project that isn't otherwise using Scala, that project will need to pull in the entire Scala standard library as well, unless you avoid using it at all in your code.
You are going to want to use Java collections instead of Scala collections, since that's what your client code will be expecting. You should have no Scala standard library classes in your public interface. This also means using Java functional interfaces instead of Scala function types.
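For instance, a Java-friendly Scala class might look like the sketch below (the names are made up, and it assumes Scala 2.13 for scala.jdk.CollectionConverters): only java.util and java.util.function types appear in the public signature, while Scala collections stay internal.

```scala
import java.util.{List => JList}
import java.util.function.Predicate
import scala.jdk.CollectionConverters._

class UserRepository(initial: Seq[String]) {
  // Internal state can freely use Scala collections...
  private val users: List[String] = initial.toList

  // ...but the public surface exposes only java.util.List and
  // java.util.function.Predicate, so a Java caller sees a plain Java API.
  def findUsers(filter: Predicate[String]): JList[String] =
    users.filter(u => filter.test(u)).asJava
}
```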
Avoid companion objects except for situations where they compile down to static members on a class. Accessing a Scala companion object from Java is cumbersome.
Use Java Bean conventions for getters/setters/case classes instead of the regular Scala properties (I believe Scala has an annotation for this).
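A sketch of those last two points (assuming Scala 2.x, where the bean annotation is scala.beans.BeanProperty; the Account class is invented for illustration):

```scala
import scala.beans.BeanProperty

// @BeanProperty generates getOwner/setOwner and getBalance/setBalance,
// so Java callers get the familiar bean conventions.
class Account(@BeanProperty var owner: String,
              @BeanProperty var balance: Long)

object Account {
  // Public methods on the companion object get static forwarders on the
  // Account class (as long as no signatures conflict), so Java can call
  // Account.empty() directly instead of the clumsy Account$.MODULE$.empty().
  def empty(): Account = new Account("", 0L)
}
```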
As far as existing libraries go, the Spark Java API is written in Scala. Spark is primarily built for Scala with some Java support, but it may be worth looking at (especially to see the differences between the Scala and Java APIs).
The only concerns I found are needing to include the Scala runtime library, and to append _2.13 to the artifact names.
By publishing via Maven (sbt publishM2), even the Scala standard library is pulled in automatically by the downstream project.
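For reference, a minimal build.sbt along those lines (the names and version numbers are only illustrative):

```scala
// build.sbt -- publishing a Scala library for consumption from Maven/Java projects.
name := "mylib"
organization := "com.example"
version := "0.1.0"
scalaVersion := "2.13.12"

// By default the artifact is published as mylib_2.13, hence the suffix that Java
// consumers have to append to the artifactId. Uncommenting crossPaths := false
// would publish a plain "mylib" artifact instead.
// crossPaths := false
```

Running sbt publishM2 then installs the artifact into the local Maven repository, and scala-library reaches downstream Java projects as an ordinary transitive dependency.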
Currently, my main application is built with Java Spring-boot and this won't change because it's convenient.
@Autowired service beans implement, for example:
Enterprise and establishment datasets. The first one is also able to return a list of Enterprise objects that have a Map of their establishments.
So the service returns: Dataset<Enterprise>, Dataset<Establishment>, Dataset<Row>
Associations: Dataset<Row>
Cities: Dataset<Commune> or Dataset<Row>,
Local authorities: Dataset<Row>.
Many use case functions are calls of this kind:
What are associations(year=2020)?
And my application forwards to datasetAssociation(2020), which operates with the enterprise and establishment datasets and with the city and local authority ones to provide a useful result.
Many have recommended that I benefit from Scala's abilities.
For this, I'm considering an operation that involves other operations between datasets:
Some made of Row,
Some carrying concrete objects.
I have this operation to do, in terms of the datasets reached/involved:
associations.enterprises.establishments.cities.localautorities
Will I be able to write the bold part in Scala? This means that:
A Dataset<Row> built with Java code is sent to a Scala function to be completed.
Scala creates a new dataset with Enterprise and Establishment objects.
a) If the source of an object is written in Scala I don't have to recreate a new source for it in Java.
b) conversely if the source of an object is written in Java, I don't have to recreate a new source in Scala.
c) I can use a Scala object returned by this dataset on Java side directly.
Scala will have to call functions kept implemented in Java and send them the underlying dataset it is creating (for example, to complete it with city information).
Java calls Scala methods at any time,
and Scala calls Java methods at any time too:
an operation could follow a
Java -> Scala -> Scala -> Java -> Scala -> Java -> Java
path if wished, in terms of the native language of the method called,
because I don't know in advance which parts I will find useful to port to Scala and which not.
If these three points are fulfilled, I will consider that Java and Scala are interoperable both ways and can benefit from one another.
But can I achieve this goal (in Spark 2.4.x, or more probably in Spark 3.0.0)?
Summarizing: are Java and Scala interoperable both ways, in a manner such that:
It does not make the source code too clumsy on one side or the other. Or worse: duplicated.
It doesn't degrade performance strongly (having to recreate a whole dataset, or to convert each of the objects it contains, on one side or the other, would for example be prohibitive).
As Jasper-M wrote, Scala and Java code are perfectly interoperable:
they both compile into .class files that are executed the same way by the JVM.
The Spark Java and Scala APIs work together, with a couple of specifics:
Both use the same Dataset class, so there is no issue there.
However, SparkContext and RDD (and all RDD variants) have Scala APIs that aren't practical in Java, mainly because the Scala methods take Scala types as input that are not the ones you use in Java. But there are Java wrappers for both of them (JavaSparkContext, JavaRDD). Coding in Java, you have probably seen those wrappers already.
Now, as many have recommended, Spark being a Scala library first and the Scala language being more powerful than Java (*), using Scala to write Spark code will be much easier. Also, you will find many more code examples in Scala; it is often difficult to find Java code examples for complex Dataset manipulation.
So, I think the two main issues you should be taking care of are:
(Not Spark related, but necessary) Have a project that compiles both languages and allows two-way interoperability. I think sbt provides it out of the box, and with Maven you need to use the Scala plugin and (from my experience) put both Java and Scala files in the java folder. Otherwise one can call the other but not the opposite (Scala can call Java but Java cannot call Scala, or the other way around).
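As a sketch of that first point, a minimal sbt build for a mixed Java/Scala module could look like this (the Scala and Spark versions are just examples):

```scala
// build.sbt -- one module containing both src/main/java and src/main/scala sources.
name := "spark-interop"
scalaVersion := "2.12.15"

// Mixed (the default) lets Java and Scala sources reference each other;
// JavaThenScala / ScalaThenJava restrict the direction of the references.
Compile / compileOrder := CompileOrder.Mixed

libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.8" % Provided
```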
You should be careful about the encoders that are used each time you create a typed Dataset (i.e. Dataset[YourClass] and not Dataset<Row>). In Java, and for Java model classes, you need to use Encoders.bean(YourClass.class) explicitly. But in Scala, Spark by default finds the encoder implicitly, and the implicit encoders are built for Scala case classes ("Product types") and Scala standard collections. So just be mindful of which encoders are used. For example, if you create a Dataset of YourJavaClass in Scala, I think you will probably have to pass Encoders.bean(classOf[YourJavaClass]) explicitly for it to work and not have serialization issues.
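A minimal sketch of that last case (Enterprise stands for a Java bean from the question's model; the column and setter names are assumptions):

```scala
import org.apache.spark.sql.{Dataset, Encoders, Row}

object EnterpriseMapping {
  // Takes the Dataset<Row> built on the Java side and returns a typed Dataset of
  // the Java bean. The encoder is passed explicitly because Spark's implicit
  // encoders only cover case classes and standard Scala types, not Java beans.
  def toEnterprises(rows: Dataset[Row]): Dataset[Enterprise] =
    rows.map { row =>
      val e = new Enterprise()
      e.setSiren(row.getAs[String]("siren"))
      e.setName(row.getAs[String]("name"))
      e
    }(Encoders.bean(classOf[Enterprise]))
}
```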
One last note: you wrote that you use Java Spring Boot. So:
Be aware that Spring's design goes completely against Scala/functional recommended practice, using null and mutable state all over. You can still use Spring, but it might feel strange in Scala, and the community will probably not accept it easily.
You can call Spark code from a Spring context, but you should not use the Spring (context) from Spark, especially inside methods distributed by Spark, such as in rdd.map. That would attempt to create a Spring context in each worker, which is very slow and can easily fail.
(*) About "scala being more powerful than java": I don't mean that scala is better than java (well I do think so, but it is a matter of taste :). What I mean is that the scala language provides much more expressiveness than java. Basically it does more with less code. The main differences are:
implicits, which are heavily used by the Spark API
monads + for-comprehensions
and of course the powerful type system (read about covariant types, for example: a List[Dog] is a subtype of List[Animal] in Scala, but not in Java)
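A tiny illustration of that last point (Animal and Dog are placeholder classes):

```scala
object CovarianceDemo {
  class Animal
  class Dog extends Animal

  // Compiles because Scala's immutable List is declared covariant (List[+A]) ...
  val dogs: List[Dog] = List(new Dog)
  val animals: List[Animal] = dogs
  // ... whereas the equivalent Java assignment with java.util.List<Dog> would not
  // compile, because Java generics are invariant.
}
```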
Yes, it is possible without performance degradations or overly clumsy extra code.
Scala and Java are almost perfectly interoperable and moreover the Spark Dataset API is shared between Java and Scala. The Dataset class is exactly the same whether you are using Java or Scala. As you can see in the javadoc or scaladoc (note they only differ in layout, not in content) the Java and Scala code is perfectly interchangeable. At most the Scala code will be a bit more succinct.
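As a concrete sketch of that interchangeability (the object, method, and column names below are made up): a helper written in Scala can take the Dataset<Row> built by the Java/Spring side and hand a Dataset<Row> straight back, with no conversion at the language boundary in either direction.

```scala
import org.apache.spark.sql.{Dataset, Row}

object AssociationEnrichment {
  // Callable from Java as AssociationEnrichment.enrich(associations, cities),
  // thanks to the static forwarders scalac generates for top-level objects.
  // The Dataset created on the Java side is the very same object Scala operates
  // on, so nothing is copied or re-encoded when crossing the language boundary.
  def enrich(associations: Dataset[Row], cities: Dataset[Row]): Dataset[Row] =
    associations.join(cities, Seq("cityCode"), "left_outer")
}
```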
I've inherited a project, originally written in C++.
Due to implementation requirements, I need to either re-write the project in a JVM based language, like Java or Kotlin, or simply wrap the existing code in a Java or Kotlin (RESTful) API.
The existing code base is also entangled with a very old network simulation framework.
I'm therefore leaning heavily towards untangling the simulation framework and wrapping the C++ code using something like JNI and SWIG, so it can run in a non-simulated environment.
I'm wondering if JNI and SWIG are still the best options available?
Any advice will be greatly appreciated!
Wrapping with JNI (or SWIG) requires a clever definition of the API. See a nice explanation here. There exist some C++ frameworks that make JNI operations easier on the C++ side; consider them early, before you get too deeply invested in reference management. From what you write, it may also be necessary to provide some Java interface below the C++ layer, e.g. we use the Java network APIs on Android, even from C++ code.
I was writing some code and was going to throw a RuntimeException as a default for something, when I noticed that there are two options for RuntimeException - one from java.lang and one from kotlin:
Inside the Kotlin version of this, there's an entire list of other methods which work the same way:
So, from my understanding of this, the one from package kotlin is simply an alias for the Java equivalent (correct me if I'm wrong), which leads me to the question:
What is the point of having this alias file, and when should you use it over the "standard" Java equivalent? Does this simply save a few imports?
When using the JDK, these map to JDK classes. When using Kotlin for JavaScript, they would map to a specific implementation in the Kotlin JavaScript library. Documentation about actual and expect here.
To answer your question, if there's a chance of you porting your code to also work on another platform, always use the kotlin. variant. Otherwise, it doesn't matter.
In my opinion, typealias is a strong feature provided by the Kotlin language.
Why? Because it gives the ability to extend code to multiple domains, for example to provide interoperability between other languages and Kotlin.
It also becomes helpful when providing APIs or SDKs with semantic versioning, without worrying much about changes affecting lower versions of the APIs.
One good example would be the collections carried over from Java to Kotlin as typealiases, with some additional powerful methods (literally, it saves a lot of development time and effort).
Another good example would be Kotlin's multiplatform programming support, which helps you create APIs using the actual and expect keywords.
So, long story short, I prefer using RuntimeException from the kotlin package to extend support across a variety of Kotlin versions (you can see newly added classes starting from version 1.3, but this doesn't affect the existing API).
As an experiment I am working on a small platform for Java applications (client server, P2P etc.) which I would like to be able to use from both Java and Scala. And possibly also from Groovy, jRuby etc.
I know that Scala can call easily into Java, but that Java cannot as easily call back into Scala. I don't know how this is working in jRuby and Groovy.
My initial thought was to implement the platform in Java, and then see if I could make it available to other languages. However, the platform may include starting threads which call components that implement certain Java interfaces. Would it be possible to pass e.g. a factory implemented in Scala, but which implements a Java interface, to a Java component and have that Java component call the Scala factory?
What are your recommendations in general for implementing such a cross language platform?
I am by no means an expert and have never done such a cross-language project (just minor Java/Scala).
But I would suggest implementing it in the language that:
you know best
makes you most productive
ensures the most maintainability
In my case it would be Scala.
So what I would do is define the API in the form of Java interfaces, which I would then implement in Scala (probably I would need to convert Scala/Java collections back-and-forth, but that would be an implementation detail).
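A minimal sketch of that shape (the names are invented, and it assumes Scala 2.13 for scala.jdk.CollectionConverters): the contract is expressed with plain Java types, the implementation is Scala, and a Java component such as an ExecutorService can call the Scala object through the java.util.concurrent.Callable interface it implements.

```scala
import java.util.{List => JList}
import java.util.concurrent.{Callable, Executors}
import scala.jdk.CollectionConverters._

// In the real project this contract could be a plain Java interface; a Scala
// trait with only abstract methods compiles to an equivalent interface.
trait NodeFactory extends Callable[String] {
  def listEndpoints(): JList[String]
}

// The Scala implementation converts collections only at the boundary.
class DefaultNodeFactory extends NodeFactory {
  private val endpoints = List("node-1", "node-2")
  override def listEndpoints(): JList[String] = endpoints.asJava
  override def call(): String = endpoints.head
}

object Demo extends App {
  // A stock Java component invoking the Scala implementation via a Java interface.
  val pool = Executors.newSingleThreadExecutor()
  println(pool.submit(new DefaultNodeFactory).get()) // prints "node-1"
  pool.shutdown()
}
```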
That way you make sure that other JVM languages can interact with your API, and you implement it in whatever language you are best.
Java can call Scala as easily as the other way around.
With Eclipse you need to install 'Scala IDE' from the Eclipse Marketplace
and configure the compiler to the 'Scala Then Java' option in Properties -> 'Scala Compiler' -> 'Build Manager' -> compileorder.
P.S. You also need to add the Scala nature to the project.
Would it be possible to pass e.g. a factory implemented in Scala, but which implements a Java interface, to a Java component and have that Java component call the Scala factory?
Even if it were so, switching to Scala would not solve the problem, as jRuby would then have even more problems accessing that component.
Actually, all the listed languages declare interoperability with Java, but not with the other languages. So the first design goal is that your platform should be Java-compatible in all of its API, though some components could be written in other languages - but in such a way that this cannot be seen by API users. The easiest way to achieve this is to write it all in Java. Besides, I recommend Java because the features of "more advanced" languages have their reverse side - one innocent-looking line of code can consume enormous amounts of CPU time or memory.
I have started porting my server side C# app to Java. One of the things it does is dynamic generation of code using the CodeDOM framework, compiling it to bytecode, and loading the class during runtime. There are also scenarios where we have used Expression Tree for lightweight dynamic generation of methods.
I did some research, and I believe the closest I came is Javassist. I would appreciate pointers to any other frameworks that support dynamic code generation and compilation. I am a fresh recruit in the Java world.
There's the Java Compilation API (for example, http://www.accordess.com/wpblog/an-overview-of-java-compilation-api-jsr-199/).
If you just need to evaluate expressions, consider any EL, like OGNL or MVEL.
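For instance, evaluating an expression with MVEL is a one-liner (shown here in Scala, though the call is the same from Java; the variable names are arbitrary):

```scala
import org.mvel2.MVEL
import scala.jdk.CollectionConverters._

object ExpressionDemo extends App {
  val vars: java.util.Map[String, AnyRef] =
    Map[String, AnyRef]("price" -> Integer.valueOf(40), "qty" -> Integer.valueOf(3)).asJava

  // MVEL parses and evaluates the expression at runtime against the variable map.
  println(MVEL.eval("price * qty", vars)) // 120
}
```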
Groovy, JRuby, and Scala are the most mainstream alternative JVM languages and allow a huge range of options, from simple expression evaluation to full-blown Java interaction.
To modify classes, you can use Javassist, ASM, cglib (which actually depends on ASM), BCEL, etc.
Among them, I recommend ASM because it is more lightweight and has better performance.
Using ASM, you can generate classes dynamically. But if you want to modify classes that are already running in the JVM, you have to use the Java Instrumentation API to retransform the existing classes.
You can see this paper for more info. http://www.cs.helsinki.fi/u/pohjalai/k05/okk/seminar/Aarniala-instrumenting.pdf
More documentation about instrumentation from Oracle:
http://docs.oracle.com/javase/6/docs/technotes/guides/instrumentation/index.html
I have successfully used Janino for dynamic code generation. Like Javassist, it can compile Java source code in-memory. Also like Javassist, it lacks Java 1.5 language features, but apart from that it is easy to use.
The SimpleCompiler class is a good starting point.
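A small sketch of what that can look like (written in Scala here, though any JVM language drives it the same way; the generated class name and method are made up, and cook() is assumed to accept the source text as a String):

```scala
import org.codehaus.janino.SimpleCompiler

object JaninoDemo extends App {
  val compiler = new SimpleCompiler()

  // Compile Java source held in a String straight to bytecode, in memory.
  compiler.cook(
    """public class Greeter {
      |  public static String greet(String name) { return "Hello, " + name; }
      |}""".stripMargin)

  // Load the freshly compiled class and call it reflectively.
  val greeter = compiler.getClassLoader.loadClass("Greeter")
  println(greeter.getMethod("greet", classOf[String]).invoke(null, "world")) // Hello, world
}
```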
I would look at Lisps (or other dynamic languages) that have been targeted to the JVM.
wiki article