I am new to apache spark and trying to run the wordcount example . But intellij editor gives me the error at line 47 Cannot resolve method 'flatMap()' error.
Edit :
This is the line where I am getting the error
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)).iterator());
It looks like you're using an older version of Spark that expects Iterable rather than Iterator from the flatMap() function. Try this:
JavaRDD<String> words = lines.flatMap(s -> Arrays.asList(SPACE.split(s)));
See also Spark 2.0.0 Arrays.asList not working - incompatible types
Stream#flatMap is used for combining multiple streams into one, so the supplier method you provided must return a Stream result.
you can try like this:
lines.stream().flatMap(line -> Stream.of(SPACE.split(line)))
.map(word -> // map to JavaRDD)
flatMap method take a FlatMapFunctionas parameter which is not annotated with #FunctionalInterface. So indeed you can not use it as a lambda.
Just build a real FlatMapFunctionobject as parameter and you will be sure of it.
flatMap() is Java 8 Stream API. I think you should check the IDEA compile java version.
compile java version
Related
I am using the below code to walk a directory and fetch the first file . I am not able to fix the two sonar lint issues . Please help.
List<String> result = walk.filter(Files::isRegularFile).map(x -> x.toString()).collect(Collectors.toList());
Please make this change:
List<String> result = walk.filter(p -> p.toFile().isFile()).map(Path::toString).collect(Collectors.toList());
SonarLint also state the reason for the suggestion.
Lambdas should be replaced with method references
Java 8's "Files.exists" should not be used
Look at the 3725 Sonar rule :
The Files.exists method has noticeably poor performance in JDK 8, and
can slow an application significantly when used to check files that
don't actually exist.
The same goes for Files.notExists, Files.isDirectory and
Files.isRegularFile.
Note that this rule is automatically disabled when the project's
sonar.java.source is not 8.
Your project very probably relies on JDK/JRE 8.
If you dig into the OpenJDK issues you could see that on Linux the issue was partially solved but not on Windows.
About the second issue :
map(x -> x.toString())
Just replace it by a method reference :
map(Path::toString)
So finally to be compliant with Sonar, it gives :
//FIXME use Files::isRegularFile when update with Java>8
List<String> result = walk.filter(p -> p.toFile().exists())
.map(Path::toString)
.collect(Collectors.toList());
I have a written a method in Scala that is using a method written in Java - processSale() method takes util.List<Sale> as a parameter.
But after groupByKey() I'm getting an RDD[(String, Iterable[Sale])]. I've tried to import scala.collection.JavaConverters._ and do SaleParser.processSale(a.asJava).
However it gives me an Iterable[Sale]. How is it possible to convert it into a Java util.List?
val parseSales: RDD[(String, Sale)] = rawSales
.map(sale => sale.Id -> sale)
.groupByKey()
.mapValues(a => SaleParser.processSale(???))
a.toSeq.asJava
Note that if this Iterable is actually a Seq, toSeq just returns the same object.
See API doc for the complete list of conversions.
Am getting started with Spark, and ran into issue trying to implement the simple example for map function. The issue is with the definition of 'parallelize' in the new version of Spark. Can someone share example of how to use it, since the following way is giving error for insufficient arguments.
Spark Version : 2.3.2
Java : 1.8
SparkSession session = SparkSession.builder().appName("Compute Square of Numbers").config("spark.master","local").getOrCreate();
SparkContext context = session.sparkContext();
List<Integer> seqNumList = IntStream.rangeClosed(10, 20).boxed().collect(Collectors.toList());
JavaRDD<Integer> numRDD = context.parallelize(seqNumList, 2);
Compiletime Error Message : The method expects 3 arguments
I do not get what the 3rd argument should be like? As per the documentation, it's supposed to be
scala.reflect.ClassTag<T>
But how to even define or use it?
Please do not suggest using JavaSparkContext, as i wanted to know how to get this approach to work with using generic SparkContext.
Ref : https://spark.apache.org/docs/2.2.1/api/java/org/apache/spark/SparkContext.html#parallelize-scala.collection.Seq-int-scala.reflect.ClassTag-
Here is the code which worked for me finally. Not the best way to achieve the result, but was a way to explore the API for me
SparkSession session = SparkSession.builder().appName("Compute Square of Numbers")
.config("spark.master", "local").getOrCreate();
SparkContext context = session.sparkContext();
List<Integer> seqNumList = IntStream.rangeClosed(10, 20).boxed().collect(Collectors.toList());
RDD<Integer> numRDD = context
.parallelize(JavaConverters.asScalaIteratorConverter(seqNumList.iterator()).asScala()
.toSeq(), 2, scala.reflect.ClassTag$.MODULE$.apply(Integer.class));
numRDD.toJavaRDD().foreach(x -> System.out.println(x));
session.stop();
If you don't want to deal with providing the extra two parameters using sparkConext, you can also use JavaSparkContext.parallelize(), which only needs an input list:
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.rdd.RDD;
JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
final RDD<Integer> rdd = jsc.parallelize(seqNumList).map(num -> {
// your implementation
}).rdd();
I'm trying to use lambdas and streams in java but I'm quite new to it.
I got this error in IntelliJ "target type of lambda conversion must be an interface" when I try to make a lambda expression
List<Callable<SomeClass>> callList = prgll.stream()
.map(p->(()->{return p.funct();} )) <--- Here I get error
.collect(Collectors.toList());
Am I doing something wrong?
I suspect it's just Java's type inference not being quite smart enough. Try
.map(p -> (Callable<SomeClass>) () -> p.funct())
Stream#map() is a typed method, so you can explicitly specify the type:
.<Callable<SomeClass>>map(p -> () -> p.funct())
or neater, use a method reference:
.<Callable<SomeClass>>map(p -> p::funct)
I'm new to Scala and our project mixes Java and Scala code together (using the Play Framework). I'm trying to write a Scala method that can take a nested Java Map such as:
LinkedHashMap<String, LinkedHashMap<String, String>> groupingA = new LinkedHashMap<String, LinkedHashMap<String,String>>();
And have that java object passed to a Scala function that can loop through it. I have the following scala object definition to try and support the above Java nested map:
Seq[(String, Seq[(String,String)])]
Both the Java file and the Scala file compile fine individually, but when my java object tries to create a new instance of my scala class and pass in the nested map, I get a compiler error with the following details:
[error] ..... overloaded method value apply with alternatives:
[error] (options: java.util.List[String])scala.collection.mutable.Buffer[(String, String)] <and>
[error] (options: scala.collection.immutable.List[String])List[(String, String)] <and>
[error] (options: java.util.Map[String,String])Seq[(String, String)] <and>
[error] (options: scala.collection.immutable.Map[String,String])Seq[(String, String)] <and>
[error] (options: (String, String)*)Seq[(String, String)]
[error] cannot be applied to (java.util.LinkedHashMap[java.lang.String,java.util.LinkedHashMap[java.lang.String,java.lang.String]])
Any ideas here on how I can pass in a nested Java LinkedHashMap such as above into a Scala file where I can generically iterate over a nested collection? I'm trying to write this generic enough that it would also work for a nested Scala collection in case we ever switch to writing our play framework controllers in Scala instead of Java.
Seq is a base trait defined in the Scala Collections hierarchy. While java and scala offer byte code compatibility, scala defines a number of its own types including its own collection library. The rub here is if you want to write idiomatic scala you need to convert your java data to scala data. The way I see it you have a few options.
You can use Richard's solution and convert the java types to scala types in your scala code. I think this is ugly because it assumes your input will always be coming from java land.
You can write beautiful, perfect scala handler and provide a companion object that offers the ugly java conversion behavior. This disentangles your scala implementation from the java details.
Or you could write an implicit def like the one below genericizing it to your heart's content.
.
import java.util.LinkedHashMap
import scala.collection.JavaConversions.mapAsScalaMap
object App{
implicit def wrapLhm[K,V,G](i:LinkedHashMap[K,LinkedHashMap[G,V]]):LHMWrapper[K,V,G] = new LHMWrapper[K,V,G](i)
def main(args: Array[String]){
println("Hello World!")
val lhm = new LinkedHashMap[String, LinkedHashMap[String,String]]()
val inner = new LinkedHashMap[String,String]()
inner.put("one", "one")
lhm.put("outer",inner);
val s = lhm.getSeq()
println(s.toString())
}
class LHMWrapper[K,V,G](value: LinkedHashMap[K,LinkedHashMap[G,V]]){
def getSeq():Seq[ (K, Seq[(G,V)])] = mapAsScalaMap(value).mapValues(mapAsScalaMap(_).toSeq).toSeq
}
}
Try this:
import scala.collections.JavaConversions.mapAsScalaMap
val lhm: LinkedHashMap[String, LinkedHashMap[String, String]] = getLHM()
val scalaMap = mapAsScalaMap(lhm).mapValues(mapAsScalaMap(_).toSeq).toSeq
I tested this, and got a result of type Seq[String, Seq[(String, String)]]
(The conversions will wrap the original Java object, rather than actually creating a Scala object with a copy of the values. So the conversions to Seq aren't necessary, you could leave it as a Map, the iteration order will be the same).
Let me guess, are you processing query parameters?