Spark - How to use SparkContext within classes? - java

I am building an application in Spark, and would like to use the SparkContext and/or SQLContext within methods in my classes, mostly to pull/generate data sets from files or SQL queries.
For example, I would like to create a T2P object which contains methods that gather data (and in this case need access to the SparkContext):
class T2P(mid: Int, sc: SparkContext, sqlContext: SQLContext) extends Serializable {
  def getImps(): DataFrame = {
    sc.textFile("file.txt")
      .map(line => line.split("\t"))
      .map(d => Data(d(0).toInt, d(1), d(2), d(3)))
      .toDF()
  }
  def getX(): DataFrame = {
    sqlContext.sql("SELECT a,b,c FROM table")
  }
}
//creating the T2P object
class App {
  val conf = new SparkConf().setAppName("T2P App").setMaster("local[2]")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  val t2p = new T2P(0, sc, sqlContext)
}
Passing the SparkContext as an argument to the T2P class doesn't work since the SparkContext is not serializable (getting a task not serializable error when creating T2P objects). What is the best way to use the SparkContext/SQLContext inside my classes? Or perhaps is this the wrong way to design a data pull type process in Spark?
UPDATE
I realized from the comments on this post that passing the SparkContext was not the problem in itself. The problem was that I was calling a method of the class inside a 'map' function, which causes Spark to try to serialize the entire class instance. That fails because the instance holds the SparkContext, which is not serializable.
def startMetricTo(userData: ((Int, String), List[(Int, String)]), startMetric: String): T2PUser = {
  //do something
}
def buildUserRollup() = {
  this.userRollup = this.userSorted.map(line => startMetricTo(line, this.startMetric))
}
This results in a 'task not serializable' exception.

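I fixed this problem (with the help of the commenters and other StackOverflow users) by creating a separate MetricCalc object to hold my startMetricTo() method. Then I changed the buildUserRollup() method to use this new startMetricTo() and to take startMetric as a parameter instead of reading this.startMetric. The closure passed to map now refers only to the standalone MetricCalc object and a local value, so Spark no longer has to serialize the T2P instance and the SparkContext it holds.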
//newly created object
object MetricCalc {
  def startMetricTo(userData: ((Int, String), List[(Int, String)]), startMetric: String): T2PUser = {
    //do something
  }
}
//using function in T2P
def buildUserRollup(startMetric: String) = {
  this.userRollup = this.userSorted.map(line => MetricCalc.startMetricTo(line, startMetric))
}
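The capture behaviour behind this fix can be reproduced without Spark. The following is a minimal sketch using only plain Java serialization, assuming Scala 2.12 or later (where function literals are serializable lambdas). FakeContext, Rollup and the serializes helper are hypothetical names made up for the illustration; FakeContext merely stands in for a non-serializable SparkContext.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureDemo {
  // Hypothetical stand-in for SparkContext: deliberately NOT Serializable.
  class FakeContext

  // Like T2P: marked Serializable, but it holds a non-serializable field.
  class Rollup(val ctx: FakeContext, val startMetric: String) extends Serializable {
    // Reads this.startMetric inside the function, so the closure captures `this`
    // and, through it, the non-serializable ctx.
    def capturesThis: String => String = line => line + this.startMetric

    // Copies the field into a local value first, so only a String is captured.
    def capturesLocal: String => String = {
      val metric = startMetric
      line => line + metric
    }
  }

  // Attempts plain Java serialization, roughly what happens before a task is shipped.
  def serializes(f: AnyRef): Boolean =
    try {
      new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(f)
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

With a Rollup instance r, serializes(r.capturesThis) should fail while serializes(r.capturesLocal) should succeed, which mirrors why moving the method into MetricCalc and passing startMetric as a parameter avoids the exception.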

I tried several options; this is what eventually worked for me:
object SomeName extends App {
  val conf = new SparkConf()...
  val sc = new SparkContext(conf)
  implicit val sqlC = SQLContext.getOrCreate(sc)
  getDF1(sqlC)
  def getDF1(sqlCo: SQLContext): Unit = {
    val query1 = SomeQuery here
    val df1 = sqlCo.read.format("jdbc").options(Map("url" -> dbUrl, "dbtable" -> query1)).load.cache()
    //iterate through df1 and retrieve the 2nd DataFrame based on some values in the Row of the first DataFrame
    df1.foreach(x => {
      getDF2(x.getString(0), x.getDecimal(1).toString, x.getDecimal(3).doubleValue)(sqlCo)
    })
  }
  def getDF2(a: String, b: String, c: Double)(implicit sqlCont: SQLContext): Unit = {
    val query2 = Somequery
    val sqlcc = SQLContext.getOrCreate(sc)
    //val sqlcc = sqlCont //Did not work for me. Also, omitting (implicit sqlCont: SQLContext) altogether did not work
    val df2 = sqlcc.read.format("jdbc").options(Map("url" -> dbUrl, "dbtable" -> query2)).load().cache()
    .
    .
    .
  }
}
Note: In the above code, if I omitted the (implicit sqlCont: SQLContext) parameter from the getDF2 method signature, it would not work. I tried several other ways of passing the SQLContext from one method to the other, and they always gave me a NullPointerException or a Task not serializable exception. The good thing is that it eventually worked this way, and I could retrieve parameters from a row of the first DataFrame and use those values to load the second DataFrame.

Related

How to merge two jsonNodes in one

I have two variables of class User as follows:
val user1 = User().apply {....values here}
val user2 = User().apply {....values here}
I want to create a JsonNode with the following structure:
var node:JsonNode? = null
node = {
"user_1": {
...the fields of class User, assigned in variable user1
},
"user_2": {
...the values for user 2
}
}
I have converted the objects to nodes, but I do not know how to merge them using Jackson.
val mapper1= ObjectMapper()
mapper1.valueToTree<JsonNode>(user1)
val mapper2= ObjectMapper()
mapper2.valueToTree<JsonNode>(user2)
Or is there a more efficient way to create one JSON node structure from the two objects?
I am using Kotlin and Jackson databind.
I haven't tested it, but I guess you should be able to simply create a Map<String, User> and convert that into a JsonNode:
val user1 = User().apply {....values here}
val user2 = User().apply {....values here}
val both = mapOf("user_1" to user1, "user_2" to user2)
val mapper = ObjectMapper()
val result = mapper.valueToTree<JsonNode>(both)

Using Gson with interfaces to fetch data from API

I have no experience with Kotlin and I'm trying to write an app that fetches data from different financial APIs using Gson. I have created two classes implementing an interface and I'd like to instantiate them in a generic function. Right now I have two different methods, one per API, and I'd like to clean this up.
EDIT:
I want to make a generic function out of two given functions:
Interface and two classes:
interface TickerEntity {
    val tickers: Array<String>
    data class MainData(
        val Bid: Double,
        val Ask: Double
    )
}
object API1TickerEntity : TickerEntity {
    val tickers = arrayOf<String>("BTC-LTC", "BTC-DOGE", "BTC-POT", "BTC-USD")
    data class MainData(
        val success: Boolean,
        val message: String,
        val result: ResultData
    )
    data class ResultData(
        val Bid: Double,
        val Ask: Double,
        val Last: Double
    )
}
object API2TickerEntity : TickerEntity {
    val tickers = arrayOf<String>("LTCBTC", "BTCDOGE", "BTCPOT", "BTCUSD")
    data class MainData(
        val max: Double,
        val min: Double,
        val last: Double,
        val bid: Double,
        val ask: Double,
        val vwap: Double,
        val average: Double,
        val volume: Double
    )
}
My functions to manage Json I want to be generic:
data class BuySell(val stockName: String, val buy: Double = 0.0, val sell: Double = 0.0)
fun getAPI1BuySell(): () -> BuySell {
    val currency = API1TickerEntity.tickers[0]
    val response = sendRequest("somesite.com")
    val gson = Gson()
    val ticker: API1TickerEntity.MainData = gson.fromJson(response.body, API1TickerEntity.MainData::class.java)
    println(currency)
    return { BuySell("API1", ticker.result.Ask, ticker.result.Bid) }
}
fun getAPI2BuySell(): () -> BuySell {
    val currency = API2TickerEntity.tickers[0]
    val response = sendRequest("someothersite.com")
    val gson = Gson()
    val ticker: API2TickerEntity.MainData = gson.fromJson(response.body, API2TickerEntity.MainData::class.java)
    return { BuySell("API2", ticker.ask, ticker.bid) }
}
So far I have tried:
fun <T : TickerEntity> getStockBuySell(url: String, stockName: String): () -> BuySell {
val tickerEntity : T = T
val currency = tickerEntity.tickers[0]
val response = sendRequest(url.replace("{}", currency))
val gson = Gson()
val ticker: tickerEntity.MainData = gson.fromJson(response.body, tickerEntity.MainData::class.java)
println(currency)
return { BuySell (stockName, ticker.Ask, ticker.Bid) }
}
}
But I can't instantiate the interface alone, and also it seems I can't override data class alone since it is not a value.
JSON files to manage:
API1:
{"success":true,"message":"","result":{"Bid":0.00596655,"Ask":0.00597554,"Last":0.00597933}}
API2:
{"max":0.00606939,"min":0.0059345,"last":0.00595134,"bid":0.00594972,"ask":0.00599205,"vwap":0.00595134,"average":0.00595134,"volume":29.60407718}
All help appreciated

How to satisfy nested generic requirement of Class<T> in kotlin

I am trying to call a method whose signature includes a parameter of type Class<T>.
Below is the sample code in Kotlin:
val response: ResponseEntity<ResponseObject<*>> = testRestTemplate.postForEntity("/users", user, ResponseObject::class.java)
What I am trying to achieve is to get rid of the <*> in ResponseObject and have:
val response: ResponseEntity<ResponseObject<User>> = ???
But I am not sure what the correct syntax is to satisfy the Class<T> requirement.
I tried
ResponseObject<User::class.java>::class.java
but that is not valid syntax. Any pointers?
The real problem is that if I use *, I don't know how to correctly get the User instance out of the response.
OK, I managed to solve my problem with a type check using when:
@Test
fun testCreateUser() {
    val user = User(id = null)
    val response = testRestTemplate.postForEntity("/users", user, ResponseObject::class.java)
    val responseObject = response.body
    when (val returnedUser = responseObject.model) {
        is User -> {
            assertNotNull(returnedUser.id)
            assertEquals(UserStatus.active, returnedUser.status)
        }
    }
}
If you can change the signature of the method then you may try something similar to the following:
class ResponseEntity<T : Any>(val body: T)
class ResponseObject<T : Any>(val model: T)
data class User(val id: Long, val status: String)
fun <M : Any, K : ResponseObject<M>> postForEntity(path: String, model: M): ResponseEntity<K> {
    return TODO()
}
val response: ResponseEntity<ResponseObject<User>> = postForEntity("/users", User(1, "good"))
You could use
@Suppress("UNCHECKED_CAST")
val response = testRestTemplate.postForEntity("/users", user, ResponseObject::class.java as Class<ResponseObject<User>>)
Or a helper function if you need it more than once for different parameters:
inline fun <reified T> classOf() = T::class.java
val response = testRestTemplate.postForEntity("/users", user, classOf<ResponseObject<User>>())
(in both cases the type ResponseEntity<ResponseObject<User>> should be inferred)

Sending Streaming Data as JSON in Java/Scala

I'm used to Python and am using the Scala Spark Streaming libraries to handle real-time Twitter streaming data. Right now I'm able to send each tweet as a String; however, my streaming service requires JSON. Is there a way I can easily adapt my code to send a JSON dictionary instead of a String?
%scala
import scala.collection.JavaConverters._
import com.microsoft.azure.eventhubs._
import java.util.concurrent._
val namespaceName = "hubnamespace"
val eventHubName = "hubname"
val sasKeyName = "RootManageSharedAccessKey"
val sasKey = "key"
val connStr = new ConnectionStringBuilder()
.setNamespaceName(namespaceName)
.setEventHubName(eventHubName)
.setSasKeyName(sasKeyName)
.setSasKey(sasKey)
val pool = Executors.newFixedThreadPool(1)
val eventHubClient = EventHubClient.create(connStr.toString(), pool)
def sendEvent(message: String) = {
  val messageData = EventData.create(message.getBytes("UTF-8"))
  // CONVERT IT HERE?
  eventHubClient.get().send(messageData)
  System.out.println("Sent event: " + message + "\n")
}
import twitter4j._
import twitter4j.TwitterFactory
import twitter4j.Twitter
import twitter4j.conf.ConfigurationBuilder
val twitterConsumerKey = "key"
val twitterConsumerSecret = "key"
val twitterOauthAccessToken = "key"
val twitterOauthTokenSecret = "key"
val cb = new ConfigurationBuilder()
cb.setDebugEnabled(true)
.setOAuthConsumerKey(twitterConsumerKey)
.setOAuthConsumerSecret(twitterConsumerSecret)
.setOAuthAccessToken(twitterOauthAccessToken)
.setOAuthAccessTokenSecret(twitterOauthTokenSecret)
val twitterFactory = new TwitterFactory(cb.build())
val twitter = twitterFactory.getInstance()
val query = new Query(" #happynewyear ")
query.setCount(100)
query.lang("en")
var finished = false
while (!finished) {
  val result = twitter.search(query)
  val statuses = result.getTweets()
  var lowestStatusId = Long.MaxValue
  for (status <- statuses.asScala) {
    if (!status.isRetweet()) {
      sendEvent(status.getText())
    }
    lowestStatusId = Math.min(status.getId(), lowestStatusId)
    Thread.sleep(2000)
  }
  query.setMaxId(lowestStatusId - 1)
}
eventHubClient.get().close()
Scala has no built-in way to produce JSON, so you'll need an external library. I recommend Jackson. If you use Gradle, you can add a dependency like this: compile("com.fasterxml.jackson.module:jackson-module-scala_2.12"). (Use the artifact suffix that matches your Scala version.)
Then you can convert your data object to a JSON tree like this:
val mapper = new ObjectMapper() with ScalaObjectMapper
mapper.registerModule(DefaultScalaModule)
val json = mapper.valueToTree[JsonNode](messageData)
I'd strongly recommend investing some effort in Jackson; you'll need it a lot if you work with JSON.
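As a rough sketch of how this could plug into the question's sendEvent, the tweet text can be wrapped in a small map and serialized to UTF-8 JSON bytes. This assumes jackson-databind and jackson-module-scala are on the classpath; the object name TweetJson and the field name "text" are made up for the illustration, and the real field names depend on what the receiving service expects.

```scala
import com.fasterxml.jackson.databind.ObjectMapper
import com.fasterxml.jackson.module.scala.DefaultScalaModule

// Hypothetical helper, not part of the original code.
object TweetJson {
  // One shared mapper; DefaultScalaModule teaches Jackson about Scala collections.
  private val mapper = new ObjectMapper().registerModule(DefaultScalaModule)

  // Wraps the text in a one-field JSON object and returns UTF-8 encoded bytes.
  // The field name "text" is an assumption for this sketch.
  def toJsonBytes(text: String): Array[Byte] =
    mapper.writeValueAsBytes(Map("text" -> text))
}
```

Inside sendEvent, the existing EventData.create(message.getBytes("UTF-8")) call could then become EventData.create(TweetJson.toJsonBytes(message)), so the event body is a JSON object rather than a bare string.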

Passing type in Scala as an argument

I want to pass a type to a function in Scala.
Problem in detail
First iteration
I have the following Java classes (coming from an external source):
public class MyComplexType {
    public String name;
    public int number;
}
and
public class MyGeneric<T> {
    public String myName;
    public T myValue;
}
In this example I want MyComplexType to be the actual type parameter of MyGeneric; in the real problem there are several possibilities.
I want to deserialize a JSON string using Scala code as follows:
import org.codehaus.jackson.map.ObjectMapper
object GenericExample {
  def main(args: Array[String]) {
    val jsonString = "{\"myName\":\"myNumber\",\"myValue\":{\"name\":\"fifteen\",\"number\":\"15\"}}"
    val objectMapper = new ObjectMapper()
    val myGeneric: MyGeneric[MyComplexType] = objectMapper.readValue(jsonString, classOf[MyGeneric[MyComplexType]])
    val myComplexType: MyComplexType = myGeneric.myValue
  }
}
It compiles fine, but a runtime error occurs:
java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to MyComplexType
at GenericExample$.main(GenericExample.scala:9)
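The likely reason for this exception is type erasure: at runtime classOf[MyGeneric[MyComplexType]] is just the class MyGeneric, so the mapper has no information about the type of myValue and falls back to a generic map. The following sketch illustrates the erasure itself with the standard library only; Box and ErasureDemo are hypothetical stand-ins, not part of the original code.

```scala
import scala.reflect.{ClassTag, classTag}

object ErasureDemo {
  // Hypothetical generic container, standing in for MyGeneric.
  class Box[T](var value: T)

  // Both expressions yield the same erased runtime class: the type argument is gone.
  val boxOfString: Class[_] = classOf[Box[String]]
  val boxOfInt: Class[_] = classOf[Box[Int]]

  // A ClassTag recovers the runtime class of A itself, but not A's own type arguments.
  def runtimeClassOf[A: ClassTag]: Class[_] = classTag[A].runtimeClass
}
```

Here ErasureDemo.boxOfString and ErasureDemo.boxOfInt refer to the same Class object, and runtimeClassOf[ErasureDemo.Box[String]] is likewise just Box. That is why the nested type has to be supplied some other way.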
Second iteration
Working solution to the problem:
val jsonString = "{\"myName\":\"myNumber\",\"myValue\":{\"name\":\"fifteen\",\"number\":\"15\"}}"
val objectMapper = new ObjectMapper()
val myGeneric: MyGeneric[MyComplexType] = objectMapper.readValue(jsonString, classOf[MyGeneric[MyComplexType]])
myGeneric.myValue = objectMapper.readValue(objectMapper.readTree(jsonString).get("myValue").toString, classOf[MyComplexType])
val myComplexType: MyComplexType = myGeneric.myValue
Not nice, but it works. (If anybody knows how to make it better, that would also be welcome.)
Third iteration
The lines from the second iteration occur several times in the real problem, so I want to extract them into a function. The parts that vary are the JSON string and the type MyComplexType.
I want something like this:
def main(args: Array[String]) {
  val jsonString = "{\"myName\":\"myNumber\",\"myValue\":{\"name\":\"fifteen\",\"number\":\"15\"}}"
  val myGeneric = extractMyGeneric[MyComplexType](jsonString)
  val myComplexType: MyComplexType = myGeneric.myValue
}
private def extractMyGeneric[T](jsonString: String) = {
  val objectMapper = new ObjectMapper()
  val myGeneric = objectMapper.readValue(jsonString, classOf[MyGeneric[T]])
  myGeneric.myValue = objectMapper.readValue(objectMapper.readTree(jsonString).get("myValue").toString, classOf[T])
  myGeneric
}
This does not work (compiler error). I've already played around with various combinations of Class, ClassTag, classOf but none of them helped. There were compiler and runtime errors as well. Do you know how to pass and how to use such a type in Scala? Thank you!
When you use Jackson to parse JSON, you can use a TypeReference to parse a generic type. Example:
val jsonString = "{\"myName\":\"myNumber\",\"myValue\":{\"name\":\"fifteen\",\"number\":\"15\"}}"
val objectMapper = new ObjectMapper()
val reference = new TypeReference[MyGeneric[MyComplexType]]() {}
val value: MyGeneric[MyComplexType] = objectMapper.readValue(jsonString, reference)
If you still want to use Jackson, I think you can add a parameter of type TypeReference, like this:
implicit val typeReference = new TypeReference[MyGeneric[MyComplexType]] {}
val value = foo(jsonString)
println(value.myValue.name)
def foo[T](jsonStr: String)(implicit typeReference: TypeReference[MyGeneric[T]]): MyGeneric[T] = {
  val objectMapper = new ObjectMapper()
  objectMapper.readValue(jsonStr, typeReference)
}
Using your approach, I think this is how you can get classes that you need using ClassTags:
def extractMyGeneric[A : ClassTag](jsonString: String)(implicit generic: ClassTag[MyGeneric[A]]): MyGeneric[A] = {
  val classOfA = implicitly[ClassTag[A]].runtimeClass.asInstanceOf[Class[A]]
  val classOfMyGenericOfA = generic.runtimeClass.asInstanceOf[Class[MyGeneric[A]]]
  val objectMapper = new ObjectMapper()
  val myGeneric = objectMapper.readValue(jsonString, classOfMyGenericOfA)
  myGeneric.myValue = objectMapper.readValue(objectMapper.readTree(jsonString).get("myValue").toString, classOfA)
  myGeneric
}
I am not familiar with Jackson, but in play-json you could easily define Reads for your generic class like this:
import play.api.libs.functional.syntax._
import play.api.libs.json._
implicit def genReads[A: Reads]: Reads[MyGeneric[A]] = (
  (__ \ "myName").read[String] and
  (__ \ "myValue").read[A]
)((name, value) => {
  val e = new MyGeneric[A]
  e.myName = name
  e.myValue = value
  e
})
With this in place, and provided that an instance of Reads for MyComplexType exists, you can implement your method as
def extractMyGeneric[A: Reads](jsonString: String): MyGeneric[A] = {
  Json.parse(jsonString).as[MyGeneric[A]]
}
The issue here is that you need to provide Reads for all of your complex types, which would be as easy as
implicit val complexReads: Reads[MyComplexType] = Json.reads[MyComplexType]
if those were case classes; otherwise I think you would have to define them manually, in a similar way to what I've done with genReads.
