Running a Java-based Spark Job on spark-jobserver

I need to run an aggregation Spark job on spark-jobserver using low-latency contexts. I have this Scala runner, which runs the job by calling a Java method from a Java class:
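import com.typesafe.config.{Config, ConfigFactory}
import org.apache.spark.SparkContext
import org.apache.spark.api.java.JavaSparkContext
import spark.jobserver.{SparkJob, SparkJobValid, SparkJobValidation}
object AggregationRunner extends SparkJob {
  // Local entry point for running the job outside the job server
  def main(args: Array[String]) {
    val ctx = new SparkContext("local[4]", "spark-jobs")
    val config = ConfigFactory.parseString("")
    val results = runJob(ctx, config)
  }
  override def validate(sc: SparkContext, config: Config): SparkJobValidation = {
    SparkJobValid
  }
  override def runJob(sc: SparkContext, config: Config): Any = {
    // Wrap the Scala context so the Java job class can use it
    val context = new JavaSparkContext(sc)
    val aggJob = new ServerAggregationJob()
    // input.string is expected to hold "<id> <field>"
    val parts = config.getString("input.string").split(" ")
    val id = parts(0)
    val field = parts(1)
    aggJob.aggregate(context, id, field)
  }
}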
However, I get the following error. I tried taking out the content returned in the Java method and am now just returning a test string, but it still doesn't work:
{
  "status": "ERROR",
  "result": {
    "message": "Ask timed out on [Actor[akka://JobServer/user/context-supervisor/single-context#1243999360]] after [10000 ms]",
    "errorClass": "akka.pattern.AskTimeoutException",
    "stack": ["akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:333)", "akka.actor.Scheduler$$anon$7.run(Scheduler.scala:117)", "scala.concurrent.Future$InternalCallbackExecutor$.scala$concurrent$Future$InternalCallbackExecutor$$unbatchedExecute(Future.scala:694)", "scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:691)", "akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(Scheduler.scala:467)", "akka.actor.LightArrayRevolverScheduler$$anon$8.executeBucket$1(Scheduler.scala:419)", "akka.actor.LightArrayRevolverScheduler$$anon$8.nextTick(Scheduler.scala:423)", "akka.actor.LightArrayRevolverScheduler$$anon$8.run(Scheduler.scala:375)", "java.lang.Thread.run(Thread.java:745)"]
  }
}
I am not sure why there is a timeout, since I am only returning a string.
EDIT
I figured out that the issue occurred because I was using a Spark context that had been created before I updated the JAR. However, now that I try to use JavaSparkContext inside the Spark job, the error shown above is back.
What would be a permanent way to get rid of the error?
Also, could running a heavy Spark job in a local Docker container be a plausible reason for the timeout?

To resolve the ask timeout, add or change the following properties in the job server configuration file:
spray.can.server {
  idle-timeout = 210 s
  request-timeout = 200 s
}
Keep idle-timeout larger than request-timeout. For more information, see the troubleshooting guide: https://github.com/spark-jobserver/spark-jobserver/blob/d1843cbca8e0d07f238cc664709e73bbeea05f2c/doc/troubleshooting.md

Related

Why are JUnit 5 parameterized tests using argument providers not defined in @ArgumentsSource?

I have a problem with my parameterized tests.
@ParameterizedTest
@ArgumentsSource(CorrectMessagesArgumentProvider.class)
void shouldSendMessageForCorrectConfiguration(SmtpConfiguration configuration) {
  var expectedMessageBody = fetchMessage();
  var alertSender = new AlertSender(configuration);
  var alertSubject = subjectFrom();
  alertSender.send(expectedMessageBody, alertSubject);
  var receivedMessages = greenMail.getReceivedMessages();
  assertEquals(1, receivedMessages.length);
}
@ParameterizedTest
@ArgumentsSource(IncorrectMessagesArgumentProvider.class)
void shouldNotSendMessageForIncorrectConfiguration(SmtpConfiguration smtpConfiguration) {
  var expectedMessageBody = fetchMessage();
  var alertSender = new AlertSender(smtpConfiguration);
  var alertSubject = subjectFrom();
  var expectedErrorMessage = "Error sending alert";
  var actualErrorMessage =
      assertThrows(
          SendAlertException.class, () -> alertSender.send(expectedMessageBody, alertSubject));
  assertTrue(actualErrorMessage.getMessage().contains(expectedErrorMessage));
}
When I run those tests separately, they work correctly. But when I run the whole suite, whichever test runs second fails, because it is using the arguments from the other test.
They're somehow sharing that resource, but I have no idea how. Any ideas?

OK, I got it. Deep in the class responsible for sending emails, I was creating the session object with getDefaultInstance(). That creates a singleton shared across the whole JVM, so the configuration from the first test was reused by the second one. When I switched to getInstance(), it worked like a charm.
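The caching behaviour described above can be sketched in plain Java without any mail library. This is only an illustration under stated assumptions: FakeSession, SessionCacheDemo, and the host names are hypothetical stand-ins, not the real mail Session class. The sketch mimics only the fact that a default instance is created once per JVM and later property sets are ignored.

```java
import java.util.Properties;

// Hypothetical stand-in for a mail session; it only records the host it was built with.
class FakeSession {
    // JVM-wide cache, playing the role of the shared default instance
    private static FakeSession defaultSession;

    final String host;

    private FakeSession(Properties props) {
        this.host = props.getProperty("mail.smtp.host");
    }

    // Caches the first session and ignores the properties on later calls.
    static synchronized FakeSession getDefaultInstance(Properties props) {
        if (defaultSession == null) {
            defaultSession = new FakeSession(props);
        }
        return defaultSession;
    }

    // Always builds a fresh session from the given properties.
    static FakeSession getInstance(Properties props) {
        return new FakeSession(props);
    }
}

class SessionCacheDemo {
    // Helper that builds a property set for one SMTP host (hypothetical names).
    static Properties smtp(String host) {
        Properties props = new Properties();
        props.setProperty("mail.smtp.host", host);
        return props;
    }

    public static void main(String[] args) {
        FakeSession.getDefaultInstance(smtp("good.example"));
        FakeSession second = FakeSession.getDefaultInstance(smtp("bad.example"));
        System.out.println(second.host); // prints good.example

        System.out.println(FakeSession.getInstance(smtp("bad.example")).host); // prints bad.example
    }
}
```

With the cached variant, the second test silently gets the first test's configuration, which matches the symptom of tests passing alone but failing in a suite.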

Unable to use custom variables in Gradle extension

I'm using Jib (not especially relevant) and I want to pass in variables from the command line in my deployment script.
I append -PinputTag=${DOCKER_TAG} -PbuildEnv=nonprod to my Gradle command, which works. But when a property is missing, I want the ?: fallback to kick in.
Instead I get this error:
Could not get unknown property 'inputTag' for project ':webserver' of type org.gradle.api.Project.
def inputTag = inputTag ?: 'latest'
def buildEnv = buildEnv ?: 'nonprod'
jib {
  container {
    mainClass = 'com.example.hi'
  }
  to {
    image = 'image/cool-image'
    tags = ['latest', inputTag]
  }
  container {
    creationTime = 'USE_CURRENT_TIMESTAMP'
    ports = ['8080']
    jvmFlags = ['-Dspring.profiles.active=' + buildEnv]
  }
}
Found Solution
def inputTag = project.hasProperty('inputTag') ? project.property('inputTag') : 'latest'
def buildEnv = project.hasProperty('buildEnv') ? project.property('buildEnv') : 'nonprod'
This seems to work. Is it the best way?

How about this?
image = 'image/cool-image:' + (project.findProperty('inputTag') ?: 'latest')
Note that jib.to.tags lists additional tags. jib.to.image = 'image/cool-image' already implies image/cool-image:latest, so there is no need to duplicate latest in jib.to.tags.

NotSerializableException using Publish Over SSH in Jenkinsfile

I'm trying to use the Publish over SSH plugin inside a Jenkinsfile. However, I'm getting the exception java.io.NotSerializableException in the createClient method. This is my code:
def publish_ssh = Jenkins.getInstance().getDescriptor("jenkins.plugins.publish_over_ssh.BapSshPublisherPlugin")
def hostConfiguration = publish_ssh.getConfiguration("${env.DSV_DEPLOY_SERVER}");
if( hostConfiguration == null )
{
  currentBuild.rawBuild.result = Result.ABORTED
  throw new hudson.AbortException("Configuration for ${env.DSV_DEPLOY_SERVER} not found.")
}
def buildInfo = hostConfiguration.createDummyBuildInfo();
def sshClient = hostConfiguration.createClient( buildInfo, new BapSshTransfer(
  env.SOURCE_FILE,
  null,
  env.DSV_DEPLOY_REMOTE_DIR,
  env.REMOVE_PREFIX,
  false,
  false,
  env.DSV_DEPLOY_COMMAND,
  env.DSV_DEPLOY_TIMEOUT as Integer,
  false,
  false,
  false,
  null
));
How can I get rid of the exception?

This happens because some of the variables are not serializable.
From the documentation:
Since pipelines must survive Jenkins restarts, the state of the running program is periodically saved to disk so it can be resumed later (saves occur after every step or in the middle of steps such as sh).
You can move the client creation into a method annotated with @NonCPS, so that the non-serializable objects stay local to that method:
@NonCPS
def createSSHClient() {
  // your code here.
}
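Why saved state fails when it holds a non-serializable object can be sketched with plain Java serialization. This is an analogy only: Jenkins has its own persistence mechanism for pipeline state, and every class name below (SshClientStub, StateHoldingClient, StateHoldingResult, SerializationDemo) is hypothetical and not part of the Publish Over SSH plugin.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a plugin object that does not implement Serializable.
class SshClientStub {
    String describe() {
        return "connected";
    }
}

// Program state that keeps the client in a field: saving it fails.
class StateHoldingClient implements Serializable {
    private static final long serialVersionUID = 1L;
    final SshClientStub client = new SshClientStub();
}

// Program state that keeps only a plain String result: saving it works.
class StateHoldingResult implements Serializable {
    private static final long serialVersionUID = 1L;
    final String result;

    StateHoldingResult() {
        // The client exists only inside this constructor, similar to a local
        // variable inside a method that is never part of the saved state.
        SshClientStub client = new SshClientStub();
        this.result = client.describe();
    }
}

class SerializationDemo {
    // Returns true if Java serialization of the object succeeds.
    static boolean canSave(Object state) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(state);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSave(new StateHoldingClient())); // prints false
        System.out.println(canSave(new StateHoldingResult())); // prints true
    }
}
```

The design point is the same as in the pipeline: keep the non-serializable object confined to one method and let only serializable values (here a String) escape into state that may be saved.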

Ingest(Update hash Code) in echoprint servers using JAVA

I am developing an Android application in Java. I want to record a song, generate its hash (code), and then query the echoprint server for a match. If no match is found, I want to upload it to the server (ingest) for future reference.
I have been able to achieve the first part. Can someone suggest how to do the second part in Java? (P.S.: I've seen how to do it with Python code, but that won't help in my case.)
Another question: can I achieve the second objective with the global echoprint server, or do I need to set up one of my own?
The references I've used are:
http://masl.cis.gvsu.edu/2012/01/25/android-echoprint/
https://github.com/gvsumasl/EchoprintForAndroid

To insert a song into the echoprint server database, all you need to do is call the ingest method. Basically, it is only an HTTP POST request carrying the fingerprint code and the track metadata. Here is the Scala code (Java would be very similar) that I use for that:
import EasyJSON.JSON
import EasyJSON.ScalaJSON
import dispatch.Defaults.executor
import dispatch._
class EchoprintAPI {
  val API_URL = "http://your.api.server"
  def queryURL(code: String) = url(s"$API_URL/query?fp_code=$code")
  def query(code: String): scala.concurrent.Future[ScalaJSON] = {
    jsonResponse(queryURL(code))
  }
  def ingest(json: ScalaJSON, trackId: String): scala.concurrent.Future[ScalaJSON] = {
    val metadata = json("metadata")
    val request = url(s"$API_URL/ingest").POST
      .addParameter("fp_code", json("code").toString)
      .addParameter("artist", metadata("artist").toString)
      .addParameter("release", metadata("release").toString)
      .addParameter("track", metadata("title").toString)
      .addParameter("codever", metadata("version").toString)
      .addParameter("length", metadata("duration").toString)
      .addParameter("genre", metadata("genre").toString)
      .addParameter("bitrate", metadata("bitrate").toString)
      .addParameter("source", metadata("filename").toString)
      .addParameter("track_id", trackId)
      .addParameter("sample_rate", metadata("sample_rate").toString)
    jsonResponse(request)
  }
  def delete(trackId: String): scala.concurrent.Future[ScalaJSON] = {
    jsonResponse(url(s"$API_URL/query?track_id=$trackId").DELETE)
  }
  protected def jsonResponse(request: dispatch.Req): scala.concurrent.Future[EasyJSON.ScalaJSON] = {
    val response = Http(request OK as.String)
    for (c <- response) yield JSON.parseJSON(c)
  }
}
To generate the fingerprint code, you can call the echoprint-codegen command-line tool or use the Java JNI integration with the C library.
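Since the question asks for Java, here is a minimal sketch of the same ingest call using only HttpURLConnection. It assumes the server accepts form-encoded parameters with the names used in the Scala code above (fp_code, track_id, and so on). The class name EchoprintIngest and its method names are hypothetical, and the base URL is whatever your own server uses.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper; parameter names are taken from the Scala example above.
class EchoprintIngest {
    // URL-encodes one key or value as UTF-8.
    static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // Builds an application/x-www-form-urlencoded body from the parameters.
    static String formBody(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(enc(e.getKey()) + "=" + enc(e.getValue()));
        }
        return body.toString();
    }

    // POSTs the parameters to <apiUrl>/ingest and returns the HTTP status code.
    static int ingest(String apiUrl, Map<String, String> params) throws IOException {
        byte[] payload = formBody(params).getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = (HttpURLConnection) new URL(apiUrl + "/ingest").openConnection();
        try {
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
            try (OutputStream out = conn.getOutputStream()) {
                out.write(payload);
            }
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

A call would look like EchoprintIngest.ingest("http://your.api.server", params), where params holds fp_code, track_id, and the metadata fields. On Android, run it off the main thread, since network access on the main thread is not allowed.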

Test WebSocket in PlayFramework

I have a WebSocket in my Play application and I want to write a test for it, but I couldn't find any example of how to write such a test. I found a discussion in the play-framework Google group, but there has been no activity recently.
So, are there any ideas on how to test WebSockets in a Java test?

You can retrieve the underlying Iteratee and Enumerator and test them directly. This way you don't need a browser. You do need akka-testkit, though, to cope with the asynchronous nature of iteratees.
A Scala example:
object WebSocketController extends Controller {
  def websocket = WebSocket.async[JsValue] { request =>
    Future.successful(Iteratee.ignore[JsValue] -> Enumerator.apply[JsValue](Json.obj("type" -> "error")))
  }
}
class WebSocketSpec extends PlaySpecification {
  "WebSocket" should {
    "respond with error packet" in new WithApplication {
      val request = FakeRequest()
      var message: JsValue = null
      // Collect whatever the server pushes to the client
      val iteratee = Iteratee.foreach[JsValue](chunk => message = chunk)(Akka.system.dispatcher)
      WebSocketController.websocket.f(request)(Enumerator.empty[JsValue], iteratee)
      TestKit.awaitCond(message == Json.obj("type" -> "error"), 1 second)
    }
  }
}

I test WebSocket code using Firefox:
https://github.com/schleichardt/stackoverflow-answers/commit/13d5876791ef409e092e4a097f54247d851e17dc#L8R14
For Java it works similarly; replace 'HTMLUNIT' with 'FIREFOX': http://www.playframework.com/documentation/2.1.x/JavaFunctionalTest

Chrome provides a plugin to test a WebSocket service.
Edit
Using the plugin, you can provide the WebSocket URL and the request data and send a message to the service. The message log shows the messages sent from the client as well as the service's responses.

Assume that you have a WebSocket library that returns the Future[(Iteratee[JsValue, Unit], Enumerator[JsValue])] your controller uses:
trait WSLib {
  def connect: Future[(Iteratee[JsValue, Unit], Enumerator[JsValue])]
}
And you want to test this library.
Here is a context you can use:
trait WebSocketContext extends WithApplication {
  val aSecond = FiniteDuration(1, TimeUnit.SECONDS)
  case class Incoming(iteratee: Iteratee[JsValue, Unit]) {
    def feed(message: JsValue) = {
      iteratee.feed(Input.El(message))
    }
    def end(wait: Long = 100) = {
      Thread.sleep(wait) //wait until all previous fed messages are handled
      iteratee.feed(Input.EOF)
    }
  }
  case class OutGoing(enum: Enumerator[JsValue]) {
    val messages = enum(Iteratee.fold(List[JsValue]()) {
      (l, jsValue) => jsValue :: l
    }).flatMap(_.run)
    def get: List[JsValue] = {
      Await.result(messages, aSecond)
    }
  }
  def wrapConnection(connection: => Future[(Iteratee[JsValue, Unit], Enumerator[JsValue])]): (Incoming, OutGoing) = {
    val (iteratee, enumerator) = Await.result(connection, aSecond)
    (Incoming(iteratee), OutGoing(enumerator))
  }
}
Then your tests can be written as:
"return all subscribers when asked for info" in new WebSocketContext {
  val (incoming, outgoing) = wrapConnection(myWSLib.connect)
  incoming.feed(Json.obj("message" -> "hello"))
  incoming.end() //this closes the connection
  val responseMessages = outgoing.get //you only call this "get" after the connection is closed
  responseMessages.size must equalTo(1)
  responseMessages must contain(Json.obj("reply" -> "Hey"))
}
Incoming represents the messages coming from the client side, while OutGoing represents the messages sent from the server. To write a test, first feed the incoming messages through incoming, then close the connection by calling incoming.end, and finally get the complete list of outgoing messages from outgoing.get.
