Error while streaming data from Twitter using Spark Streaming - Java

I am writing a Twitter connector to get data from Twitter, but I got the following exception while running it.
I created this application to print tweets and to learn how to do Spark Streaming with Twitter.
20/09/25 05:53:18 ERROR ReceiverTracker: Deregistered receiver for stream 0: Restarting receiver with delay 2000ms: Error starting Twitter stream - java.lang.IllegalStateException: Authentication credentials are missing. See http://twitter4j.org/en/configuration.html for details. See and register at http://apps.twitter.com/
at twitter4j.TwitterBaseImpl.ensureAuthorizationEnabled(TwitterBaseImpl.java:219)
at twitter4j.TwitterStreamImpl.sample(TwitterStreamImpl.java:161)
at org.apache.spark.streaming.twitter.TwitterReceiver.onStart(TwitterInputDStream.scala:93)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.startReceiver(ReceiverSupervisor.scala:149)
at org.apache.spark.streaming.receiver.ReceiverSupervisor.$anonfun$restartReceiver$1(ReceiverSupervisor.scala:198)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Below is the code for this application:
package SparkStreaming

import org.apache.log4j.{Level, Logger}
import org.apache.spark.streaming.twitter.TwitterUtils
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.io.Source

object Tweets {
  Logger.getLogger("org").setLevel(Level.ERROR)

  def main(args: Array[String]): Unit = {
    setTweeter()
    val ssc = new StreamingContext("local[*]", "Tweets", Seconds(3))
    val tweets = TwitterUtils.createStream(ssc, None)
    val statuses = tweets.map(status => status.getText)
    statuses.print()
    ssc.start()
    ssc.awaitTermination()
  }

  def setTweeter(): Unit = {
    for (line <- Source.fromFile("src/data/tweeter.txt").getLines()) {
      val fields = line.split(" ")
      if (fields.length == 2) {
        System.setProperty("tweeter4j.oauth." + fields(0), fields(1))
      }
    }
  }
}
Can anyone assist me in resolving this problem?
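The stack trace says Twitter4J cannot find any credentials, and the likely culprit is the property prefix in setTweeter(): Twitter4J reads OAuth credentials from system properties starting with "twitter4j.oauth." (for example twitter4j.oauth.consumerKey), but the code sets "tweeter4j.oauth.". A minimal sketch of the corrected method, assuming src/data/tweeter.txt holds space-separated key/value pairs such as "consumerKey XXXX", one per line:

def setTweeter(): Unit = {
  // Twitter4J only picks up properties under the "twitter4j.oauth." prefix:
  // twitter4j.oauth.consumerKey, twitter4j.oauth.consumerSecret,
  // twitter4j.oauth.accessToken, twitter4j.oauth.accessTokenSecret
  for (line <- Source.fromFile("src/data/tweeter.txt").getLines()) {
    val fields = line.split(" ")
    if (fields.length == 2) {
      System.setProperty("twitter4j.oauth." + fields(0), fields(1))
    }
  }
}

Call it before creating the StreamingContext, exactly as main already does.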

Related

Access Denied for ElasticBeanstalk DescribeConfigurationSettings API method

I tried to call the DescribeConfigurationSettings API method for Elastic Beanstalk as follows:
AWSElasticBeanstalk ebs = AWSElasticBeanstalkClientBuilder.standard()
        .withRegion(Regions.EU_CENTRAL_1)
        .withCredentials(new AWSStaticCredentialsProvider(credentials))
        .build();
for (ApplicationDescription ad : ebs.describeApplications().getApplications()) {
    System.out.println(ad);
    for (EnvironmentDescription ed : ebs.describeEnvironments(new DescribeEnvironmentsRequest()
            .withApplicationName(ad.getApplicationName())).getEnvironments()) {
        System.out.println(ebs.describeConfigurationSettings(new DescribeConfigurationSettingsRequest()
                .withApplicationName(ad.getApplicationName())
                .withEnvironmentName(ed.getEnvironmentName()))
                .getConfigurationSettings());
    }
}
However, I got an Access Denied exception with the following message:
Exception in thread "main"
com.amazonaws.services.elasticbeanstalk.model.AWSElasticBeanstalkException:
Access Denied: S3Bucket=elasticbeanstalk-env-resources-eu-central-1,
S3Key=eb_patching_resources/instance_patch_extension.linux (Service:
Amazon S3; Status Code: 403; Error Code: AccessDenied; Request ID:
NB44V0RXQG2WHH4T; Proxy: null) (Service: AWSElasticBeanstalk; Status
Code: 400; Error Code: InvalidParameterValue; Request ID:
b058aa54-fc9c-4879-9502-5cb5818bc64a; Proxy: null)
How can I resolve this issue?
Based on the error you get, it seems that you are missing some IAM permissions. I would start by adding the AWSElasticBeanstalkManagedUpdatesCustomerRolePolicy managed policy to your user.
This policy is probably more permissive than what you actually need, but it would be difficult to pinpoint exactly which permissions are necessary.
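If you would rather attach the policy programmatically than through the IAM console, here is a rough sketch against the SDK's IAM client (in Scala; the user name "my-user" is a placeholder for whichever IAM user your credentials belong to):

import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.iam.IamClient
import software.amazon.awssdk.services.iam.model.AttachUserPolicyRequest

object AttachBeanstalkPolicy extends App {
  // IAM is a global service, so SDK v2 expects the AWS_GLOBAL pseudo-region.
  val iam = IamClient.builder().region(Region.AWS_GLOBAL).build()
  iam.attachUserPolicy(AttachUserPolicyRequest.builder()
    .userName("my-user") // placeholder: the IAM user your credentials belong to
    .policyArn("arn:aws:iam::aws:policy/AWSElasticBeanstalkManagedUpdatesCustomerRolePolicy")
    .build())
  iam.close()
}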
Amazon recommends using the AWS SDK for Java V2.
Updated Code
Here is the Java V2 code for this use case.
package com.aws.example;

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.elasticbeanstalk.ElasticBeanstalkClient;
import software.amazon.awssdk.services.elasticbeanstalk.model.*;
import java.util.List;

public class DescribeApplications {

    public static void main(String[] args) {
        Region region = Region.US_EAST_1;
        ElasticBeanstalkClient beanstalkClient = ElasticBeanstalkClient.builder()
                .region(region)
                .build();

        DescribeApplicationsResponse applicationsResponse = beanstalkClient.describeApplications();
        List<ApplicationDescription> apps = applicationsResponse.applications();
        for (ApplicationDescription app : apps) {
            System.out.println("The application name is " + app.applicationName());

            DescribeEnvironmentsRequest desRequest = DescribeEnvironmentsRequest.builder()
                    .applicationName(app.applicationName())
                    .build();
            DescribeEnvironmentsResponse res = beanstalkClient.describeEnvironments(desRequest);
            List<EnvironmentDescription> envDesc = res.environments();
            for (EnvironmentDescription desc : envDesc) {
                System.out.println("The Environment ARN is " + desc.environmentArn());
            }
        }
    }
}

Why blocking on thenApplyAsync works but not with thenApply

We saw some interesting behavior in our application, which the following Spock spec captures. I am trying to understand why the second test passes but the first one throws a TimeoutException.
Summary:
There is a mock server with a mock endpoint that responds with a success after a 10ms delay.
We use AsyncHttpClient to make a non-blocking call to this mock endpoint. The first call is chained with a second, blocking call to the same endpoint. The first call succeeds, but the second fails with a timeout if thenApply is used, yet succeeds if thenApplyAsync is used. In both cases, the mock server seems to respond within 10ms.
Dependencies:
implementation 'com.google.guava:guava:29.0-jre'
implementation 'org.asynchttpclient:async-http-client:2.12.1'
// Use the latest Groovy version for Spock testing
testImplementation 'org.codehaus.groovy:groovy-all:2.5.11'
// Use the awesome Spock testing and specification framework even with Java
testImplementation 'org.spockframework:spock-core:1.3-groovy-2.5'
testImplementation 'org.objenesis:objenesis:1.4'
testImplementation "cglib:cglib:2.2"
testImplementation 'junit:junit:4.13'
testImplementation 'org.mock-server:mockserver-netty:5.11.1'
Spock Spec:
package com.switchcase.asyncthroughput
import com.google.common.base.Charsets
import org.asynchttpclient.DefaultAsyncHttpClient
import org.asynchttpclient.RequestBuilder
import org.mockserver.integration.ClientAndServer
import org.mockserver.model.HttpResponse
import spock.lang.Shared
import spock.lang.Specification
import java.util.concurrent.CompletableFuture
import java.util.concurrent.CompletionException
import java.util.concurrent.ExecutorService
import java.util.concurrent.Executors
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException
import static org.mockserver.integration.ClientAndServer.startClientAndServer
import static org.mockserver.model.HttpRequest.request
class CompletableFutureThreadsTest extends Specification {

    @Shared
    ClientAndServer mockServer

    def asyncHttpClient = new DefaultAsyncHttpClient();

    def setupSpec() {
        mockServer = startClientAndServer(9192);
        // create a mock server that responds with "done" after a 10 ms delay
        mockServer.when(request()
                .withMethod("POST")
                .withPath("/validate"))
                .respond(HttpResponse.response().withBody("done")
                        .withStatusCode(200)
                        .withDelay(TimeUnit.MILLISECONDS, 10));
    }

    def "Calls external using AHC with a blocking call with 1sec timeout results in TimeoutException."() {
        when:
        callExternal().thenApply({ resp -> callExternalBlocking() }).join()

        then:
        def exception = thrown(CompletionException)
        exception instanceof CompletionException
        exception.getCause() instanceof TimeoutException
        exception.printStackTrace()
    }

    def "Calls external using AHC with a blocking call on ForkJoinPool with 1sec timeout results in success."() {
        when:
        def value = callExternal().thenApplyAsync({ resp -> callExternalBlocking() }).join()

        then:
        value == "done"
    }

    def cleanupSpec() {
        mockServer.stop(true)
    }

    private CompletableFuture<String> callExternal(def timeout = 1000) {
        RequestBuilder requestBuilder = RequestBuilder.newInstance();
        requestBuilder.setMethod("POST").setUrl("http://localhost:9192/validate").setRequestTimeout(timeout)
        def cf = asyncHttpClient.executeRequest(requestBuilder).toCompletableFuture()
        return cf.thenApply({ response ->
            println("CallExternal Succeeded.")
            return response.getResponseBody(Charsets.UTF_8)
        })
    }

    private String callExternalBlocking(def timeout = 1000) {
        RequestBuilder requestBuilder = RequestBuilder.newInstance();
        requestBuilder.setMethod("POST").setUrl("http://localhost:9192/validate").setRequestTimeout(timeout)
        def cf = asyncHttpClient.executeRequest(requestBuilder).toCompletableFuture()
        return cf.thenApply({ response ->
            println("CallExternalBlocking Succeeded.")
            return response.getResponseBody(Charsets.UTF_8)
        }).join()
    }
}
EDIT:
Debug log and stack trace for timeout: (The timeout happens on the remote call in callExternalBlocking)
17:37:38.885 [AsyncHttpClient-timer-2-1] DEBUG org.asynchttpclient.netty.timeout.TimeoutTimerTask - Request timeout to localhost/127.0.0.1:9192 after 1000 ms for NettyResponseFuture{currentRetry=0,
isDone=0,
isCancelled=0,
asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@478251c9,
nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4945b749,
future=java.util.concurrent.CompletableFuture@4d7a3ab9[Not completed, 1 dependents],
uri=http://localhost:9192/validate,
keepAlive=true,
redirectCount=0,
timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@878bd72,
inAuth=0,
touch=1622248657866} after 1019 ms
17:37:38.886 [AsyncHttpClient-timer-2-1] DEBUG org.asynchttpclient.netty.channel.ChannelManager - Closing Channel [id: 0x5485056c, L:/127.0.0.1:58076 - R:localhost/127.0.0.1:9192]
17:37:38.886 [AsyncHttpClient-timer-2-1] DEBUG org.asynchttpclient.netty.request.NettyRequestSender - Aborting Future NettyResponseFuture{currentRetry=0,
isDone=0,
isCancelled=0,
asyncHandler=org.asynchttpclient.AsyncCompletionHandlerBase@478251c9,
nettyRequest=org.asynchttpclient.netty.request.NettyRequest@4945b749,
future=java.util.concurrent.CompletableFuture@4d7a3ab9[Not completed, 1 dependents],
uri=http://localhost:9192/validate,
keepAlive=true,
redirectCount=0,
timeoutsHolder=org.asynchttpclient.netty.timeout.TimeoutsHolder@878bd72,
inAuth=0,
touch=1622248657866}
java.util.concurrent.CompletionException: java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:9192 after 1000 ms
at java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:292)
at java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:308)
at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:607)
at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:591)
at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488)
at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990)
at org.asynchttpclient.netty.NettyResponseFuture.abort(NettyResponseFuture.java:273)
at org.asynchttpclient.netty.request.NettyRequestSender.abort(NettyRequestSender.java:473)
at org.asynchttpclient.netty.timeout.TimeoutTimerTask.expire(TimeoutTimerTask.java:43)
at org.asynchttpclient.netty.timeout.RequestTimeoutTimerTask.run(RequestTimeoutTimerTask.java:50)
at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:672)
at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:747)
at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:472)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.TimeoutException: Request timeout to localhost/127.0.0.1:9192 after 1000 ms
... 7 more
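The difference most likely comes down to which thread runs the dependent function. thenApply runs it on whichever thread completes the future, here one of AsyncHttpClient's Netty threads, so the inner join() blocks the very thread that would have to process the second response on the kept-alive connection, and the request times out. thenApplyAsync without an executor hands the function to ForkJoinPool.commonPool() instead, leaving the I/O thread free to complete the second call. A minimal sketch of that thread-handoff difference (Scala 2.12+, driving java.util.concurrent.CompletableFuture directly; the thread name "fake-io-thread" is purely illustrative):

import java.util.concurrent.CompletableFuture

object ThenApplyThreads extends App {
  val cf = new CompletableFuture[String]()

  val chained = cf
    .thenApply[String] { s =>
      // Runs on whichever thread completes cf (below: "fake-io-thread").
      println(s"thenApply ran on ${Thread.currentThread().getName}")
      s
    }
    .thenApplyAsync[String] { s =>
      // Re-dispatched to ForkJoinPool.commonPool(), not the completing thread.
      println(s"thenApplyAsync ran on ${Thread.currentThread().getName}")
      s
    }

  // Stand-in for AHC's Netty thread completing the response future.
  new Thread(() => { cf.complete("done"); () }, "fake-io-thread").start()
  println(chained.join())
}

If the blocking call has to stay, thenApplyAsync, ideally with a dedicated executor, keeps the blocking work off the I/O threads.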

How to use an Event Hub Trigger in a Java Azure Function

I am attempting to create a Java Azure Function that triggers off of an Azure Event Hub. I am following these code snippets: https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-event-hubs-trigger?tabs=java#example
Here is my code:
package com.function;

import com.microsoft.azure.functions.*;
import com.microsoft.azure.functions.annotation.*;
import java.util.Optional;

public class Function {
    @FunctionName("MTI")
    public void EventHubProcess(
        @EventHubTrigger(name = "msg", eventHubName = "mticallhub", connection = "EHubConnectionString"), String message, final ExecutionContext context)
    {
        context.getLogger().info("Java HTTP trigger processed a request: " + message);
    }
}
Here is the error I get when building:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project func-MTI-test-zyg-001: Compilation failure
[ERROR] /C:/JavaStuff/FunctionApps/func-MTI-test-zyg-001/src/main/java/com/function/Function.java:[34,105] illegal start of type
I've been looking for hours and exhausted page after page of Google searches. What am I doing wrong?
Please change

@EventHubTrigger(name = "msg", eventHubName = "mticallhub", connection = "EHubConnectionString"), String message, final ExecutionContext context)

to

@EventHubTrigger(name = "msg", eventHubName = "mticallhub", connection = "EHubConnectionString") String message, final ExecutionContext context)

The stray "), " after the annotation closes the method's parameter list prematurely, which is what produces the "illegal start of type" compile error: a parameter annotation attaches to the parameter that follows it, so nothing should separate @EventHubTrigger(...) from String message.

How to resolve current committed offsets differing from current available offsets?

I am attempting to read Avro data from Kafka using Spark Structured Streaming, but I receive the following error message:
Streaming Query Exception caught!: org.apache.spark.sql.streaming.StreamingQueryException: Job aborted.
=== Streaming Query ===
Identifier: [id = 8b54c92d-6bbc-4dbc-84d0-55b762c21ba2, runId = 4bc92b3c-343e-4886-b0bc-0777b89f9ec8]
Current Committed Offsets: {KafkaV2[Subscribe[customer-avro4]]: {"customer-avro":{"0":17}}}
Current Available Offsets: {KafkaV2[Subscribe[customer-avro4]]: {"customer-avro":{"0":20}}}
Current State: ACTIVE
Thread State: RUNNABLE
Any idea what the issue might be and how to resolve it? The code is below (inspired by the xebia-france spark-structured-streaming blog). I think it actually ran successfully earlier, but now there is a problem.
import com.databricks.spark.avro.SchemaConverters
import io.confluent.kafka.schemaregistry.client.{CachedSchemaRegistryClient, SchemaRegistryClient}
import io.confluent.kafka.serializers.AbstractKafkaAvroDeserializer
import org.apache.avro.Schema
import org.apache.avro.generic.GenericRecord
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQueryException

object AvroConsumer {
  private val topic = "customer-avro4"
  private val kafkaUrl = "http://localhost:9092"
  private val schemaRegistryUrl = "http://localhost:8081"

  private val schemaRegistryClient = new CachedSchemaRegistryClient(schemaRegistryUrl, 128)
  private val kafkaAvroDeserializer = new AvroDeserializer(schemaRegistryClient)

  private val avroSchema = schemaRegistryClient.getLatestSchemaMetadata(topic + "-value").getSchema
  private val sparkSchema = SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .appName("ConfluentConsumer")
      .master("local[*]")
      .getOrCreate()

    spark.sparkContext.setLogLevel("ERROR")

    spark.udf.register("deserialize", (bytes: Array[Byte]) =>
      DeserializerWrapper.deserializer.deserialize(bytes)
    )

    val kafkaDataFrame = spark
      .readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", kafkaUrl)
      .option("subscribe", topic)
      .load()

    val valueDataFrame = kafkaDataFrame.selectExpr("""deserialize(value) AS message""")

    import org.apache.spark.sql.functions._

    val formattedDataFrame = valueDataFrame.select(
      from_json(col("message"), sparkSchema.dataType).alias("parsed_value"))
      .select("parsed_value.*")

    val writer = formattedDataFrame
      .writeStream
      .format("parquet")
      .option("checkpointLocation", "hdfs://localhost:9000/data/spark/parquet/checkpoint")

    while (true) {
      val query = writer.start("hdfs://localhost:9000/data/spark/parquet/total")
      try {
        query.awaitTermination()
      }
      catch {
        case e: StreamingQueryException => println("Streaming Query Exception caught!: " + e)
      }
    }
  }

  object DeserializerWrapper {
    val deserializer: AvroDeserializer = kafkaAvroDeserializer
  }

  class AvroDeserializer extends AbstractKafkaAvroDeserializer {
    def this(client: SchemaRegistryClient) {
      this()
      this.schemaRegistry = client
    }

    override def deserialize(bytes: Array[Byte]): String = {
      val genericRecord = super.deserialize(bytes).asInstanceOf[GenericRecord]
      genericRecord.toString
    }
  }
}
Figured it out: the problem was not, as I had thought, with the Spark-Kafka integration directly, but with the checkpoint information inside the HDFS filesystem. Deleting and recreating the checkpoint folder in HDFS solved it for me.
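For anyone hitting the same thing, a small sketch of that cleanup with the Hadoop FileSystem API (the path matches the code above; note that dropping the checkpoint discards the committed offsets, so the restarted query re-reads the topic according to its startingOffsets setting):

import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object ResetCheckpoint extends App {
  val checkpoint = "hdfs://localhost:9000/data/spark/parquet/checkpoint"
  val fs = FileSystem.get(new URI(checkpoint), new Configuration())
  val dir = new Path(checkpoint)
  if (fs.exists(dir)) fs.delete(dir, true) // recursive delete of the stale checkpoint
  fs.mkdirs(dir)                           // recreate it empty for the next run
  fs.close()
}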

No such file or class on classpath (scala)

//package com.examples

/**
 * Created by kalit_000 on 27/09/2015.
 */
import org.apache.spark.SparkConf
import org.apache.log4j.Logger
import org.apache.log4j.Level
import org.apache.spark._
import java.sql.{ResultSet, DriverManager, Connection}
import kafka.producer.KeyedMessage
import kafka.producer.Producer
import kafka.producer.ProducerConfig
import java.util.Properties
import org.apache.spark.streaming.{Seconds, StreamingContext}

object SqlServerKafkaProducer {

  def main(args: Array[String]): Unit = {
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

    val conf = new SparkConf().setMaster("local[2]").setAppName("MSSQL_KAFKA_PRODUCER")
    val sc = new SparkContext(conf)

    val driver = "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    val url = "jdbc:sqlserver://localhost;user=admin;password=oracle;database=AdventureWorks2014"
    val username = "admin"
    val password = "oracle"
    var connection: Connection = null
    Class.forName(driver)

    /* Create connection and statement to run against SQL Server and execute */
    connection = DriverManager.getConnection(url, username, password)
    val statement = connection.createStatement()
    val resultSet = statement.executeQuery("select top 10 CustomerID,StoreID,TerritoryID,AccountNumber from AdventureWorks2014.dbo.Customer")
    resultSet.setFetchSize(10)

    val columnnumber = resultSet.getMetaData().getColumnCount.toInt

    /* Output column names */
    for (i <- 1 to columnnumber) {
      val columnname = resultSet.getMetaData().getColumnName(i)
      println("Column Names are:- %s".format(columnname))
    }

    /* Output data */
    while (resultSet.next()) {
      var list = new java.util.ArrayList[String]()
      for (i <- 1 to columnnumber) {
        list.add(resultSet.getObject(i).toString())
      }
      println(list)

      /* Build Kafka properties */
      val props: Properties = new Properties()
      props.put("metadata.broker.list", "localhost:9092")
      props.put("serializer.class", "kafka.serializer.StringEncoder")

      /* Send message to topic "trade" using producer.send */
      val config = new ProducerConfig(props)
      val producer = new Producer[String, String](config)
      producer.send(new KeyedMessage[String, String]("trade", list.toString().replace("[", "").replace("]", "").replace(",", "~")))
    }

    /* Close SQL Server database connection */
    connection.close()
  }
}
I built the jar using Maven in IntelliJ IDEA. This is a Scala Spark project, and the jar is created under the folder C:\Users\kalit_000\IdeaProjects\SparkCookBook\target\SparkCookBook-0.0.1-SNAPSHOT-jar-with-dependencies.jar. When I try to run the jar using the command
scala -classpath "C:\Users\kalit_000\IdeaProjects\SparkCookBook\target\SparkCookBook-1.0-SNAPSHOT-jar-with-dependencies.jar" SqlServerKafkaProducer
I am getting an error which says:
Error:
No such file or class on classpath: SqlServerKafkaProducer.class
I can see my class inside the jar file; I used Java decompiler software to open up the jar. I am able to compile successfully in IntelliJ IDEA.
Can anyone help?
