The task state of my connector goes to FAILED with the following error:
org.apache.kafka.connect.errors.ConnectException: java.lang.NoClassDefFoundError
I am running a Kafka Connect cluster in distributed mode, using the Kafka (0.10.0.2.5) Connect that ships with the Ambari deployment.
I pointed it at the Debezium MySQL connector using export CLASSPATH=/path to connector/.
I loaded the connector configuration into Kafka Connect using the following command:
curl -i -X POST -H "Accept:application/json" \
-H "Content-Type:application/json" http://localhost:8083/connectors/ \
-d '{
  "name": "MYSQL_CONNECTOR",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "10.224.21.36",
    "database.port": "3306",
    "database.user": "root",
    "database.password": "shobhna",
    "database.server.id": "1",
    "database.server.name": "demo",
    "database.history.kafka.bootstrap.servers": "slnxhadoop04.noid.in:6669",
    "database.history.kafka.topic": "dbhistory.demo",
    "include.schema.changes": "true"
  }
}'
After checking the connector status, I now get this error:
- {"name":"MYSQL_CONNECTOR","connector":{"state":"RUNNING","worker_id":"172.26.177.115:8083"},
"tasks":[{"state":"FAILED","trace":"org.apache.kafka.connect.errors.ConnectException:
java.lang.NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient\n\tat
io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:218)\n\tat
io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:45)\n\tat
org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:137)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)\n\tat
java.util.concurrent.Executors$RunnableAdapter.cal(Executors.java:511)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat
java.lang.Thread.run(Thread.java:745)\nCaused by:
java.lang.NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient\n\tat
io.debezium.relational.history.KafkaDatabaseHistory.initializeStorage(KafkaDatabaseHistory.java:336)\n\tat
io.debezium.connector.mysql.MySqlSchema.intializeHistoryStorage(MySqlSchema.java:260)\n\tat
io.debezium.connector.mysql.MySqlTaskContext.initializeHistoryStorage(MySqlTaskContext.java:194)\n\tat
io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:126)\n\t...
9 more\nCaused by: java.lang.ClassNotFoundException:
org.apache.kafka.clients.admin.AdminClient \n\tat
java.net.URLClassLoader.findClass(URLClassLoader.java:381)\n\tat
java.lang.ClassLoader.loadClass(ClassLoader.java:424)\n\tat
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)\n\tat
java.lang.ClassLoader.loadClass(ClassLoader.java:357)\n\t
It can't find a built-in Kafka class, not your connector:
NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient
...
i am using kafka(0.10.0.2.5)
Make sure you're running 1) a Connect server version that matches your Kafka brokers, and 2) a connector built for that version of Connect.
For example, AdminClient only exists in Kafka 0.11+.
In recent HDP releases you get Kafka 1.1 (not 0.11), and that is the version the latest Debezium is built and tested against: https://debezium.io/docs/releases/
Debezium needs the AdminClient to create and register topic information, so I'm not sure it will work on an old version such as 0.10.
It's stated in the Kafka wiki that newer versions of the Connect server can communicate with older brokers, but whether the connector classes themselves use a compatible protocol is another question.
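A quick way to confirm which client libraries your Connect worker actually loads (a sketch; the path below assumes a standard Ambari/HDP layout and may differ on your hosts):

ls /usr/hdp/current/kafka-broker/libs/kafka-clients-*.jar
# org.apache.kafka.clients.admin.AdminClient only exists in kafka-clients 0.11.0 and later;
# if the jar listed here is 0.10.x, current Debezium releases will fail exactly as shown above.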
Hello, I have a Spring Boot app and I am exposing some metrics to Prometheus. My next goal is to get these metrics into Grafana in order to build some nice dashboards. I am using Docker on WSL (Ubuntu) and ran the following commands for Prometheus and Grafana:
docker run -d --name=prometheus -p 9090:9090 -v /mnt/d/Projects/Msc-Thesis-Project/prometheus.yml:/etc/prometheus/prometheus.yml prom/prometheus --config.file=/etc/prometheus/prometheus.yml
docker run -d --name=grafana -p 3000:3000 grafana/grafana
The Prometheus dashboard in my browser shows that everything is up and running. My problem is in the Grafana configuration, where I have to configure Prometheus as a data source.
In the URL field I enter http://localhost:9090, but I get the following error:
Error reading Prometheus: Post "http://localhost:9090/api/v1/query": dial tcp 127.0.0.1:9090: connect: connection refused
I've searched everywhere and found some workarounds that don't work for me. Specifically, I tried http://host.docker.internal:9090, http://server-ip:9090, and of course my system's IP address from ipconfig as http://<ip_address>:9090. Nothing works!
I am not using docker-compose, just a prometheus.yml file, which is as follows:
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'Spring Boot Application input'
    metrics_path: '/actuator/prometheus'
    scrape_interval: 2s
    scheme: http
    static_configs:
      - targets: ['192.168.1.233:8080']
        labels:
          application: "MSc Project Thesis"
Can you give me any advice?
You can use the docker inspect command to find the IP address of the Prometheus container and then use that IP in place of localhost.
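For example (a sketch; this assumes the container is named prometheus, as in the docker run command above, and sits on the default bridge network):

docker inspect -f '{{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' prometheus

Then use http://<that-IP>:9090 as the data source URL in Grafana.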
I'd suggest using docker-compose, which has better support for DNS resolution between containers, so your localhost issue will go away.
It works for https://stackoverflow.com/a/74061034/4841138
Also, if you deploy the stack with docker compose and all containers are on the same network, you can do this:
URL: http://prometheus:9090
Here, prometheus is the hostname of the Prometheus container, which can be resolved by all containers on the same network.
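A minimal docker-compose.yml sketch of that setup (the service names are my choice; the volume path is taken from your docker run command):

version: "3"
services:
  prometheus:
    image: prom/prometheus
    command: --config.file=/etc/prometheus/prometheus.yml
    volumes:
      - /mnt/d/Projects/Msc-Thesis-Project/prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    depends_on:
      - prometheus

Start it with docker compose up -d and point the Grafana data source at http://prometheus:9090.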
I have started ZooKeeper, the Kafka server, a Kafka producer and a Kafka consumer. I downloaded the JDBC SQL connector JAR from Confluent, put it on the path, and set plugin.path in connect-standalone.properties. I then ran connect-standalone.bat ....\config\connect-standalone.properties ....\config\sink-quickstart-mysql.properties without any error, but it prints many warnings and is not getting started: my data is not reflected in the tables. What have I missed? Can you please help me? These are the warnings:
org.reflections.ReflectionsException: could not get type for name io.netty.internal.tcnative.SSLPrivateKeyMethod
        at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:312)
        at org.reflections.Reflections.expandSuperTypes(Reflections.java:382)
        at org.reflections.Reflections.<init>(Reflections.java:140)
        at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader$InternalReflections.<init>(DelegatingClassLoader.java:433)
        at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanPluginPath(DelegatingClassLoader.java:325)
        at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.scanUrlsAndAddPlugins(DelegatingClassLoader.java:261)
        at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initPluginLoader(DelegatingClassLoader.java:209)
        at org.apache.kafka.connect.runtime.isolation.DelegatingClassLoader.initLoaders(DelegatingClassLoader.java:202)
        at org.apache.kafka.connect.runtime.isolation.Plugins.<init>(Plugins.java:60)
        at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:79)
Caused by: java.lang.ClassNotFoundException: io.netty.internal.tcnative.SSLPrivateKeyMethod
        at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
        at org.reflections.ReflectionUtils.forName(ReflectionUtils.java:310)
        ... 9 more
There is no need to write a source connector yourself unless you need to connect Kafka to some exotic data source. Popular systems like MySQL are already quite well covered: there is a JDBC connector by Confluent that does what you want.
https://docs.confluent.io/current/connect/kafka-connect-jdbc/index.html
You'll need a working Kafka Connect installation, and then you can "connect" your MySQL tables to Kafka with an HTTP POST to the Kafka Connect API. Just specify a comma-separated list of the tables you'd like to use as sources in the table.whitelist attribute. For example, something like this:
curl -X POST $KAFKA_CONNECT_API/connectors -H "Content-Type: application/json" -d '{
"name": "jdbc_source_mysql_01",
"config": {
"connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
"connection.url": "jdbc:mysql://mysql:3306/test",
"connection.user": "connect_user",
"connection.password": "connect_password",
"topic.prefix": "mysql-01-",
"poll.interval.ms" : 3600000,
"table.whitelist" : "test.accounts",
"mode":"bulk"
}
}'
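After the POST it's worth confirming that the connector and its task actually started, for example (same base URL and connector name as above):

curl -s $KAFKA_CONNECT_API/connectors/jdbc_source_mysql_01/status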
I have installed Kubernetes (Minikube on Windows 10) and added Spark to it using Helm:
.\helm.exe install --name spark-test stable/spark
Then I exposed the Spark master port 7077 using:
.\kubectl.exe expose deployment spark-test-master --port=7070 --name=spark-master-ext --type=NodePort
For example, my UI runs on http://<MINIKUBE_IP>:31905/ and the Spark master is exposed at <MINIKUBE_IP>:32473. To check it, I run:
.\minikube-windows-amd64.exe service spark-master-ext
But when I do this in Java:
SparkConf conf = new SparkConf().setMaster("spark://192.168.1.168:32473").setAppName("Data Extractor");
I get:
18/03/19 13:57:29 WARN AppClient$ClientEndpoint: Could not connect to 192.168.1.168:32473: akka.remote.EndpointAssociationException: Association failed with [akka.tcp://sparkMaster#192.168.1.168:32473]
18/03/19 13:57:29 WARN ReliableDeliverySupervisor: Association with remote system [akka.tcp://sparkMaster#192.168.1.168:32473] has failed, address is now gated for [5000] ms. Reason: [Association failed with [akka.tcp://sparkMaster#192.168.1.168:32473]] Caused by: [Connection refused: no further information: /192.168.1.168:32473]
18/03/19 13:57:29 WARN AppClient$ClientEndpoint: Failed to connect to master 192.168.1.168:32473
akka.actor.ActorNotFound: Actor not found for: ActorSelection[Anchor(akka.tcp://sparkMaster#192.168.1.168:32473/), Path(/user/Master)]
Any ideas how to run Java Spark jobs against Spark running in Minikube?
It looks like the Helm chart for Spark is really outdated (1.5.1), so I installed Spark 2.3.0 locally and it runs without any issues. Case closed, sorry :)
I am using Spring Tool Suite (STS) and check out demo projects in it. Everything worked fine until I installed updates in STS today, or possibly until I installed Oracle SQL Developer recently.
The steps to reproduce my bug:
In STS, in "File" -> "New" -> "Import Spring Getting Started Content", then check out "Building a RESTful Web Service" this project.
https://spring.io/guides/gs/rest-service/ I go to my project folder, type 'mvnw spring-root:run' (I am using Windows). Then got following error.
I do not if this bug related to I installed two updated in STS today or I installed Oracle SQL Developer recently.
Here is the error:
[ERROR] Failed to execute goal org.springframework.boot:spring-boot-maven-plugin:1.5.6.RELEASE:run (default-cli) on project gs-rest-service: An exception occurred while running. null: InvocationTargetException: Connector configured to listen on port 8080 failed to start -> [Help 1]
Then I checked out the solution here:
https://stackoverflow.com/a/27416379/8229192
It works after I kill the task that is using port 8080.
c:\>netstat -ano | find "8080"
TCP 0.0.0.0:8080 0.0.0.0:0 LISTENING 3088
TCP [::]:8080 [::]:0 LISTENING 3088
c:\>taskkill /F /PID 3088
SUCCESS: The process with PID 3088 has been terminated.
My questions are:
Why do I have a port conflict? Is it because I installed Oracle SQL Developer? How can I find out exactly which software is also using port 8080?
If I kill the task (A) that uses port 8080, will that cause an issue when task (A) runs again later?
I have checked out other projects (like https://spring.io/guides/gs/scheduling-tasks/) where I did not need to kill the task using port 8080; I just ran "mvnw spring-boot:run" directly and it worked with no port 8080 conflict. Why do some projects hit a port 8080 conflict while others don't? This confuses me. Thanks.
Oracle XE's OracleXETNSListener service uses port 8080 to serve its Application Express.
Killing the OracleXETNSListener service is no problem at all, because you use SQL Developer, not Application Express. Alternatively, you can disable its auto-start configuration.
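To see exactly which program owns port 8080, you can map the PID that netstat reports to a process name, for example (3088 is the PID from your output above):

c:\>netstat -ano | find "8080"
c:\>tasklist /FI "PID eq 3088"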
A Spring Boot web project serves a web application on port 8080 by default; you can run it on a different port (see "Spring Boot - how to configure port"). The https://spring.io/guides/gs/scheduling-tasks project is not a web-serving project, so it doesn't use any port.
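For example, a minimal sketch (8081 is just an arbitrary free port; the jar name is the guide's default artifact): either set the port in src/main/resources/application.properties

server.port=8081

or pass it when running the packaged jar:

java -jar target/gs-rest-service-0.1.0.jar --server.port=8081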
Hope this helps.
First, I want to say that the only thing I have seen that addresses this issue is here: Spark 1.6.1 SASL. However, after adding the configuration for Spark and YARN authentication, it is still not working. Below is my Spark configuration, using spark-submit on a YARN cluster on Amazon EMR:
SparkConf sparkConf = new SparkConf().setAppName("secure-test");
sparkConf.set("spark.authenticate.enableSaslEncryption", "true");
sparkConf.set("spark.network.sasl.serverAlwaysEncrypt", "true");
sparkConf.set("spark.authenticate", "true");
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
try {
    sparkConf.registerKryoClasses(new Class<?>[]{
        Class.forName("org.apache.hadoop.io.LongWritable"),
        Class.forName("org.apache.hadoop.io.Text")
    });
} catch (Exception e) {}
sparkContext = new JavaSparkContext(sparkConf);
sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
sparkContext.hadoopConfiguration().set("fs.s3a.enableServerSideEncryption", "true");
sparkContext.hadoopConfiguration().set("spark.authenticate", "true");
Note that I added spark.authenticate to the SparkContext's Hadoop configuration in code instead of in core-site.xml (which I assume I can do, since other settings work that way as well).
Looking here: https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java, it seems like both spark.authenticate settings are necessary. When I run this application, I get the following stack trace.
17/01/03 22:10:23 INFO storage.BlockManager: Registering executor with local external shuffle service.
17/01/03 22:10:23 ERROR client.TransportClientFactory: Exception while bootstrapping client after 178 ms
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:67)
at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:71)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
In Spark's docs, it says
For Spark on YARN deployments, configuring spark.authenticate to true will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
which seems wrong based on the comments in the YARN file above. Even with that troubleshooting, I am still lost on what I need to do to get SASL to work. Am I missing something obvious that is documented somewhere?
So I finally figured it out. The previous Stack Overflow thread was technically correct: I needed to add spark.authenticate to the YARN configuration. Maybe it is possible to do this in code, but I couldn't figure out how, which at a high level makes sense. I will post my configuration below in case anyone else runs into this issue in the future.
First, I used an AWS EMR configurations file (for example, when using the AWS CLI: aws emr create-cluster --configurations file://youpathhere.json).
Then I added the following JSON to the file:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.authenticate": "true",
      "spark.authenticate.enableSaslEncryption": "true",
      "spark.network.sasl.serverAlwaysEncrypt": "true"
    }
  },
  {
    "Classification": "core-site",
    "Properties": {
      "spark.authenticate": "true"
    }
  }
]
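For completeness, a sketch of how that file is passed at cluster creation (the release label, instance settings, and file name here are placeholders, not the exact values I used):

aws emr create-cluster \
  --name "secure-test" \
  --release-label emr-5.2.0 \
  --applications Name=Spark \
  --instance-type m4.large \
  --instance-count 3 \
  --use-default-roles \
  --configurations file://./sasl-config.json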
I got the same error message on Spark on Dataproc (Google Cloud Platform) after I added configuration options for Spark network encryption.
I initially created the Dataproc cluster with the following command.
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true'
The solution was to additionally add the configuration 'yarn:spark.authenticate=true'. A working Dataproc cluster with Spark RPC encryption can therefore be created as follows:
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true,yarn:spark.authenticate=true'
I verified the encryption with ngrep. I installed ngrep as follows on the master node.
sudo apt-get update
sudo apt-get install ngrep
I then ran ngrep on an arbitrary port, 20001:
sudo ngrep port 20001
If you then run a Spark job with the following configuration properties, you can see the encrypted communication between the driver and worker nodes:
spark.driver.port=20001
spark.blockManager.port=20002
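For example, a sketch of submitting a job with those ports pinned (the class and jar are the stock Spark examples assumed to ship on the Dataproc image; substitute your own job):

gcloud dataproc jobs submit spark --cluster=test-encryption --region=europe-west3 \
  --class=org.apache.spark.examples.SparkPi \
  --jars=file:///usr/lib/spark/examples/jars/spark-examples.jar \
  --properties=spark.driver.port=20001,spark.blockManager.port=20002 \
  -- 1000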
Note: I would always advise also enabling Kerberos on Dataproc to secure authentication for Hadoop, YARN, etc. This can be done with the --enable-kerberos flag in the cluster creation command.