Spark SASL not working on EMR with YARN - java

So first, I want to say that the only thing I have seen that addresses this issue is this question: Spark 1.6.1 SASL. However, even after adding the configuration for Spark and YARN authentication, it is still not working. Below is my configuration for Spark, using spark-submit on a YARN cluster on Amazon EMR:
SparkConf sparkConf = new SparkConf().setAppName("secure-test");
sparkConf.set("spark.authenticate.enableSaslEncryption", "true");
sparkConf.set("spark.network.sasl.serverAlwaysEncrypt", "true");
sparkConf.set("spark.authenticate", "true");
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
try {
    sparkConf.registerKryoClasses(new Class<?>[]{
            Class.forName("org.apache.hadoop.io.LongWritable"),
            Class.forName("org.apache.hadoop.io.Text")
    });
} catch (Exception e) {}
sparkContext = new JavaSparkContext(sparkConf);
sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
sparkContext.hadoopConfiguration().set("fs.s3a.enableServerSideEncryption", "true");
sparkContext.hadoopConfiguration().set("spark.authenticate", "true");
Note, I added spark.authenticate to the sparkContext's Hadoop configuration in code instead of in core-site.xml (which I am assuming I can do, since other settings applied this way work as well).
Looking here: https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java it seems like both spark.authenticate settings are necessary. When I run this application, I get the following stack trace.
17/01/03 22:10:23 INFO storage.BlockManager: Registering executor with local external shuffle service.
17/01/03 22:10:23 ERROR client.TransportClientFactory: Exception while bootstrapping client after 178 ms
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:67)
at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:71)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
In Spark's docs, it says
For Spark on YARN deployments, configuring spark.authenticate to true will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
which seems wrong based on the comments in the YARN shuffle service file above. Even with troubleshooting, I am still lost on where I should go to get SASL to work. Am I missing something obvious that is documented somewhere?

So I finally figured it out. The previous StackOverflow thread was technically correct. I needed to add spark.authenticate to the YARN configuration. It may be possible to set this in code, but I couldn't figure out how, which makes sense at a high level since it is a cluster-side setting. I will post my configuration below in case anyone else runs into this issue in the future.
First, I used an AWS EMR configurations file (an example of this is when using the AWS CLI: aws emr create-cluster --configurations file://yourpathhere.json).
Then, I added the following json to the file:
[{
    "Classification": "spark-defaults",
    "Properties": {
        "spark.authenticate": "true",
        "spark.authenticate.enableSaslEncryption": "true",
        "spark.network.sasl.serverAlwaysEncrypt": "true"
    }
},
{
    "Classification": "core-site",
    "Properties": {
        "spark.authenticate": "true"
    }
}]
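If you create the cluster programmatically rather than with the CLI, the same classifications can be passed through the AWS SDK for Java. This is only a minimal sketch; the release label, roles, instance types and counts below are placeholders, not values from this setup:

import java.util.HashMap;
import java.util.Map;

import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Application;
import com.amazonaws.services.elasticmapreduce.model.Configuration;
import com.amazonaws.services.elasticmapreduce.model.JobFlowInstancesConfig;
import com.amazonaws.services.elasticmapreduce.model.RunJobFlowRequest;

public class CreateSaslCluster {
    public static void main(String[] args) {
        // Same properties as the spark-defaults classification in the JSON above.
        Map<String, String> sparkDefaults = new HashMap<>();
        sparkDefaults.put("spark.authenticate", "true");
        sparkDefaults.put("spark.authenticate.enableSaslEncryption", "true");
        sparkDefaults.put("spark.network.sasl.serverAlwaysEncrypt", "true");

        // Same property as the core-site classification.
        Map<String, String> coreSite = new HashMap<>();
        coreSite.put("spark.authenticate", "true");

        AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();
        emr.runJobFlow(new RunJobFlowRequest()
                .withName("secure-test")
                .withReleaseLabel("emr-5.2.0")            // placeholder release label
                .withApplications(new Application().withName("Spark"))
                .withServiceRole("EMR_DefaultRole")       // placeholder roles
                .withJobFlowRole("EMR_EC2_DefaultRole")
                .withConfigurations(
                        new Configuration()
                                .withClassification("spark-defaults")
                                .withProperties(sparkDefaults),
                        new Configuration()
                                .withClassification("core-site")
                                .withProperties(coreSite))
                .withInstances(new JobFlowInstancesConfig()
                        .withInstanceCount(3)             // placeholder sizing
                        .withMasterInstanceType("m4.large")
                        .withSlaveInstanceType("m4.large")
                        .withKeepJobFlowAliveWhenNoSteps(true)));
    }
}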

I got the same error message on Spark on Dataproc (Google Cloud Platform) after I added configuration options for Spark network encryption.
I initially created the Dataproc cluster with the following command.
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true'
The solution was to additionally add the configuration 'yarn:spark.authenticate=true'. A working Dataproc cluster with Spark RPC encryption can therefore be created as follows.
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true,yarn:spark.authenticate=true'
I verified the encryption with ngrep. I installed ngrep as follows on the master node.
sudo apt-get update
sudo apt-get install ngrep
I then ran ngrep on an arbitrary port, 20001.
sudo ngrep port 20001
If you then run a Spark job with the following configuration properties you can see the encrypted communication between driver and worker nodes.
spark.driver.port=20001
spark.blockManager.port=20002
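For reference, these two properties can also be set from the Java driver code; a minimal sketch (the app name and the trivial job are just for illustration):

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class EncryptionPortTest {
    public static void main(String[] args) {
        // Pin the driver and block manager ports so the ngrep capture above
        // sees Spark's RPC traffic on a known port.
        SparkConf conf = new SparkConf()
                .setAppName("encryption-test")
                .set("spark.driver.port", "20001")
                .set("spark.blockManager.port", "20002");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Trivial job to generate driver <-> executor traffic.
        System.out.println(sc.parallelize(Arrays.asList(1, 2, 3)).count());
        sc.stop();
    }
}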
Note, I would always advise also enabling Kerberos on Dataproc to secure authentication for Hadoop, YARN, etc. This can be achieved with the flag --enable-kerberos in the cluster creation command.

Uploading jar to Apache Livy interactive session

Using Amazon emr-5.30.1 with Livy 0.7 and Spark 2.4.5
We want to use Apache Livy as a REST service for Spark.
The mode we want to work with is session and not batch.
Trying to upload a jar to the session (via the official API) using:
curl -X POST \
-d '{"conf": {"kind" : "spark","jars": "s3://cjspro-emr-data/spark-examples.jar"}}' \
-H "Content-Type: application/json" localhost:8998/sessions
Looking at the session logs gives the impression that the jar is not being uploaded.
Not to mention that code snippets that use the requested jar are not working.
Any help?
I am not sure whether the jar reference from S3 will work or not, but we did the same thing using bootstrap actions and updating the Spark config.
Step 1: Create a bootstrap script and add the following code:
aws s3 cp s3://cjspro-emr-data/spark-examples.jar /home/hadoop/jars/
Step 2: While creating the Livy session, set the following Spark config using the conf key in the Livy sessions API:
'conf': {'spark.driver.extraClassPath': '/home/hadoop/jars/*',
         'spark.executor.extraClassPath': '/home/hadoop/jars/*'}
Step 3: Send the jars to be added to the session using the jars key in the Livy sessions API:
'jars':['local:/home/hadoop/spark-examples.jar']
So the final data to create a Livy session would look like:
{
    'kind': 'pyspark',
    'conf': 'above mentioned dict',
    'jars': ['local:/home/hadoop/spark-examples.jar'],
    'executorCores': '',
    'executorMemory': '',
    ...
}
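For completeness, here is a rough Java sketch of POSTing that payload to the Livy sessions endpoint using only JDK classes; the localhost:8998 endpoint is the one from the question, the paths are the ones used in this answer, and the executor settings are omitted:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class CreateLivySession {
    public static void main(String[] args) throws Exception {
        // JSON equivalent of the payload sketched above.
        String payload = "{"
                + "\"kind\":\"pyspark\","
                + "\"conf\":{\"spark.driver.extraClassPath\":\"/home/hadoop/jars/*\","
                + "\"spark.executor.extraClassPath\":\"/home/hadoop/jars/*\"},"
                + "\"jars\":[\"local:/home/hadoop/spark-examples.jar\"]"
                + "}";

        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://localhost:8998/sessions").openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(payload.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Livy responded with HTTP " + conn.getResponseCode());
    }
}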

Cannot connect to Wildfly in Dockerfile

I'm creating a custom Dockerfile with extensions for the official Keycloak Docker image. I want to change the web-context and add some custom providers.
Here's my Dockerfile:
FROM jboss/keycloak:7.0.0
COPY startup-config.cli /opt/jboss/tools/cli/startup-config.cli
RUN /opt/jboss/keycloak/bin/jboss-cli.sh --connect --controller=localhost:9990 --file="/opt/jboss/tools/cli/startup-config.cli"
ENV KEYCLOAK_USER=admin
ENV KEYCLOAK_PASSWORD=admin
and startup-config.cli file:
/subsystem=keycloak-server/:write-attribute(name=web-context,value="keycloak/auth")
/subsystem=keycloak-server/:add(name=providers,value="module:module:x.y.z.some-custom-provider")
But unfortunately I receive this error:
The controller is not available at localhost:9990: java.net.ConnectException: WFLYPRT0053: Could not connect to remote+http://localhost:9990. The connection failed: WFLYPRT0053: Could not connect to remote+http://localhost:9990. The connection failed: Connection refused
The command '/bin/sh -c /opt/jboss/keycloak/bin/jboss-cli.sh --connect --controller=localhost:9990 --file="/opt/jboss/tools/cli/startup-config.cli"' returned a non-zero code: 1
Is it a matter of invalid localhost? How should I refer to the management API?
Edit: I also tried with ENTRYPOINT instead of RUN, but the same error occurred during container initialization.
You are trying to have WildFly load your custom config file at build time here. The trouble is that the WildFly server is not running while the Dockerfile is building.
WildFly actually already has you covered regarding automatically loading custom config; there is built-in support for what you want to do. You simply need to put your config file in a "magic location" inside the image.
You need to drop your config file here:
/opt/jboss/startup-scripts/
So that your Dockerfile looks like this:
FROM jboss/keycloak:7.0.0
COPY startup-config.cli /opt/jboss/startup-scripts/startup-config.cli
ENV KEYCLOAK_USER=admin
ENV KEYCLOAK_PASSWORD=admin
Excerpt from the keycloak documentation:
Adding custom script using Dockerfile
A custom script can be added by
creating your own Dockerfile:
FROM keycloak
COPY custom-scripts/ /opt/jboss/startup-scripts/
Now you can simply start the image, and the built-in features in Keycloak (a WildFly feature, really) will look for a config in that specific directory and attempt to load it.
Edit from comment with final solution:
While the original answer solved the issue with being able to pass configuration to the server at all, an issue remained with the content of the script. The following error was received when starting the container:
=========================================================================
Executing cli script: /opt/jboss/startup-scripts/startup-config.cli
No connection to the controller.
=========================================================================
The issue turned out to be in the startup-config.cli script, where the JBoss command embed-server, needed to initiate a connection to the JBoss instance, was missing. Also missing was the closing stop-embedded-server command. More about configuring JBoss in this manner can be found in the docs here: CHAPTER 8. EMBEDDING A SERVER FOR OFFLINE CONFIGURATION
The final script:
embed-server --std-out=echo
/subsystem=keycloak-server/theme=defaults/:write-attribute(name=cacheThemes,value=false)
/subsystem=keycloak-server/theme=defaults/:write-attribute(name=cacheTemplates,value=false)
stop-embedded-server
WildFly management interfaces are not available when building the Docker image. Your only option is to start the CLI in embedded mode as discussed here Running CLI commands in WildFly Dockerfile.
A more advanced approach consists in using the S2I installation scripts to trigger CLI commands.

Connect to Kafka with SSL using KafkaIO on Google Dataflow

From a server, I was able to connect and get the data out from a remote kafka server topic which has SSL configured.
From GCP, How can I connect to a remote kafka server using Google Dataflow pipeline passing SSL truststore, keystore certificates locations and the Google service account json?
I am using the Eclipse plugin for the Dataflow runner option.
If I point the certificates to a Google Cloud Storage bucket, it throws the following error.
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
Caused by: org.apache.kafka.common.KafkaException:
java.io.FileNotFoundException:
gs:/bucket/folder/truststore-client.jks (No such file or directory)
Followed: Truststore and Google Cloud Dataflow
Updated the code to point the SSL truststore and keystore locations to certificates in the local machine's /tmp directory, in case KafkaIO needs to read from a file path. It did not throw FileNotFoundError.
Tried running the server Java client code from the GCP account and also using a Dataflow - Beam Java pipeline; I get the following error.
ssl.truststore.location = <LOCAL MACHINE CERTIFICATE FILE PATH>
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka version : 1.0.0
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka commitId : aaa7af6d4a11b29d
org.apache.kafka.common.network.SslTransportLayer close
WARNING: Failed to send SSL Close message
java.io.IOException: Broken pipe
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
at org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:153)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:205)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at
org.apache.kafka.common.utils.LogContext$KafkaLogger warn
WARNING: [Consumer clientId=consumer-1, groupId=test-group] Connection to node -2 terminated during authentication. This may indicate that authentication failed due to invalid credentials.
Any suggestions or examples appreciated.
Git clone or upload the Java Maven project from your local machine to the GCP Cloud Shell home directory.
Compile the project using the Dataflow runner command in the Cloud Shell terminal.
mvn -Pdataflow-runner compile exec:java \
-Dexec.mainClass=com.packagename.JavaClass \
-Dexec.args="--project=PROJECT_ID \
--stagingLocation=gs://BUCKET/PATH/ \
--tempLocation=gs://BUCKET/temp/ \
--output=gs://BUCKET/PATH/output \
--runner=DataflowRunner"
Make sure the runner is set to DataflowRunner.class and that you see the job on the Dataflow console when running it in the cloud. DirectRunner executions will not show up on the Cloud Dataflow console.
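As a minimal sketch, the Dataflow runner can also be forced from the pipeline code (the class name here is a placeholder; the --project, --stagingLocation and --tempLocation arguments come from the mvn command above):

import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

public class KafkaToGcsPipeline {
    public static void main(String[] args) {
        // Parse --project, --stagingLocation, --tempLocation, etc. and force
        // the Dataflow runner so the job shows up on the Dataflow console.
        DataflowPipelineOptions options = PipelineOptionsFactory.fromArgs(args)
                .withValidation()
                .as(DataflowPipelineOptions.class);
        options.setRunner(DataflowRunner.class);

        Pipeline p = Pipeline.create(options);
        // ... add the KafkaIO read shown below, then:
        p.run();
    }
}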
Place certificates in the resources folder within the Maven project and read files using ClassLoader.
// Map of certificate file name -> absolute path on the local filesystem
Map<String, String> resourcePath = new HashMap<>();
ClassLoader classLoader = getClass().getClassLoader();
File file = new File(classLoader.getResource("keystore.jks").getFile());
resourcePath.put("keystore.jks", file.getAbsoluteFile().getPath());
Write a ConsumerFactoryFn() to copy the certificates into Dataflow's "/tmp/" directory, as described in https://stackoverflow.com/a/53549757/4250322
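A rough sketch of such a ConsumerFactoryFn, assuming the JKS files are bundled on the classpath (src/main/resources) as in the previous step; the file names match the properties below and the error handling is minimal:

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;

import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ConsumerFactoryFn
        implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {

    @Override
    public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
        // Stage the certificates on the Dataflow worker before the consumer
        // is created, so the SSL config paths under /tmp actually exist.
        copyToTmp("truststore.jks");
        copyToTmp("keystore.jks");
        return new KafkaConsumer<>(config);
    }

    private static void copyToTmp(String name) {
        try (InputStream in =
                ConsumerFactoryFn.class.getClassLoader().getResourceAsStream(name)) {
            Files.copy(in, Paths.get("/tmp", name), StandardCopyOption.REPLACE_EXISTING);
        } catch (Exception e) {
            throw new RuntimeException("Could not stage " + name + " to /tmp", e);
        }
    }
}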
Use KafkaIO with resource path properties.
Map<String, Object> props = new HashMap<>();
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/tmp/truststore.jks");
props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/tmp/keystore.jks");
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, PASSWORD);
//other properties
...
PCollection<String> collection = p.apply(KafkaIO.<String, String>read()
        .withBootstrapServers(BOOTSTRAP_SERVERS)
        .withTopic(TOPIC)
        .withKeyDeserializer(StringDeserializer.class)
        .withValueDeserializer(StringDeserializer.class)
        .updateConsumerProperties(props)
        .withConsumerFactoryFn(new ConsumerFactoryFn())
        .withMaxNumRecords(50)
        .withoutMetadata())
    .apply(Values.<String>create());
// Apply Beam transformations and write to output.

Connector task state fails to connect

The task state of a connector is failing with the error:
org.apache.kafka.connect.errors.ConnectException: java.lang.NoClassDefFoundError
I am running a Kafka Connect cluster in distributed mode and I am using Kafka Connect (0.10.0.2.5) from an Ambari deployment.
I provided the Debezium MySQL connector path using export CLASSPATH=/path/to/connector/.
Loaded connector configuration into Kafka Connect using the following command:
curl -i -X POST -H "Accept:application/json" \
-H "Content-Type:application/json" http://localhost:8083/connectors/ \
-d '{
"name": "MYSQL_CONNECTOR",
"config": {
"connector.class": "io.debezium.connector.mysql.MySqlConnector",
"database.hostname": "10.224.21.36",
"database.port": "3306",
"database.user": "root",
"database.password": "shobhna",
"database.server.id": "1",
"database.server.name": "demo",
"database.history.kafka.bootstrap.servers": "slnxhadoop04.noid.in:6669",
"database.history.kafka.topic": "dbhistory.demo" ,
"include.schema.changes": "true"
}
}'
Now, after checking the connector status, I am getting this error:
- {"name":"MYSQL_CONNECTOR","connector":{"state":"RUNNING","worker_id":"172.26.177.115:8083"},
"tasks":[{"state":"FAILED","trace":"org.apache.kafka.connect.errors.ConnectException:
java.lang.NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient\n\tat
io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:218)\n\tat
io.debezium.connector.common.BaseSourceTask.start(BaseSourceTask.java:45)\n\tat
org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:137)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:140)\n\tat
org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:175)\n\tat
java.util.concurrent.Executors$RunnableAdapter.cal(Executors.java:511)\n\tat
java.util.concurrent.FutureTask.run(FutureTask.java:266)\n\tat
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)\n\tat
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)\n\tat
java.lang.Thread.run(Thread.java:745)\nCaused by:
java.lang.NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient\n\tat
io.debezium.relational.history.KafkaDatabaseHistory.initializeStorage(KafkaDatabaseHistory.java:336)\n\tat
io.debezium.connector.mysql.MySqlSchema.intializeHistoryStorage(MySqlSchema.java:260)\n\tat
io.debezium.connector.mysql.MySqlTaskContext.initializeHistoryStorage(MySqlTaskContext.java:194)\n\tat
io.debezium.connector.mysql.MySqlConnectorTask.start(MySqlConnectorTask.java:126)\n\t...
9 more\nCaused by: java.lang.ClassNotFoundException:
org.apache.kafka.clients.admin.AdminClient \n\tat
java.net.URLClassLoader.findClass(URLClassLoader.java:381)\n\tat
java.lang.ClassLoader.loadClass(ClassLoader.java:424)\n\tat
sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)\n\tat
java.lang.ClassLoader.loadClass(ClassLoader.java:357)\n\t
It can't find a built-in Kafka class, not your connector:
NoClassDefFoundError:
org/apache/kafka/clients/admin/AdminClient
...
i am using kafka(0.10.0.2.5)
Make sure you're 1) running a Connect server version that matches your Kafka brokers, and 2) using a connector built against that version of Connect.
For example, AdminClient only exists in Kafka 0.11+.
In the recent HDP releases, you get Kafka 1.1 (different from 0.11), and this is the version that the latest Debezium is built and tested against: https://debezium.io/docs/releases/
Debezium needs the AdminClient to create and register topic information, so I'm not sure if it'll work on an old version such as 0.10.
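If you want to check which client libraries the worker actually sees, a trivial classpath probe (run with the same CLASSPATH the Connect worker uses) is enough; this is just an illustrative snippet:

public class AdminClientCheck {
    public static void main(String[] args) {
        try {
            // AdminClient only shipped with the 0.11+ client libraries.
            Class.forName("org.apache.kafka.clients.admin.AdminClient");
            System.out.println("AdminClient found - client libs are 0.11 or newer.");
        } catch (ClassNotFoundException e) {
            System.out.println("AdminClient not found - pre-0.11 client libs on the classpath.");
        }
    }
}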
It's stated in the Kafka wiki that newer versions of the Connect server can communicate with older brokers, but the protocol used by the connector classes is up for debate.

Unable to pull JMX data using jolokia from Kafka

I have installed Jolokia on a CentOS 7 machine and am trying to pull Kafka metrics using the Jolokia agent and integrate them with the Icinga monitoring tool using the Nagios plugin check_jmx4perl. Below are the configuration steps I have followed.
Step 1: Downloaded jolokia-jvm-1.3.4-agent.jar
Step 2: Copied to /home/usr/
Step 3: Provided permissions by issuing the command chmod a+x /home/usr/jolokia-jvm-1.3.4.jar
Step 4: Added it to the classpath by issuing the command export KAFKA_OPTS="$KAFKA_OPTS -javaagent:/home/usr/jolokia-jvm-1.3.4-agent.jar=host=*"
Step 5: Started Zookeeper and Kafka in standalone mode and tried to fetch the list of topics, which works fine, displaying the message:
INFO: No access restrictor found, access to all MBean is allowed
Jolokia: Agent started with URL http://0:0:0:0:0:0:0:0:8778/jolokia/
Step 6: Tested the Jolokia agent by issuing the command j4psh http://localhost:8778, which fails with:
Connection refused
I have also tried providing the IP address but the issue still remains the same. Do I need to make an entry for the host in the /etc/hosts file?
Not sure if you are the same OP as this question, but perhaps you need to fully qualify the path of the jar. Mine looks like this and works:
export JOLOKIA_HOME=/libs/java/jolokia/1.3.7
export JOLOKIA_JAR=$JOLOKIA_HOME/jolokia-jvm-1.3.7-agent.jar
export KAFKA_OPTS="-javaagent:$JOLOKIA_JAR=port=7778,host=* $KAFKA_OPTS"
When I start Kafka in non-daemon mode, it prints this:
I> No access restrictor found, access to any MBean is allowed
Jolokia: Agent started with URL http://10.8.36.121:7778/jolokia/
Then I point my browser to http://localhost:7778/jolokia/search/*:* and I get:
{
    "request": {
        "mbean": "*:*",
        "type": "search"
    },
    "value": [
        "kafka.network:name=ResponseQueueTimeMs,request=ListGroups,type=RequestMetrics",
        "kafka.server:delayedOperation=topic,name=PurgatorySize,type=DelayedOperationPurgatory",
        "kafka.server:delayedOperation=Fetch,name=NumDelayedOperations,type=DelayedOperationPurgatory",
        "kafka.network:name=RemoteTimeMs,request=Heartbeat,type=RequestMetrics",
        <-- SNIP -->
        "kafka.network:name=LocalTimeMs,request=Offsets,type=RequestMetrics"
    ],
    "timestamp": 1504188793,
    "status": 200
}
j4psh also connects with:
j4psh http://localhost:7778/jolokia
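The same MBean search can also be done programmatically; a minimal Java sketch, assuming the agent is listening on port 7778 as above:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class JolokiaSearch {
    public static void main(String[] args) throws Exception {
        // Same MBean search the browser request above performs.
        URL url = new URL("http://localhost:7778/jolokia/search/*:*");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try (BufferedReader in =
                new BufferedReader(new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}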
Add to KAFKA_OPTS:
-javaagent:/usr/share/java/kafka/jolokia-jvm-1.6.0-agent.jar -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false -Djava.rmi.server.hostname=localhost -Dcom.sun.management.jmxremote.rmi.port=9999 -Djava.security.auth.login.config=/var/private/sasl_acl/kafka.server.jaas.config
