From a server, I was able to connect to a remote Kafka topic that has SSL configured and read data from it.
From GCP, how can I connect to a remote Kafka server using a Google Dataflow pipeline, passing the SSL truststore and keystore certificate locations and the Google service account JSON?
I am using the Eclipse plugin for the Dataflow runner option.
If I point to the certificates on a Google Cloud Storage bucket, it throws the following error.
Exception in thread "main" org.apache.beam.sdk.Pipeline$PipelineExecutionException: org.apache.kafka.common.KafkaException: Failed to construct kafka consumer
Caused by: org.apache.kafka.common.KafkaException:
java.io.FileNotFoundException:
gs:/bucket/folder/truststore-client.jks (No such file or directory)
Followed: Truststore and Google Cloud Dataflow
I updated the code to point the SSL truststore and keystore locations to certificates in the local machine's /tmp directory, in case KafkaIO needs to read from a file path. That did not throw the FileNotFoundException.
Whether I run the server's Java client code from the GCP account or use the Dataflow/Beam Java pipeline, I get the following error.
ssl.truststore.location = <LOCAL MACHINE CERTIFICATE FILE PATH>
ssl.truststore.password = [hidden]
ssl.truststore.type = JKS
value.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka version : 1.0.0
org.apache.kafka.common.utils.AppInfoParser$AppInfo <init>
INFO: Kafka commitId : aaa7af6d4a11b29d
org.apache.kafka.common.network.SslTransportLayer close
WARNING: Failed to send SSL Close message
java.io.IOException: Broken pipe
org.apache.beam.runners.direct.RootProviderRegistry.getInitialInputs(RootProviderRegistry.java:81)
at org.apache.beam.runners.direct.ExecutorServiceParallelExecutor.start(ExecutorServiceParallelExecutor.java:153)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:205)
at org.apache.beam.runners.direct.DirectRunner.run(DirectRunner.java:66)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:311)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:297)
at
org.apache.kafka.common.utils.LogContext$KafkaLogger warn
WARNING: [Consumer clientId=consumer-1, groupId=test-group] Connection to node -2 terminated during authentication. This may indicate that authentication failed due to invalid credentials.
Any suggestions or examples appreciated.
Git clone or upload the Java Maven project from your local machine to the GCP Cloud Shell home directory.
Compile and run the project using the Dataflow runner profile from the Cloud Shell terminal:
mvn -Pdataflow-runner compile exec:java \
-Dexec.mainClass=com.packagename.JavaClass \
-Dexec.args="--project=PROJECT_ID \
--stagingLocation=gs://BUCKET/PATH/ \
--tempLocation=gs://BUCKET/temp/ \
--output=gs://BUCKET/PATH/output \
--runner=DataflowRunner"
Make sure the runner is set to DataflowRunner.class and that you see the job in the Dataflow console when running it in the cloud; DirectRunner executions will not show up in the Cloud Dataflow console.
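For example, one way to pin the runner in code (a sketch, placed inside the pipeline's main(String[] args); the option values still come from the command line):
import org.apache.beam.runners.dataflow.DataflowRunner;
import org.apache.beam.runners.dataflow.options.DataflowPipelineOptions;
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.options.PipelineOptionsFactory;

// Parse --project, --stagingLocation, etc. from the command line and force the Dataflow runner.
DataflowPipelineOptions options =
    PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);
options.setRunner(DataflowRunner.class);
Pipeline p = Pipeline.create(options);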
Place the certificates in the resources folder within the Maven project and read the files using a ClassLoader:
// resourcePath maps a resource name to its absolute path on the local filesystem
Map<String, String> resourcePath = new HashMap<>();
ClassLoader classLoader = getClass().getClassLoader();
File file = new File(classLoader.getResource("keystore.jks").getFile());
resourcePath.put("keystore.jks", file.getAbsoluteFile().getPath());
Write a ConsumerFactoryFn() that copies the certificates to Dataflow's "/tmp/" directory, as described in https://stackoverflow.com/a/53549757/4250322.
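A minimal sketch of such a ConsumerFactoryFn as a nested class in the pipeline, assuming the certificates are bundled as classpath resources named truststore.jks and keystore.jks (names and error handling are illustrative):
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Map;
import org.apache.beam.sdk.transforms.SerializableFunction;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.KafkaConsumer;

private static class ConsumerFactoryFn
        implements SerializableFunction<Map<String, Object>, Consumer<byte[], byte[]>> {

    @Override
    public Consumer<byte[], byte[]> apply(Map<String, Object> config) {
        try {
            // Stage the bundled certificates on the worker so the Kafka client
            // can read them from the plain file paths used in the consumer properties.
            copyResourceToTmp("truststore.jks", "/tmp/truststore.jks");
            copyResourceToTmp("keystore.jks", "/tmp/keystore.jks");
        } catch (IOException e) {
            throw new RuntimeException("Failed to stage SSL certificates", e);
        }
        return new KafkaConsumer<>(config);
    }

    private static void copyResourceToTmp(String resourceName, String target) throws IOException {
        try (InputStream in =
                ConsumerFactoryFn.class.getClassLoader().getResourceAsStream(resourceName)) {
            Files.copy(in, Paths.get(target), StandardCopyOption.REPLACE_EXISTING);
        }
    }
}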
Use KafkaIO with the resource path properties:
Properties props = new Properties();
props.put(CommonClientConfigs.SECURITY_PROTOCOL_CONFIG, "SSL");
props.put(SslConfigs.SSL_TRUSTSTORE_LOCATION_CONFIG, "/tmp/truststore.jks");
props.put(SslConfigs.SSL_KEYSTORE_LOCATION_CONFIG, "/tmp/keystore.jks");
props.put(SslConfigs.SSL_TRUSTSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEYSTORE_PASSWORD_CONFIG, PASSWORD);
props.put(SslConfigs.SSL_KEY_PASSWORD_CONFIG, PASSWORD);
//other properties
...
PCollection<String> collection = p.apply(KafkaIO.<String, String>read()
.withBootstrapServers(BOOTSTRAP_SERVERS)
.withTopic(TOPIC)
.withKeyDeserializer(StringDeserializer.class)
.withValueDeserializer(StringDeserializer.class)
.updateConsumerProperties(props)
.withConsumerFactoryFn(new ConsumerFactoryFn())
.withMaxNumRecords(50)
.withoutMetadata()
).apply(Values.<String>create());
// Apply Beam transformations and write to output.
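For example, a minimal way to finish the pipeline by writing the values out and running it (a sketch; the output prefix is a placeholder matching the --output argument above):
import org.apache.beam.sdk.io.TextIO;

// Write each Kafka record value as a line under the output prefix, then block until the job finishes.
collection.apply(TextIO.write().to("gs://BUCKET/PATH/output"));
p.run().waitUntilFinish();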
I have a Jenkins master running in a VM and Docker running on another VM, and I have set up everything required to use Docker as a build agent for the Jenkins master.
Now, when I run a simple job on the Docker build agent, I get the following error.
SSHLauncher{host='192.168.33.15', port=49162, credentialsId='dock-cont-pass', jvmOptions='', javaPath='', prefixStartSlaveCmd='', suffixStartSlaveCmd='', launchTimeoutSeconds=60, maxNumRetries=10, retryWaitTime=15, sshHostKeyVerificationStrategy=hudson.plugins.sshslaves.verifiers.NonVerifyingKeyVerificationStrategy, tcpNoDelay=true, trackCredentials=true}
[08/02/21 11:00:36] [SSH] Opening SSH connection to 192.168.33.15:49162.
[08/02/21 11:00:36] [SSH] WARNING: SSH Host Keys are not being verified. Man-in-the-middle attacks may be possible against this connection.
[08/02/21 11:00:37] [SSH] Authentication successful.
[08/02/21 11:00:37] [SSH] The remote user's environment is:
BASH=/usr/bin/bash
BASHOPTS=checkwinsize:cmdhist:complete_fullquote:extquote:force_fignore:globasciiranges:hostcomplete:interactive_comments:progcomp:promptvars:sourcepath
BASH_ALIASES=()
BASH_ARGC=([0]="0")
BASH_ARGV=()
BASH_CMDS=()
BASH_EXECUTION_STRING=set
BASH_LINENO=()
BASH_SOURCE=()
BASH_VERSINFO=([0]="5" [1]="0" [2]="17" [3]="1" [4]="release" [5]="x86_64-pc-linux-gnu")
BASH_VERSION='5.0.17(1)-release'
DIRSTACK=()
EUID=1000
GROUPS=()
HOME=/home/jenkins
HOSTNAME=7bf4435f24c4
HOSTTYPE=x86_64
IFS=$' \t\n'
LOGNAME=jenkins
MACHTYPE=x86_64-pc-linux-gnu
MOTD_SHOWN=pam
OPTERR=1
OPTIND=1
OSTYPE=linux-gnu
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
PIPESTATUS=([0]="0")
PPID=72
PS4='+ '
PWD=/home/jenkins
SHELL=/bin/bash
SHELLOPTS=braceexpand:hashall:interactive-comments
SHLVL=1
SSH_CLIENT='192.168.33.10 37292 22'
SSH_CONNECTION='192.168.33.10 37292 172.17.0.3 22'
TERM=dumb
UID=1000
USER=jenkins
_=']'
Checking Java version in the PATH
openjdk version "1.8.0_292"
OpenJDK Runtime Environment (build 1.8.0_292-8u292-b10-0ubuntu1~20.04-b10)
OpenJDK 64-Bit Server VM (build 25.292-b10, mixed mode)
[08/02/21 11:00:37] [SSH] Checking java version of /jdk/bin/java
Couldn't figure out the Java version of /jdk/bin/java
bash: /jdk/bin/java: No such file or directory
[08/02/21 11:00:37] [SSH] Checking java version of java
[08/02/21 11:00:37] [SSH] java -version returned 1.8.0_292.
[08/02/21 11:00:37] [SSH] Starting sftp client.
ERROR: [08/02/21 11:00:37] [SSH] SFTP failed. Copying via SCP.
java.io.IOException: The subsystem request failed.
at com.trilead.ssh2.channel.ChannelManager.requestSubSystem(ChannelManager.java:741)
at com.trilead.ssh2.Session.startSubSystem(Session.java:362)
at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:100)
at com.trilead.ssh2.SFTPv3Client.<init>(SFTPv3Client.java:119)
at com.trilead.ssh2.jenkins.SFTPClient.<init>(SFTPClient.java:43)
at hudson.plugins.sshslaves.SSHLauncher.copyAgentJar(SSHLauncher.java:688)
at hudson.plugins.sshslaves.SSHLauncher.access$400(SSHLauncher.java:111)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:456)
at hudson.plugins.sshslaves.SSHLauncher$1.call(SSHLauncher.java:421)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: The server denied the request.
at com.trilead.ssh2.channel.ChannelManager.requestSubSystem(ChannelManager.java:737)
... 12 more
[08/02/21 11:00:37] [SSH] Copying latest remoting.jar...
Expanded the channel window size to 4MB
[08/02/21 11:00:37] [SSH] Starting agent process: cd "" && java -jar remoting.jar -workDir -jar-cache /remoting/jarCache
No argument is allowed: /remoting/jarCache
java -jar agent.jar [options...]
-agentLog FILE : Local agent error log destination (overrides
workDir)
-auth user:pass : If your Jenkins is security-enabled, specify
a valid user name and password.
-cert VAL : Specify additional X.509 encoded PEM
certificates to trust when connecting to
Jenkins root URLs. If starting with # then
the remainder is assumed to be the name of
the certificate file to read.
-connectTo HOST:PORT : make a TCP connection to the given host and
port, then start communication.
-cp (-classpath) PATH : add the given classpath elements to the
system classloader.
-failIfWorkDirIsMissing : Fails the initialization if the requested
workDir or internalDir are missing ('false'
by default) (default: false)
-help : Show this help message (default: false)
-internalDir VAL : Specifies a name of the internal files
within a working directory ('remoting' by
default) (default: remoting)
-jar-cache DIR : Cache directory that stores jar files sent
from the master
-jnlpCredentials USER:PASSWORD : HTTP BASIC AUTH header to pass in for making
HTTP requests.
-jnlpUrl URL : instead of talking to the master via
stdin/stdout, emulate a JNLP client by
making a TCP connection to the master.
Connection parameters are obtained by
parsing the JNLP file.
-loggingConfig FILE : Path to the property file with
java.util.logging settings
-noKeepAlive : Disable TCP socket keep alive on connection
to the master. (default: false)
-noReconnect : Doesn't try to reconnect when a
communication fail, and exit instead
(default: false)
-proxyCredentials USER:PASSWORD : HTTP BASIC AUTH header to pass in for making
HTTP authenticated proxy requests.
-secret HEX_SECRET : Agent connection secret to use instead of
-jnlpCredentials.
-tcp FILE : instead of talking to the master via
stdin/stdout, listens to a random local
port, write that port number to the given
file, then wait for the master to connect to
that port.
-text : encode communication with the master with
base64. Useful for running agent over 8-bit
unsafe protocol like telnet
-version : Shows the version of the remoting jar and
then exits (default: false)
-workDir FILE : Declares the working directory of the
remoting instance (stores cache and logs by
default) (default: -jar-cache)
Agent JVM has terminated. Exit code=0
[08/02/21 11:00:39] Launch failed - cleaning up connection
[08/02/21 11:00:39] [SSH] Connection closed.
How can I resolve this problem? I am using an SSH connection to connect to the Docker container.
I have solved the above problem by changing the username in the Dockerfile for the build agent. Previously it was set to the root user; now I have set it to jenkins, so the Jenkins server will use "/home/jenkins/" as its working directory.
You need to build an image from this Dockerfile and then specify that image in the Docker cloud configuration.
You need to change the credentials in Jenkins as well.
FROM ubuntu:18.04
RUN mkdir -p /var/run/sshd
RUN apt -y update
RUN apt install -y openjdk-8-jdk
RUN apt install -y openssh-server
RUN apt install -y git
#RUN apt install -y maven
#generate the host keys with the default key file path, an empty passphrase
RUN ssh-keygen -A
ADD ./sshd_config /etc/ssh/sshd_config
# create the jenkins user and home directory (assumed here; required for the chpasswd and chown steps below)
RUN useradd -m -s /bin/bash jenkins
RUN echo jenkins:password123 | chpasswd
RUN chown -R jenkins:jenkins /home/jenkins/
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]
I am configuring a Kafka MongoDB sink connector on my Windows machine.
My connect-standalone.properties file has
plugin.path=E:/Tools/kafka_2.12-2.4.0/plugins
My MongoSinkConnector.properties file has
name=mongo-sink
topics=first_topic
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
key.ignore=true
# Specific global MongoDB Sink Connector configuration
connection.uri=mongodb://localhost:27017,mongo1:27017,mongo2:27017,mongo3:27017
database=test_kafka
collection=transactions
max.num.retries=3
retries.defer.timeout=5000
type.name=kafka-connect
In the E:/Tools/kafka_2.12-2.4.0/plugins folder I have mongo-kafka-connect-1.0.1.jar file.
Command
bin\windows\connect-standalone config\connect-standalone.properties config\MongoSinkConnector.properties
The error I get is
[2020-03-23 04:04:12,376] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone)
java.lang.NoClassDefFoundError: com/mongodb/ConnectionString
at com.mongodb.kafka.connect.sink.MongoSinkConfig.createConfigDef(MongoSinkConfig.java:140)
at com.mongodb.kafka.connect.sink.MongoSinkConfig.<clinit>(MongoSinkConfig.java:78)
at com.mongodb.kafka.connect.MongoSinkConnector.config(MongoSinkConnector.java:62)
at org.apache.kafka.connect.connector.Connector.validate(Connector.java:129)
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:313)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:194)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:115)
Caused by: java.lang.ClassNotFoundException: com.mongodb.ConnectionString
at java.base/java.net.URLClassLoader.findClass(URLClassLoader.java:471)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:588)
at org.apache.kafka.connect.runtime.isolation.PluginClassLoader.loadClass(PluginClassLoader.java:104)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
Which other jar files should I place in the plugins folder, and/or which config changes do I have to make?
UPDATE 1
I have also placed the mongodb-driver-core-4.0.1 and bson-4.0.1 jar files in the plugins folder, but I get the same error.
Finally, I could make the mongo-kafka-connector work on Windows.
Here is what worked for me:
Kafka installation folder is E:\Tools\kafka_2.12-2.4.0
E:\Tools\kafka_2.12-2.4.0\plugins has mongo-kafka-1.0.1-all.jar file.
I downloaded this from https://www.confluent.io/hub/mongodb/kafka-connect-mongodb.
Click the blue Download button on the left to get the mongodb-kafka-connect-mongodb-1.0.1.zip file.
There is also the file MongoSinkConnector.properties in the etc folder inside the zip file.
Move it to kafka_installation_folder\plugins.
My connect-standalone.properties file has the following entries:
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
plugin.path=E:/Tools/kafka_2.12-2.4.0/plugins/mongo-kafka-1.0.1-all.jar
My MongoSinkConnector.properties file has the following entries:
name=mongo-sink
topics=topic1,topic2
connector.class=com.mongodb.kafka.connect.MongoSinkConnector
tasks.max=1
connection.uri=mongodb://localhost:27017,localhost:27017,localhost:27017
database=test_kafka
collection=transactions
max.num.retries=3
retries.defer.timeout=5000
field.renamer.mapping=[]
field.renamer.regex=[]
max.batch.size = 0
rate.limiting.timeout=0
rate.limiting.every.n=0
How To Run
Start MongoDB, ZooKeeper, and the Kafka server in three consoles.
In a 4th console, start Kafka Connect:
bin\windows\connect-standalone config\connect-standalone.properties config\MongoSinkConnector.properties
In a 5th console, send messages to a topic (I did this for topic1):
bin\windows\kafka-console-producer --broker-list localhost:9092 --topic topic1
>{"Hello":1}
>{"Mongo":2}
>{"World":3}
Open a mongo client and check your database/collections. You will see these three messages.
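If you prefer checking from Java instead of the mongo shell, here is a small sketch (assuming the mongodb-driver-sync client and the database/collection names configured above):
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class VerifySink {
    public static void main(String[] args) {
        try (MongoClient client = MongoClients.create("mongodb://localhost:27017")) {
            MongoCollection<Document> transactions =
                    client.getDatabase("test_kafka").getCollection("transactions");
            // Print every document the sink connector has written so far.
            for (Document doc : transactions.find()) {
                System.out.println(doc.toJson());
            }
        }
    }
}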
I'm creating a custom Dockerfile with extensions for the official Keycloak Docker image. I want to change the web-context and add some custom providers.
Here's my Dockerfile:
FROM jboss/keycloak:7.0.0
COPY startup-config.cli /opt/jboss/tools/cli/startup-config.cli
RUN /opt/jboss/keycloak/bin/jboss-cli.sh --connect --controller=localhost:9990 --file="/opt/jboss/tools/cli/startup-config.cli"
ENV KEYCLOAK_USER=admin
ENV KEYCLOAK_PASSWORD=admin
and startup-config.cli file:
/subsystem=keycloak-server/:write-attribute(name=web-context,value="keycloak/auth")
/subsystem=keycloak-server/:add(name=providers,value="module:module:x.y.z.some-custom-provider")
But unfortunately I receive this error:
The controller is not available at localhost:9990: java.net.ConnectException: WFLYPRT0053: Could not connect to remote+http://localhost:9990. The connection failed: WFLYPRT0053: Could not connect to remote+http://localhost:9990. The connection failed: Connection refused
The command '/bin/sh -c /opt/jboss/keycloak/bin/jboss-cli.sh --connect --controller=localhost:9990 --file="/opt/jboss/tools/cli/startup-config.cli"' returned a non-zero code: 1
Is it a matter of invalid localhost? How should I refer to the management API?
Edit: I also tried with ENTRYPOINT instead of RUN, but the same error occurred during container initialization.
You are trying to have WildFly load your custom config file at build time here. The trouble is that the WildFly server is not running while the Dockerfile is building.
WildFly actually already has you covered regarding automatically loading custom config; there is built-in support for what you want to do. You simply need to put your config file in a "magic location" inside the image.
You need to drop your config file here:
/opt/jboss/startup-scripts/
So that your Dockerfile looks like this:
FROM jboss/keycloak:7.0.0
COPY startup-config.cli /opt/jboss/startup-scripts/startup-config.cli
ENV KEYCLOAK_USER=admin
ENV KEYCLOAK_PASSWORD=admin
Excerpt from the keycloak documentation:
Adding custom script using Dockerfile
A custom script can be added by
creating your own Dockerfile:
FROM keycloak
COPY custom-scripts/ /opt/jboss/startup-scripts/
Now you can simply start the image, and the built-in feature in Keycloak (really a WildFly feature) will look for a config in that specific directory and attempt to load it.
Edit from comment with final solution:
While the original answer solved the issue with being able to pass configuration to the server at all, an issue remained with the content of the script. The following error was received when starting the container:
=========================================================================
Executing cli script: /opt/jboss/startup-scripts/startup-config.cli
No connection to the controller.
=========================================================================
The issue turned out to be in the startup-config.cli script, where the JBoss command embed-server, needed to initiate a connection to the JBoss instance, was missing. Also missing was the closing stop-embedded-server command. More about configuring JBoss in this manner can be found in the docs here: CHAPTER 8. EMBEDDING A SERVER FOR OFFLINE CONFIGURATION
The final script:
embed-server --std-out=echo
/subsystem=keycloak-server/theme=defaults/:write-attribute(name=cacheThemes,value=false)
/subsystem=keycloak-server/theme=defaults/:write-attribute(name=cacheTemplates,value=false)
stop-embedded-server
WildFly management interfaces are not available when building the Docker image. Your only option is to start the CLI in embedded mode, as discussed here: Running CLI commands in WildFly Dockerfile.
A more advanced approach consists of using the S2I installation scripts to trigger CLI commands.
I have written a Spark job on my local machine that reads a file from Google Cloud Storage using the Google Hadoop connector (a path like gs://storage.googleapis.com/), as mentioned in https://cloud.google.com/dataproc/docs/connectors/cloud-storage.
I have set up a service account with Compute Engine and Storage permissions.
My Spark configuration and code are:
SparkConf conf = new SparkConf();
conf.setAppName("SparkAPp").setMaster("local");
conf.set("google.cloud.auth.service.account.enable", "true");
conf.set("google.cloud.auth.service.account.email", "xxx-compute#developer.gserviceaccount.com");
conf.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
conf.set("fs.gs.project.id", "xxx-990711");
conf.set("fs.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
conf.set("fs.AbstractFileSystem.gs.impl", "com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem");
SparkContext sparkContext = new SparkContext(conf);
JavaRDD<String> data = sparkContext.textFile("gs://storage.googleapis.com/xxx/xxx.txt", 0).toJavaRDD();
data.foreach(line -> System.out.println(line));
I have also set an environment variable named GOOGLE_APPLICATION_CREDENTIALS that points to the key file. I have tried both key file formats, i.e. JSON and P12, but I am unable to access the file.
The error which I get is
java.net.UnknownHostException: metadata
java.io.IOException: Error getting access token from metadata server at: http://metadata/computeMetadata/v1/instance/service-accounts/default/token
at com.google.cloud.hadoop.util.CredentialFactory.getCredentialFromMetadataServiceAccount(CredentialFactory.java:208)
at com.google.cloud.hadoop.util.CredentialConfiguration.getCredential(CredentialConfiguration.java:70)
I am running my job from Eclipse with Java 8, Spark 2.2.0 dependencies, and gcs-connector 1.6.1.hadoop2.
I need to connect using only the service account, not the OAuth mechanism.
Thanks in advance
Are you trying it locally? If so, you need to set the environment variable GOOGLE_APPLICATION_CREDENTIALS to your key.json, or set these properties on the Hadoop Configuration instead of on the SparkConf, like this:
Configuration hadoopConfiguration = sparkContext.hadoopConfiguration();
hadoopConfiguration.set("google.cloud.auth.service.account.enable", true);
hadoopConfiguration.set("google.cloud.auth.service.account.email", "xxx-compute#developer.gserviceaccount.com");
hadoopConfiguration.set("google.cloud.auth.service.account.keyfile", "/root/Documents/xxx-compute-e71ddbafd13e.p12");
So first, I want to say the only thing I have seen that addresses this issue is here: Spark 1.6.1 SASL. However, even after adding the configuration for Spark and YARN authentication, it is still not working. Below is my Spark configuration, using spark-submit on a YARN cluster on Amazon's EMR:
SparkConf sparkConf = new SparkConf().setAppName("secure-test");
sparkConf.set("spark.authenticate.enableSaslEncryption", "true");
sparkConf.set("spark.network.sasl.serverAlwaysEncrypt", "true");
sparkConf.set("spark.authenticate", "true");
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer");
sparkConf.set("spark.kryo.registrator", "org.nd4j.Nd4jRegistrator");
try {
sparkConf.registerKryoClasses(new Class<?>[]{
Class.forName("org.apache.hadoop.io.LongWritable"),
Class.forName("org.apache.hadoop.io.Text")
});
} catch (Exception e) {}
sparkContext = new JavaSparkContext(sparkConf);
sparkContext.hadoopConfiguration().set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem");
sparkContext.hadoopConfiguration().set("fs.s3a.enableServerSideEncryption", "true");
sparkContext.hadoopConfiguration().set("spark.authenticate", "true");
Note that I added spark.authenticate to the sparkContext's Hadoop configuration in code instead of in core-site.xml (which I assume I can do, since other settings work that way as well).
Looking here: https://github.com/apache/spark/blob/master/common/network-yarn/src/main/java/org/apache/spark/network/yarn/YarnShuffleService.java it seems like both spark.authenticate settings are necessary. When I run this application, I get the following stack trace.
17/01/03 22:10:23 INFO storage.BlockManager: Registering executor with local external shuffle service.
17/01/03 22:10:23 ERROR client.TransportClientFactory: Exception while bootstrapping client after 178 ms
java.lang.RuntimeException: java.lang.IllegalArgumentException: Unknown message type: -22
at org.apache.spark.network.shuffle.protocol.BlockTransferMessage$Decoder.fromByteBuffer(BlockTransferMessage.java:67)
at org.apache.spark.network.shuffle.ExternalShuffleBlockHandler.receive(ExternalShuffleBlockHandler.java:71)
at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
at java.lang.Thread.run(Thread.java:745)
In Spark's docs, it says
For Spark on YARN deployments, configuring spark.authenticate to true will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
which seems wrong based on the comments in the YARN file above. Even with troubleshooting, I am still lost on where I should go to get SASL to work. Am I missing something obvious that is documented somewhere?
So I finally figured it out. The previous Stack Overflow thread was technically correct: I needed to add spark.authenticate to the YARN configuration. It may be possible to set this in code, but I can't figure out how, and at a high level it makes sense why that is the case. I will post my configuration below in case anyone else runs into this issue in the future.
First, I used an AWS EMR configurations file (for example, when using the AWS CLI: aws emr create-cluster --configurations file://yourpathhere.json).
Then I added the following JSON to the file:
[{
"Classification": "spark-defaults",
"Properties": {
"spark.authenticate": "true",
"spark.authenticate.enableSaslEncryption": "true",
"spark.network.sasl.serverAlwaysEncrypt": "true"
}
},
{
"Classification": "core-site",
"Properties": {
"spark.authenticate": "true"
}
}]
I got the same error message on Spark on Dataproc (Google Cloud Platform) after I added configuration options for Spark network encryption.
I initially created the Dataproc cluster with the following command.
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true'
The solution was to additionally add the configuration 'yarn:spark.authenticate=true'. A working Dataproc cluster with Spark RPC encryption can therefore be created as follows.
gcloud dataproc clusters create test-encryption --no-address \
--service-account=<SERVICE-ACCOUNT> \
--zone=europe-west3-c --region=europe-west3 \
--subnet=<SUBNET> \
--properties 'spark:spark.authenticate=true,spark:spark.network.crypto.enabled=true,yarn:spark.authenticate=true'
I verified the encryption with ngrep. I installed ngrep as follows on the master node.
sudo apt-get update
sudo apt-get install ngrep
I then ran ngrep on an arbitrary port, 20001:
sudo ngrep port 20001
If you then run a Spark job with the following configuration properties, you can see the encrypted communication between the driver and worker nodes:
spark.driver.port=20001
spark.blockManager.port=20002
Note: I would always advise also enabling Kerberos on Dataproc to secure authentication for Hadoop, YARN, etc. This can be achieved with the --enable-kerberos flag in the cluster creation command.