Unable to submit a spark job on spark cluster on docker - java

As the title says, I'm having trouble submitting a Spark job to a Spark cluster running on Docker.
I wrote a very simple Spark job in Scala: it subscribes to a Kafka server, transforms some data, and stores it in an Elasticsearch database.
Kafka and Elasticsearch are already running in Docker.
Everything works perfectly if I run the Spark job from my IDE in my dev environment (Windows / IntelliJ).
Then (and I'm not a Java guy at all), I added a Spark cluster following these instructions: https://github.com/big-data-europe/docker-spark
The cluster looks healthy when consulting its dashboard. I created a cluster consisting of a master and a worker.
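For context, a minimal master-plus-worker setup in the style of that repo looks roughly like the sketch below; the image tags and environment variables are assumptions based on that repo's README and may have changed, so check the current README:

version: "3"
services:
  spark-master:
    image: bde2020/spark-master:2.4.0-hadoop2.7   # tag is an assumption; pick one from the repo
    ports:
      - "8080:8080"   # web dashboard
      - "7077:7077"   # master endpoint used by spark-submit
  spark-worker-1:
    image: bde2020/spark-worker:2.4.0-hadoop2.7   # tag is an assumption
    depends_on:
      - spark-master
    environment:
      - "SPARK_MASTER=spark://spark-master:7077"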
Now, this is my job, written in Scala:
import java.io.Serializable
import org.apache.commons.codec.StringDecoder
import org.apache.hadoop.fs.LocalFileSystem
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.parsing.json.JSON
object KafkaConsumer {
  def main(args: Array[String]): Unit = {
    val sc = new SparkConf()
      .setMaster("local[*]")
      .setAppName("Elastic Search Indexer App")
    sc.set("es.index.auto.create", "true")
    val elasticResource = "iot/demo"
    val ssc = new StreamingContext(sc, Seconds(10))
    //ssc.checkpoint("./checkpoint")
    val kafkaParams = Map(
      "bootstrap.servers" -> "kafka:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "auto.offset.reset" -> "earliest",
      "group.id" -> "group0"
    )
    val topics = List("test")
    val stream = KafkaUtils.createDirectStream(
      ssc,
      PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics.distinct, kafkaParams)
    )
    case class message(key: String, timestamp: Long, payload: Object)
    val rdds = stream.map(record => message(record.key, record.timestamp, record.value))
    val es_config: scala.collection.mutable.Map[String, String] =
      scala.collection.mutable.Map(
        "pushdown" -> "true",
        "es.nodes" -> "http://docker-host",
        "es.nodes.wan.only" -> "true",
        "es.resource" -> elasticResource,
        "es.ingest.pipeline" -> "iot-test-pipeline"
      )
    rdds.foreachRDD { rdd =>
      rdd.saveToEs(es_config)
      rdd.collect().foreach(println)
    }
    ssc.start()
    ssc.awaitTermination()
  }
}
To submit this to the cluster, I did the following:
With the "sbt-assembly" plugin, I created a fat JAR with all the dependencies.
I defined an assembly merge strategy in build.sbt to avoid deduplicate errors on merging (sketched just below) ...
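A typical sbt-assembly merge strategy for this looks something like the following (illustrative; the exact rules depend on the build):

// build.sbt -- a common sbt-assembly merge strategy (illustrative)
assemblyMergeStrategy in assembly := {
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  case x =>
    val oldStrategy = (assemblyMergeStrategy in assembly).value
    oldStrategy(x)
}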
Then I submitted with:
./spark-submit.cmd --class KafkaConsumer \
  --master spark://docker-host:7077 \
  /c/Users/shams/Documents/Appunti/iot-demo-app/spark-streaming/target/scala-2.11/spark-streaming-assembly-1.0.jar
But I get this error:
19/02/27 11:18:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: No FileSystem for scheme: C
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:135)
at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit$.doPrepareSubmitEnvironment(SparkSubmit.scala:415)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:250)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:171)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
After a day of trying, I have not solved it, and I cannot understand where my job is trying to access a certain volume, as the error seems to say.
Could it be related to the warning message?
And how should I edit my script to avoid this problem?
Thanks in advance.
UPDATE:
The problem seems unrelated to my code, because I tried to submit a simple hello-world app compiled in the same way, and I get the same issue.

After many attempts and much research, I have come to the conclusion that the problem could be that I'm using the Windows version of spark-submit from my PC to submit the job: the "No FileSystem for scheme: C" apparently comes from the C: drive letter in the JAR path being parsed as a URI scheme.
I could not fully understand why, but for now, by moving the file directly onto the master and worker nodes, I was able to submit it from there.
First, copy the JAR onto the container:
docker cp spark-streaming-assembly-1.0.jar 21b43cb2e698:/spark/bin
Then execute (in the /spark/bin folder):
./spark-submit --class KafkaConsumer --deploy-mode cluster --master spark://spark-master:7077 spark-streaming-assembly-1.0.jar
This is the workaround I have found for the moment.

You can mount the directory of your jobs into your container by running your submit container like this:
docker run -it --rm \
  --name spark-submit \
  --mount type=bind,source="$(pwd)"/jobs,target=/home/jobs,readonly \
  --network spark-net \
  -p 4040:4040 \
  -p 18080:18080 \
  your-spark-image \
  bash
This command mounts your jobs folder directly into your container, and any changes you make on the host will automatically be present in the container.
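From a shell inside that container, you could then submit using the mounted path (illustrative; the class name, JAR name, and master URL are reused from the question and its workaround above):

./spark-submit --class KafkaConsumer --master spark://spark-master:7077 /home/jobs/spark-streaming-assembly-1.0.jar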

Related

Get 'Cannot read property 'replace' of undefined' error on jHipster

While running a jHipster command, I got the following error:
+ jhipster axon --skip-git --blueprint cst
INFO! Using JHipster version installed globally
INFO! No custom sharedOptions found within blueprint: generator-jhipster-cst at /usr/local/lib/node_modules/generator-jhipster-cst
events.js:288
throw er; // Unhandled 'error' event
^
TypeError: Cannot read property 'replace' of undefined
at new module.exports (/Users/.../jhipster/generator-jhipster-cst/generators/subgenerator-base.js:27:49)
at new module.exports (/Users/.../jhipster/generator-jhipster-cst/generators/aws/index.js:3:18)
at Environment.instantiate (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:673:23)
at Environment.create (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:645:19)
at /usr/local/lib/node_modules/generator-jhipster/cli/cli.js:74:31
at Array.forEach (<anonymous>)
at Object.<anonymous> (/usr/local/lib/node_modules/generator-jhipster/cli/cli.js:62:29)
at Module._compile (internal/modules/cjs/loader.js:1158:30)
at Object.Module._extensions..js (internal/modules/cjs/loader.js:1178:10)
at Module.load (internal/modules/cjs/loader.js:1002:32)
Emitted 'error' event on Environment instance at:
at Environment.error (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:293:12)
at Environment.create (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:647:19)
at /usr/local/lib/node_modules/generator-jhipster/cli/cli.js:74:31
[... lines matching original stack trace ...]
at Module.load (internal/modules/cjs/loader.js:1002:32)
at Function.Module._load (internal/modules/cjs/loader.js:901:14)
at Module.require (internal/modules/cjs/loader.js:1044:19)
I tried to update npm and jHipster, but there was another problem with upgrading jHipster:
~ sudo jhipster upgrade
INFO! Using JHipster version installed globally
INFO! Executing jhipster:upgrade
This seems to be an app blueprinted project with jhipster 6.6.0 bug (https://github.com/jhipster/generator-jhipster/issues/11045), you should pass --blueprints to jhipster upgrade commmand.
Error: This seems to be an app blueprinted project with jhipster 6.6.0 bug (https://github.com/jhipster/generator-jhipster/issues/11045), you should pass --blueprints to jhipster upgrade commmand.
at module.exports.error (/usr/local/lib/node_modules/generator-jhipster/generators/generator-base.js:1590:15)
at new module.exports (/usr/local/lib/node_modules/generator-jhipster/generators/upgrade/index.js:95:18)
at Environment.instantiate (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:673:23)
at Environment.create (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:645:19)
at instantiateAndRun (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:729:30)
at Environment.run (/usr/local/lib/node_modules/generator-jhipster/node_modules/yeoman-environment/lib/environment.js:758:12)
at runYoCommand (/usr/local/lib/node_modules/generator-jhipster/cli/cli.js:53:13)
at Command.<anonymous> (/usr/local/lib/node_modules/generator-jhipster/cli/cli.js:178:17)
at Command.listener [as _actionHandler] (/usr/local/lib/node_modules/generator-jhipster/node_modules/commander/index.js:413:31)
at Command._parseCommand (/usr/local/lib/node_modules/generator-jhipster/node_modules/commander/index.js:914:14)
NPM: 6.14.8
Node: 12.16.1
jhipster: 6.10.3
Java: Tested with 13.0.2 & 11.0.8
Update:
Here is the part of the code from which the error ('replace' of undefined) originates:
const configuration = {
    ...opts,
    ...this.getAllJhipsterConfig(this, true)
};
this.baseName = configuration.baseName;
this.serverPort = configuration.serverPort;
this.packageName = configuration.packageName;
this.rootPackageName = this.packageName.replace(/\.[^.]+$/, '');
So configuration.packageName is evidently undefined when the subgenerator runs. Could you explain how I can fix this problem, please?
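Note that the upgrade error itself suggests passing --blueprints to jhipster upgrade; going by the blueprint name in the original invocation, that would presumably look something like the following (the exact syntax is an assumption):

jhipster upgrade --blueprints cst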

JanusGraph libs can't communicate with HBase in a Kerberos environment (Failed to specify server's Kerberos principal name)

I am getting "Failed to specify server's Kerberos principal name" when attempting to connect to HBase with JanusGraph in a Kerberized Hadoop cluster.
First off, a little environment info:
OS: 7.6.1810
Java: 1.8.0_191-b12
Spark: 2.3.2.3.1.0.78-4
YARN: 2.5.0
Hbase: 2.0.2.3.1.0.78-4
Hadoop: 3.1.1.3.1.0.78-4
Kerberos: 5 version 1.15.1
Janusgraph: 0.4.0
I ran kinit and tested the bundled Gremlin client to ensure the graph.properties for the env works. It was able to connect, create a simple test graph, add some vertices, and, after a restart, retrieve the stored data. So, cool, the bundled copy works.
For laziness/simplicity, I decided to load the spark-shell with the JanusGraph libs. While attempting to connect to the same graph, it started throwing Kerberos errors.
My first thought was that maybe it's a Hadoop/Spark lib/conf conflict (pretty typical). So I built out a very simple, bare-bones Java app to see if it would work. It got the same errors as Spark.
Spark invocations:
First attempt:
spark-shell \
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
--conf spark.driver.userClassPathFirst=true \
--jars /etc/hadoop/conf/core-site.xml,/etc/hbase/conf/hbase-site.xml,groovy-console-2.5.6.jar,javax.servlet-api-3.1.0.jar,netty-buffer-4.1.25.Final.jar,RoaringBitmap-0.5.11.jar,groovy-groovysh-2.5.6-indy.jar,javax.ws.rs-api-2.0.1.jar,netty-codec-4.1.25.Final.jar,activation-1.1.jar,groovy-json-2.5.6-indy.jar,jaxb-api-2.2.2.jar,netty-common-4.1.25.Final.jar,airline-0.6.jar,groovy-jsr223-2.5.6-indy.jar,jaxb-impl-2.2.3-1.jar,netty-handler-4.1.25.Final.jar,antlr-2.7.7.jar,groovy-swing-2.5.6.jar,jbcrypt-0.4.jar,netty-resolver-4.1.25.Final.jar,antlr-3.2.jar,groovy-templates-2.5.6.jar,jboss-logging-3.1.2.GA.jar,netty-transport-4.1.25.Final.jar,antlr-runtime-3.2.jar,groovy-xml-2.5.6.jar,jcabi-log-0.14.jar,noggit-0.6.jar,aopalliance-repackaged-2.4.0-b34.jar,gson-2.2.4.jar,jcabi-manifests-1.1.jar,objenesis-2.1.jar,apacheds-i18n-2.0.0-M15.jar,guava-18.0.jar,jcl-over-slf4j-1.7.25.jar,ohc-core-0.3.4.jar,apacheds-kerberos-codec-2.0.0-M15.jar,hadoop-annotations-2.7.7.jar,je-7.5.11.jar,org.apache.servicemix.bundles.commons-csv-1.0-r706900_3.jar,api-asn1-api-1.0.0-M20.jar,hadoop-auth-2.7.7.jar,jersey-client-1.9.jar,oro-2.0.8.jar,api-util-1.0.0-M20.jar,hadoop-client-2.7.7.jar,jersey-client-2.22.2.jar,osgi-resource-locator-1.0.1.jar,asm-3.1.jar,hadoop-common-2.7.7.jar,jersey-common-2.22.2.jar,paranamer-2.6.jar,asm-5.0.3.jar,hadoop-distcp-2.7.7.jar,jersey-container-servlet-2.22.2.jar,picocli-3.9.2.jar,asm-analysis-5.0.3.jar,hadoop-gremlin-3.4.1.jar,jersey-container-servlet-core-2.22.2.jar,protobuf-java-2.5.0.jar,asm-commons-5.0.3.jar,hadoop-hdfs-2.7.7.jar,jersey-core-1.9.jar,py4j-0.10.7.jar,asm-tree-5.0.3.jar,hadoop-mapreduce-client-app-2.7.7.jar,jersey-guava-2.22.2.jar,pyrolite-4.13.jar,asm-util-5.0.3.jar,hadoop-mapreduce-client-common-2.7.7.jar,jersey-json-1.9.jar,reflections-0.9.9-RC1.jar,astyanax-cassandra-3.10.2.jar,hadoop-mapreduce-client-core-2.7.7.jar,jersey-media-jaxb-2.22.2.jar,reporter-config-base-3.0.0.jar,astyanax-cassandra-all-shaded-3.10.2.jar,hadoop-mapreduce-client-jobclient-2.7.7.jar,jersey-server-1.9.jar,reporter-config3-3.0.0.jar,astyanax-core-3.10.2.jar,hadoop-mapreduce-client-shuffle-2.7.7.jar,jersey-server-2.22.2.jar,scala-library-2.11.8.jar,astyanax-recipes-3.10.2.jar,hadoop-yarn-api-2.7.7.jar,jets3t-0.7.1.jar,scala-reflect-2.11.8.jar,astyanax-thrift-3.10.2.jar,hadoop-yarn-client-2.7.7.jar,jettison-1.3.3.jar,scala-xml_2.11-1.0.5.jar,audience-annotations-0.5.0.jar,hadoop-yarn-common-2.7.7.jar,jetty-6.1.26.jar,servlet-api-2.5.jar,avro-1.7.4.jar,hadoop-yarn-server-common-2.7.7.jar,jetty-sslengine-6.1.26.jar,sesame-model-2.7.10.jar,avro-ipc-1.8.2.jar,hamcrest-core-1.3.jar,jetty-util-6.1.26.jar,sesame-rio-api-2.7.10.jar,avro-mapred-1.8.2-hadoop2.jar,hbase-shaded-client-2.1.5.jar,jffi-1.2.16-native.jar,sesame-rio-datatypes-2.7.10.jar,bigtable-hbase-1.x-shaded-1.11.0.jar,hbase-shaded-mapreduce-2.1.5.jar,jffi-1.2.16.jar,sesame-rio-languages-2.7.10.jar,caffeine-2.3.1.jar,hibernate-validator-4.3.0.Final.jar,jline-2.14.6.jar,sesame-rio-n3-2.7.10.jar,cassandra-all-2.2.13.jar,high-scale-lib-1.0.6.jar,jna-4.0.0.jar,sesame-rio-ntriples-2.7.10.jar,cassandra-driver-core-3.7.1.jar,high-scale-lib-1.1.4.jar,jnr-constants-0.9.9.jar,sesame-rio-rdfxml-2.7.10.jar,cassandra-thrift-2.2.13.jar,hk2-api-2.4.0-b34.jar,jnr-ffi-2.1.7.jar,sesame-rio-trig-2.7.10.jar,checker-compat-qual-2.5.2.jar,hk2-locator-2.4.0-b34.jar,jnr-posix-3.0.44.jar,sesame-rio-trix-2.7.10.jar,chill-java-0.9.3.jar,hk2-utils-2.4.0-b34.jar,jnr-x86asm-1.0.2.jar,sesame-rio-turtle-2.7.10.jar,chill_2.11-0.9.3.jar,hppc-0.7.1.jar,joda-time-2.8.2.jar,sesame-util-2.7.10.jar,commons-cli-1.3.1.jar,htrace-core-3.1.0-incubating.jar,jsch-0.1.54.jar,sigar-1.6.4.jar,commons-codec-1.7.jar,htrace-core4-4.2.0-incubating.jar,json-20090211_1.jar,slf4j-api-1.7.12.jar,commons-collections-3.2.2.jar,httpasyncclient-4.1.2.jar,json-simple-1.1.jar,slf4j-log4j12-1.7.12.jar,commons-configuration-1.10.jar,httpclient-4.4.1.jar,json4s-ast_2.11-3.5.3.jar,snakeyaml-1.11.jar,commons-crypto-1.0.0.jar,httpcore-4.4.1.jar,json4s-core_2.11-3.5.3.jar,snappy-java-1.0.5-M3.jar,commons-httpclient-3.1.jar,httpcore-nio-4.4.5.jar,json4s-jackson_2.11-3.5.3.jar,solr-solrj-7.0.0.jar,commons-io-2.3.jar,httpmime-4.4.1.jar,json4s-scalap_2.11-3.5.3.jar,spark-core_2.11-2.4.0.jar,commons-lang-2.5.jar,ivy-2.3.0.jar,jsp-api-2.1.jar,spark-gremlin-3.4.1.jar,commons-lang3-3.3.1.jar,jackson-annotations-2.6.6.jar,jsr305-3.0.0.jar,spark-kvstore_2.11-2.4.0.jar,commons-logging-1.1.1.jar,jackson-core-2.6.6.jar,jts-core-1.15.0.jar,spark-launcher_2.11-2.4.0.jar,commons-math3-3.2.jar,jackson-core-asl-1.9.13.jar,jul-to-slf4j-1.7.16.jar,spark-network-common_2.11-2.4.0.jar,commons-net-1.4.1.jar,jackson-databind-2.6.6.jar,junit-4.12.jar,spark-network-shuffle_2.11-2.4.0.jar,commons-pool-1.6.jar,jackson-datatype-json-org-2.6.6.jar,kryo-shaded-4.0.2.jar,spark-tags_2.11-2.4.0.jar,commons-text-1.0.jar,jackson-jaxrs-1.9.13.jar,leveldbjni-all-1.8.jar,spark-unsafe_2.11-2.4.0.jar,compress-lzf-1.0.0.jar,jackson-mapper-asl-1.9.13.jar,libthrift-0.9.2.jar,spatial4j-0.7.jar,concurrentlinkedhashmap-lru-1.3.jar,jackson-module-paranamer-2.6.6.jar,log4j-1.2.16.jar,stax-api-1.0-2.jar,crc32ex-0.1.1.jar,jackson-module-scala_2.11-2.6.6.jar,logback-classic-1.1.3.jar,stax-api-1.0.1.jar,curator-client-2.7.1.jar,jackson-xc-1.9.13.jar,logback-core-1.1.3.jar,stax2-api-3.1.4.jar,curator-framework-2.7.1.jar,jamm-0.3.0.jar,lucene-analyzers-common-7.0.0.jar,stream-2.7.0.jar,curator-recipes-2.7.1.jar,janusgraph-all-0.4.0.jar,lucene-core-7.0.0.jar,stringtemplate-3.2.jar,disruptor-3.0.1.jar,janusgraph-berkeleyje-0.4.0.jar,lucene-queries-7.0.0.jar,super-csv-2.1.0.jar,dom4j-1.6.1.jar,janusgraph-bigtable-0.4.0.jar,lucene-queryparser-7.0.0.jar,thrift-server-0.3.7.jar,ecj-4.4.2.jar,janusgraph-cassandra-0.4.0.jar,lucene-sandbox-7.0.0.jar,tinkergraph-gremlin-3.4.1.jar,elasticsearch-rest-client-6.6.0.jar,janusgraph-core-0.4.0.jar,lucene-spatial-7.0.0.jar,unused-1.0.0.jar,exp4j-0.4.8.jar,janusgraph-cql-0.4.0.jar,lucene-spatial-extras-7.0.0.jar,uuid-3.2.jar,findbugs-annotations-1.3.9-1.jar,janusgraph-es-0.4.0.jar,lucene-spatial3d-7.0.0.jar,validation-api-1.1.0.Final.jar,gbench-0.4.3-groovy-2.4.jar,janusgraph-hadoop-0.4.0.jar,lz4-1.3.0.jar,vavr-0.9.0.jar,gmetric4j-1.0.7.jar,janusgraph-hbase-0.4.0.jar,lz4-java-1.4.0.jar,vavr-match-0.9.0.jar,gprof-0.3.1-groovy-2.4.jar,janusgraph-lucene-0.4.0.jar,metrics-core-3.0.2.jar,woodstox-core-asl-4.4.1.jar,gremlin-console-3.4.1.jar,janusgraph-server-0.4.0.jar,metrics-core-3.2.2.jar,xbean-asm6-shaded-4.8.jar,gremlin-core-3.4.1.jar,janusgraph-solr-0.4.0.jar,metrics-ganglia-3.2.2.jar,xercesImpl-2.9.1.jar,gremlin-driver-3.4.1.jar,javapoet-1.8.0.jar,metrics-graphite-3.2.2.jar,xml-apis-1.3.04.jar,gremlin-groovy-3.4.1.jar,javassist-3.18.0-GA.jar,metrics-json-3.1.5.jar,xmlenc-0.52.jar,gremlin-server-3.4.1.jar,javatuples-1.2.jar,metrics-jvm-3.2.2.jar,zookeeper-3.4.6.jar,gremlin-shaded-3.4.1.jar,javax.inject-1.jar,minlog-1.3.0.jar,zstd-jni-1.3.2-2.jar,groovy-2.5.6-indy.jar,javax.inject-2.4.0-b34.jar,netty-3.10.5.Final.jar,groovy-cli-picocli-2.5.6.jar,javax.json-1.0.jar,netty-all-4.1.25.Final.jar
Second attempt (fewer libs):
spark-shell \
--conf spark.driver.userClassPathFirst=true \
--conf spark.executor.userClassPathFirst=true \
--conf spark.driver.userClassPathFirst=true \
--jars /etc/hadoop/conf/core-site.xml,/etc/hbase/conf/hbase-site.xml,gremlin-core-3.4.1.jar,gremlin-driver-3.4.3.jar,gremlin-shaded-3.4.1.jar,groovy-2.5.7.jar,groovy-json-2.5.7.jar,javatuples-1.2.jar,commons-lang3-3.8.1.jar,commons-configuration-1.10.jar,janusgraph-core-0.4.0.jar,hbase-shaded-client-2.1.5.jar,janusgraph-hbase-0.4.0.jar,high-scale-lib-1.1.4.jar
Java attempt:
java \
-cp /etc/hadoop/conf/core-site.xml:/etc/hbase/conf/hbase-site.xml:hbase-shaded-client-2.1.5.jar:janusgraph-hbase-0.4.0.jar:janusgraph-core-0.4.0.jar:commons-lang3-3.8.1.jar:gremlin-driver-3.4.3.jar:groovy-2.5.7.jar:javatuples-1.2.jar:commons-configuration-1.10.jar:gremlin-core-3.4.1.jar:gremlin-shaded-3.4.1.jar:groovy-json-2.5.7.jar:high-scale-lib-1.1.4.jar:Janusgraph_Ingestion.jar:../janusgraph-0.4.0-hadoop2/lib/commons-lang-2.5.jar:../janusgraph-0.4.0-hadoop2/lib/slf4j-api-1.7.12.jar:../janusgraph-0.4.0-hadoop2/lib/slf4j-log4j12-1.7.12.jar:../janusgraph-0.4.0-hadoop2/lib/log4j-1.2.16.jar:../janusgraph-0.4.0-hadoop2/lib/guava-18.0.jar:../janusgraph-0.4.0-hadoop2/lib/commons-logging-1.1.1.jar:../janusgraph-0.4.0-hadoop2/lib/commons-io-2.3.jar:../janusgraph-0.4.0-hadoop2/lib/htrace-core4-4.2.0-incubating.jar \
Entry
As for the code executed in the spark-shell or Java:
import org.janusgraph.core.JanusGraphFactory;
val g = JanusGraphFactory.open("/home/devuser/janusgraph-0.4.0-hadoop2/conf/janusgraph-hbase.properties").traversal()
I also tried adding the following before attempting to open the graph:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;
val conf = new Configuration();
conf.set("hadoop.security.authentication", "Kerberos");
UserGroupInformation.setConfiguration(conf);
UserGroupInformation.loginUserFromSubject(null);
Including the graph connection config for completeness:
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.backend=hbase
storage.hostname=hosta.example.com:2181,hostb.example.com:2181,hostc.example.com:2181
storage.hbase.table=JgraphTest
storage.hbase.ext.zookeeper.znode.parent=/hbase-secure
storage.batch-loading=false
java.security.krb5.conf=/etc/krb5.conf
storage.hbase.ext.hbase.security.authentication=kerberos
storage.hbase.ext.hbase.security.authorization=true
storage.hbase.ext.hadoop.security.authentication=kerberos
storage.hbase.ext.hadoop.security.authorization=true
storage.hbase.ext.hbase.regionserver.kerberos.principal=hbase/_HOST@HDPDEV.example.com
ids.block-size=10000
ids.renew-timeout=3600000
storage.buffer-size=10000
ids.num-partitions=10
ids.partition=true
schema.default=none
cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5
The expected result would be a usable traversal object.
The actual result is below:
19/10/18 11:40:30 TRACE NettyRpcConnection: Connecting to hostb.example.com/192.168.1.101:16000
19/10/18 11:40:30 DEBUG AbstractHBaseSaslRpcClient: Creating SASL GSSAPI client. Server's Kerberos principal name is null
19/10/18 11:40:30 TRACE AbstractRpcClient: Call: IsMasterRunning, callTime: 4ms
19/10/18 11:40:30 DEBUG RpcRetryingCallerImpl: Call exception, tries=7, retries=16, started=8197 ms ago, cancelled=false, msg=java.io.IOException: Call to hostb.example.com/192.168.1.101:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name, details=, see https://s.apache.org/timeout, exception=org.apache.hadoop.hbase.MasterNotRunningException: java.io.IOException: Call to hostb.example.com/192.168.1.101:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.client.ConnectionImplementation$MasterServiceStubMaker.makeStub(ConnectionImplementation.java:1175)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getKeepAliveMasterService(ConnectionImplementation.java:1234)
at org.apache.hadoop.hbase.client.ConnectionImplementation.getMaster(ConnectionImplementation.java:1223)
at org.apache.hadoop.hbase.client.MasterCallable.prepare(MasterCallable.java:57)
at org.apache.hadoop.hbase.client.RpcRetryingCallerImpl.callWithRetries(RpcRetryingCallerImpl.java:105)
at org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089)
at org.apache.hadoop.hbase.client.HBaseAdmin.getHTableDescriptor(HBaseAdmin.java:569)
at org.apache.hadoop.hbase.client.HBaseAdmin.getTableDescriptor(HBaseAdmin.java:529)
at org.janusgraph.diskstorage.hbase.HBaseAdmin1_0.getTableDescriptor(HBaseAdmin1_0.java:105)
at org.janusgraph.diskstorage.hbase.HBaseStoreManager.ensureTableExists(HBaseStoreManager.java:726)
at org.janusgraph.diskstorage.hbase.HBaseStoreManager.getLocalKeyPartition(HBaseStoreManager.java:537)
at org.janusgraph.diskstorage.hbase.HBaseStoreManager.getDeployment(HBaseStoreManager.java:376)
at org.janusgraph.diskstorage.hbase.HBaseStoreManager.getFeatures(HBaseStoreManager.java:418)
at org.janusgraph.graphdb.configuration.builder.GraphDatabaseConfigurationBuilder.build(GraphDatabaseConfigurationBuilder.java:51)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:161)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:132)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:79)
at $line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:26)
at $line22.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:31)
at $line22.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(<console>:33)
at $line22.$read$$iw$$iw$$iw$$iw$$iw.<init>(<console>:35)
at $line22.$read$$iw$$iw$$iw$$iw.<init>(<console>:37)
at $line22.$read$$iw$$iw$$iw.<init>(<console>:39)
at $line22.$read$$iw$$iw.<init>(<console>:41)
at $line22.$read$$iw.<init>(<console>:43)
at $line22.$read.<init>(<console>:45)
at $line22.$read$.<init>(<console>:49)
at $line22.$read$.<clinit>(<console>)
at $line22.$eval$.$print$lzycompute(<console>:7)
at $line22.$eval$.$print(<console>:6)
at $line22.$eval.$print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:786)
at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:1047)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:638)
at scala.tools.nsc.interpreter.IMain$WrappedRequest$$anonfun$loadAndRunReq$1.apply(IMain.scala:637)
at scala.reflect.internal.util.ScalaClassLoader$class.asContext(ScalaClassLoader.scala:31)
at scala.reflect.internal.util.AbstractFileClassLoader.asContext(AbstractFileClassLoader.scala:19)
at scala.tools.nsc.interpreter.IMain$WrappedRequest.loadAndRunReq(IMain.scala:637)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:569)
at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:565)
at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:807)
at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:681)
at scala.tools.nsc.interpreter.ILoop.processLine(ILoop.scala:395)
at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:415)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:923)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:909)
at scala.reflect.internal.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:97)
at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:909)
at org.apache.spark.repl.Main$.doMain(Main.scala:76)
at org.apache.spark.repl.Main$.main(Main.scala:56)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:904)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:198)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:228)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: Call to hostb.example.com/192.168.1.101:16000 failed on local exception: java.io.IOException: Failed to specify server's Kerberos principal name
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.hbase.ipc.IPCUtil.wrapException(IPCUtil.java:221)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.onCallFinished(AbstractRpcClient.java:390)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient.access$100(AbstractRpcClient.java:95)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:410)
at org.apache.hadoop.hbase.ipc.AbstractRpcClient$3.run(AbstractRpcClient.java:406)
at org.apache.hadoop.hbase.ipc.Call.callComplete(Call.java:103)
at org.apache.hadoop.hbase.ipc.Call.setException(Call.java:118)
at org.apache.hadoop.hbase.ipc.BufferCallBeforeInitHandler.userEventTriggered(BufferCallBeforeInitHandler.java:92)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1377)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
at org.apache.hbase.thirdparty.io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:929)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.failInit(NettyRpcConnection.java:179)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.saslNegotiate(NettyRpcConnection.java:197)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.access$800(NettyRpcConnection.java:71)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:273)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection$3.operationComplete(NettyRpcConnection.java:261)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners0(DefaultPromise.java:500)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:479)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at org.apache.hbase.thirdparty.io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.fulfillConnectPromise(AbstractNioChannel.java:306)
at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:341)
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:633)
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at org.apache.hbase.thirdparty.io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to specify server's Kerberos principal name
at org.apache.hadoop.hbase.security.AbstractHBaseSaslRpcClient.<init>(AbstractHBaseSaslRpcClient.java:99)
at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClient.<init>(NettyHBaseSaslRpcClient.java:43)
at org.apache.hadoop.hbase.security.NettyHBaseSaslRpcClientHandler.<init>(NettyHBaseSaslRpcClientHandler.java:70)
at org.apache.hadoop.hbase.ipc.NettyRpcConnection.saslNegotiate(NettyRpcConnection.java:194)
... 18 more
Well, I feel like an idiot. So apparently the answer was actually a really simple deal. It appears that the Gremlin client works fine when just using storage.hbase.ext.hbase.regionserver.kerberos.principal, but when using the libs outside of it, storage.hbase.ext.hbase.master.kerberos.principal is needed as well. With that, things are working; on to the next set of problems I made for myself.
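Concretely, that means adding a line like this to the config above, alongside the regionserver principal (the value here is an assumption mirroring it; use your cluster's actual master principal):

storage.hbase.ext.hbase.master.kerberos.principal=hbase/_HOST@HDPDEV.example.com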

Docker-Compose version is unsupported

I'm using TestContainers to run dgraph.
Here is my test code:
package net.dgraph.java.client
import io.dgraph.DgraphAsyncClient
import io.dgraph.DgraphClient
import org.testcontainers.containers.DockerComposeContainer
import org.testcontainers.containers.GenericContainer
import org.testcontainers.spock.Testcontainers
import spock.lang.Shared
import spock.lang.Specification
import java.time.Duration
import java.time.temporal.ChronoUnit
@Testcontainers
public class DGraphTest extends Specification {

    private SyncSigmaDgraphClient syncClient
    private AsyncSigmaDGraphClient asyncClient
    private static address
    private static port1
    static DockerComposeContainer compose

    def setup() {
        syncClient = SigmaDgraphClientBuilder
            .create()
            .withHost(address)
            .withPort(port1)
            .buildSync()
    }

    static {
        compose =
            new DockerComposeContainer(
                new File("src/test/resources/docker-compose.yaml"))
        compose.start()
        this.address = compose.getServiceHost("dgraph", 8080)
        this.port1 = compose.getServicePort("dgraph", 8080)
    }
}
And my docker-compose.yaml file looks like:
version: "3.2"
services:
zero:
image: dgraph/dgraph:latest
volumes:
- /tmp/data:/dgraph
ports:
- 5080:5080
- 6080:6080
restart: on-failure
command: dgraph zero --my=zero:5080
alpha:
image: dgraph/dgraph:latest
volumes:
- /tmp/data:/dgraph
ports:
- 8080:8080
- 9080:9080
restart: on-failure
command: dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080
ratel:
image: dgraph/dgraph:latest
ports:
- 8000:8000
command: dgraph-ratel
My Docker version is 19.03.2, build 6a30dfc, and my docker-compose version is 1.24.1, build 4667896b.
However I get the following error:
[main] ERROR 🐳 [docker/compose:1.8.0] - Log output from the failed container:
Version in "src/test/resources/docker-compose.yaml" is unsupported. You might be seeing this error because you're using the wrong Compose file version. Either specify a version of "2" (or "2.0") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
One part I find interesting is that the error log shows docker/compose:1.8.0, which is an older version than the one I am currently running. I have tried changing versions in my docker-compose file, but that doesn't seem to work. I have looked at other questions with the same error, and none of their solutions work. I suspect the Testcontainers library uses an older version of docker-compose than I do, but if that is the issue, I do not know how to fix it.
I believe you want local compose mode:
compose =
    new DockerComposeContainer(
        new File("src/test/resources/docker-compose.yaml"))
        .withLocalCompose(true)
See the local compose mode documentation for more details:
You can override Testcontainers' default behaviour and make it use a
docker-compose binary installed on the local machine. This will
generally yield an experience that is closer to running docker-compose
locally, with the caveat that Docker Compose needs to be present on
dev and CI machines.
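Alternatively, since the failure comes from the embedded docker/compose:1.8.0 container, which predates Compose file format 3.x, the error message's own suggestion should also work: change the first line of docker-compose.yaml to version: "2" (keeping the service definitions under the services key as they already are).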
This was the method I ultimately went with:
I used Network.newNetwork() to tie the zero and alpha instances together, and I used debugging and docker logs to find the log message each container prints on successful startup, so the test can wait for it.
static {
    Network network = Network.newNetwork()

    dgraph_zero = new GenericContainer<>("dgraph/dgraph")
        .withExposedPorts(5080)
        .withNetworkAliases("zero")
        .withStartupTimeout(Duration.of(1, ChronoUnit.MINUTES))
        .withCommand("dgraph zero --my=zero:5080")
        .withNetwork(network)
        .waitingFor(Wait.forLogMessage('.* Updated Lease id: 1.*\\n', 1))
    dgraph_zero.start()

    dgraph_alpha = new GenericContainer<>("dgraph/dgraph")
        .withExposedPorts(9080)
        .withStartupTimeout(Duration.of(1, ChronoUnit.MINUTES))
        .withNetworkAliases("alpha")
        .withCommand("dgraph alpha --my=alpha:7080 --lru_mb=2048 --zero=zero:5080")
        .withNetwork(network)
        .waitingFor(Wait.forLogMessage(".*Server is ready.*\\n", 1))
    dgraph_alpha.start()

    this.address = dgraph_alpha.containerIpAddress
    this.port1 = dgraph_alpha.getMappedPort(9080)

    ManagedChannel channel = ManagedChannelBuilder
        .forAddress(address, port1)
        .usePlaintext()
        .build();
    DgraphGrpc.DgraphStub stub = DgraphGrpc.newStub(channel);
    this.dgraphclient = new DgraphClient(stub);
    Transaction txn = this.dgraphclient.newTransaction();
}

Old Kafka Offset consuming by Spark Structured Streaming after clearing Checkpointing location

I have built an application using Apache Kafka and Apache Spark Structured Streaming, and I am facing the issue below.
Scenario:
1. I set up a Spark structured stream with a Kafka topic as the source and a Kafka topic as the sink.
2. We run the stream and produce a number of messages on the Kafka topic.
3. We stop the stream and restart it after clearing the stream's checkpointing location. After running for 5 to 6 hours, the stream starts randomly consuming old Kafka messages.
After clearing the checkpointing location, I expected only new messages on the stream.
Spark version: 2.4.0,
Kafka-client version: 2.0.0,
Kafka version: 2.0.0,
Cluster Manager: Kubernetes.
I have tried this scenario with a different checkpointing location, but the issue still persists.
{
    SparkConf sparkConf = new SparkConf().setAppName("SparkKafkaConsumer");
    SparkSession spark = SparkSession.builder().config(sparkConf).getOrCreate();

    Dataset<Row> stream = spark
        .readStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option(subscribeType, "REQUEST_TOPIC")
        .option("failOnDataLoss", false)
        .option("maxOffsetsPerTrigger", "50")
        .option("startingOffsets", "latest")
        .load()
        .selectExpr(
            "CAST(value AS STRING) as payload",
            "CAST(key AS STRING)",
            "CAST(topic AS STRING)",
            "CAST(partition AS STRING)",
            "CAST(offset AS STRING)",
            "CAST(timestamp AS STRING)",
            "CAST(timestampType AS STRING)");

    DataStreamWriter<Row> dataWriterStream = stream
        .writeStream()
        .format("kafka")
        .option("kafka.bootstrap.servers", "localhost:9092")
        .option("kafka.max.request.size", "35000000")
        .option("kafka.retries", "5")
        .option("kafka.batch.size", "35000000")
        .option("kafka.receive.buffer.bytes", "200000000")
        .option("kafka.acks", "0")
        .option("kafka.compression.type", "snappy")
        .option("kafka.linger.ms", "0")
        .option("kafka.buffer.memory", "50000000")
        .option("topic", "RESPONSE_TOPIC")
        .outputMode("append")
        .option("checkpointLocation", checkPointDirectory);

    // Start the streaming query, then block until it terminates.
    dataWriterStream.start();
    spark.streams().awaitAnyTermination();
}
Check the link below:
https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-checkpointing.html
You call SparkContext.setCheckpointDir(directory: String) to set the checkpoint directory, the directory where RDDs are checkpointed. The directory must be an HDFS path if running on a cluster. The reason is that the driver may attempt to reconstruct the checkpointed RDD from its own local file system, which is incorrect because the checkpoint files are actually on the executor machines.
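A minimal sketch of that call in Scala (sc is the SparkContext; the HDFS URI and path are illustrative):

// Point checkpointing at a location every node can resolve, rather than a driver-local path.
sc.setCheckpointDir("hdfs://namenode:8020/user/app/checkpoints")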

Basics of Hector & Cassandra

I'm working with Cassandra 0.8.2.
I am working with the most recent version of Hector, and my Java version is 1.6.0_26.
I'm very new to Cassandra & Hector.
What I'm trying to do:
1. Connect to an up-and-running instance of Cassandra on a different server. I know it's running because I can ssh from my terminal into the server running the Cassandra instance and run the CLI with full functionality.
2. Then I want to connect to a keyspace, create a column family, and add a value to that column family through Hector.
I think my problem is that this running instance of Cassandra might not be configured to accept commands that are not local. I think my next step will be to add a local instance of Cassandra on the machine I'm working on and try to do this locally. What do you think?
Here's my Java code:
import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.cassandra.service.CassandraHostConfigurator;
import me.prettyprint.hector.api.Cluster;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.ddl.ColumnFamilyDefinition;
import me.prettyprint.hector.api.ddl.ComparatorType;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.mutation.Mutator;
public class MySample {
    public static void main(String[] args) {
        Cluster cluster = HFactory.getOrCreateCluster("Test Cluster", "xxx.xxx.x.41:9160");
        Keyspace keyspace = HFactory.createKeyspace("apples", cluster);
        ColumnFamilyDefinition cf = HFactory.createColumnFamilyDefinition("apples", "ColumnFamily2", ComparatorType.UTF8TYPE);
        StringSerializer stringSerializer = StringSerializer.get();
        Mutator<String> mutator = HFactory.createMutator(keyspace, stringSerializer);
        mutator.insert("jsmith", "Standard1", HFactory.createStringColumn("first", "John"));
    }
}
My ERROR is:
16:22:19,852 INFO CassandraHostRetryService:37 - Downed Host Retry service started with queue size -1 and retry delay 10s
16:22:20,136 INFO JmxMonitor:54 - Registering JMX me.prettyprint.cassandra.service_Test Cluster:ServiceType=hector,MonitorType=hector
Exception in thread "main" me.prettyprint.hector.api.exceptions.HInvalidRequestException: InvalidRequestException(why:Keyspace apples does not exist)
at me.prettyprint.cassandra.connection.HThriftClient.getCassandra(HThriftClient.java:70)
at me.prettyprint.cassandra.connection.HConnectionManager.operateWithFailover(HConnectionManager.java:226)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.operateWithFailover(KeyspaceServiceImpl.java:131)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:102)
at me.prettyprint.cassandra.service.KeyspaceServiceImpl.batchMutate(KeyspaceServiceImpl.java:108)
at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:222)
at me.prettyprint.cassandra.model.MutatorImpl$3.doInKeyspace(MutatorImpl.java:219)
at me.prettyprint.cassandra.model.KeyspaceOperationCallback.doInKeyspaceAndMeasure(KeyspaceOperationCallback.java:20)
at me.prettyprint.cassandra.model.ExecutingKeyspace.doExecute(ExecutingKeyspace.java:85)
at me.prettyprint.cassandra.model.MutatorImpl.execute(MutatorImpl.java:219)
at me.prettyprint.cassandra.model.MutatorImpl.insert(MutatorImpl.java:59)
at org.cassandra.examples.MySample.main(MySample.java:25)
Caused by: InvalidRequestException(why:Keyspace apples does not exist)
at org.apache.cassandra.thrift.Cassandra$set_keyspace_result.read(Cassandra.java:5302)
at org.apache.cassandra.thrift.Cassandra$Client.recv_set_keyspace(Cassandra.java:481)
at org.apache.cassandra.thrift.Cassandra$Client.set_keyspace(Cassandra.java:456)
at me.prettyprint.cassandra.connection.HThriftClient.getCassandra(HThriftClient.java:68)
... 11 more
Thank you in advance for your help.
The exception you are getting is:
why:Keyspace apples does not exist
In your code, this line does not actually create the keyspace:
Keyspace keyspace = HFactory.createKeyspace("apples", cluster);
As described here, this is the code you need to define your keyspace:
// ThriftKsDef comes from me.prettyprint.cassandra.service; replicationFactor is an int you choose (e.g. 1 for a single node).
ColumnFamilyDefinition cfDef = HFactory.createColumnFamilyDefinition("MyKeyspace", "ColumnFamilyName", ComparatorType.BYTESTYPE);
KeyspaceDefinition newKeyspace = HFactory.createKeyspaceDefinition("MyKeyspace", ThriftKsDef.DEF_STRATEGY_CLASS, replicationFactor, Arrays.asList(cfDef));
// Add the schema to the cluster.
// "true" as the second param means that Hector will block until all nodes see the change.
cluster.addKeyspace(newKeyspace, true);
We also have a getting-started guide up on the wiki, which might be of some help.
