Unable to submit a spark job on spark cluster on docker - java
As anticipated by the title, I am having problems submitting a Spark job to a Spark cluster running on Docker.
I wrote a very simple Spark job in Scala that subscribes to a Kafka server, arranges some data, and stores it in an Elasticsearch database.
Kafka and Elasticsearch are already running in Docker.
Everything works perfectly if I run the Spark job from my IDE in my dev environment (Windows / IntelliJ).
Then (and I'm not a Java guy at all), I added a Spark cluster following these instructions: https://github.com/big-data-europe/docker-spark
I created a cluster consisting of a master and a worker, and it looks healthy when I consult its dashboard.
Now, this is my job, written in Scala:
import java.io.Serializable
import org.apache.commons.codec.StringDecoder
import org.apache.hadoop.fs.LocalFileSystem
import org.apache.hadoop.hdfs.DistributedFileSystem
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark
import org.apache.spark.SparkConf
import org.elasticsearch.spark._
import org.apache.spark.sql.SQLContext
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils}
import org.apache.spark.streaming.{Seconds, StreamingContext}
import scala.util.parsing.json.JSON
object KafkaConsumer {

  def main(args: Array[String]): Unit = {

    val sc = new SparkConf()
      .setMaster("local[*]")
      .setAppName("Elastic Search Indexer App")
    sc.set("es.index.auto.create", "true")

    val elasticResource = "iot/demo"
    val ssc = new StreamingContext(sc, Seconds(10))

    //ssc.checkpoint("./checkpoint")

    // Kafka consumer configuration
    val kafkaParams = Map(
      "bootstrap.servers" -> "kafka:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "auto.offset.reset" -> "earliest",
      "group.id" -> "group0"
    )

    val topics = List("test")

    val stream = KafkaUtils.createDirectStream(
      ssc,
      PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](topics.distinct, kafkaParams)
    )

    case class message(key: String, timestamp: Long, payload: Object)
    val rdds = stream.map(record => message(record.key, record.timestamp, record.value))

    // Elasticsearch output configuration
    val es_config: scala.collection.mutable.Map[String, String] =
      scala.collection.mutable.Map(
        "pushdown" -> "true",
        "es.nodes" -> "http://docker-host",
        "es.nodes.wan.only" -> "true",
        "es.resource" -> elasticResource,
        "es.ingest.pipeline" -> "iot-test-pipeline"
      )

    // Index each micro-batch into Elasticsearch and print it for debugging
    rdds.foreachRDD { rdd =>
      rdd.saveToEs(es_config)
      rdd.collect().foreach(println)
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
To submit this to the cluster I did the following:
With the "sbt-assembly" plugin, I created a fat JAR file with all the dependencies.
I defined an assembly merge strategy in build.sbt to avoid deduplicate errors when merging ... (something along the lines of the sketch below).
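As a rough illustration (not my exact build.sbt, just a minimal sketch of a typical sbt-assembly merge strategy, assuming the plugin is enabled; the exact cases depend on which dependencies clash):

assemblyMergeStrategy in assembly := {
  // discard duplicate metadata files pulled in by different dependencies
  case PathList("META-INF", xs @ _*) => MergeStrategy.discard
  // for everything else, keep the first copy found on the classpath
  case x => MergeStrategy.first
}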
Then submit with:
./spark-submit.cmd --class KafkaConsumer --master spark://docker-host:7077 /c/Users/shams/Documents/Appunti/iot-demo-app/spark-streaming/target/scala-2.11/spark-streaming-assembly-1.0.jar
But I get this error:
19/02/27 11:18:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.io.IOException: No FileSystem for scheme: C
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2660)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2667)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1897)
at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:694)
at org.apache.spark.deploy.DependencyUtils$.downloadFile(DependencyUtils.scala:135)
at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
at org.apache.spark.deploy.SparkSubmit$$anonfun$doPrepareSubmitEnvironment$7.apply(SparkSubmit.scala:416)
at scala.Option.map(Option.scala:146)
at org.apache.spark.deploy.SparkSubmit$.doPrepareSubmitEnvironment(SparkSubmit.scala:415)
at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:250)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:171)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:137)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
After a day of trying I have not solved it, and I cannot understand where my job is trying to access a certain volume, as the error seems to suggest.
Could it be related to the warning message?
And how should I edit my script to avoid this problem?
Thanks in advance.
UPDATE:
The problem does not seem to be related to my code, because I tried to submit a simple hello-world app compiled in the same way and I get the same issue.
After many attempts and some research, I have come to the conclusion that the problem could be that I'm using the Windows version of spark-submit from my PC to submit the job.
I could not fully understand why, but for now, by moving the file directly onto the master and worker node, I was able to submit it from there.
First, copy the JAR onto the container:
docker cp spark-streaming-assembly-1.0.jar 21b43cb2e698:/spark/bin
Then I execute (in the /spark/bin folder):
./spark-submit --class KafkaConsumer --deploy-mode cluster --master spark://spark-master:7077 spark-streaming-assembly-1.0.jar
This is the workaround I have found for the moment.
You can mount the directory containing your jobs into your container by running your submit container like this:
docker run -it --rm \
--name spark-submit \
--mount type=bind,source="$(pwd)"/jobs,target=/home/jobs,readonly \
--network spark-net \
-p 4040:4040 \
-p 18080:18080 \
your-spark-image \
bash
This command mounts your jobs folder directly into the container; anything you change on the host side will automatically be visible inside the container.
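From inside that container you can then point spark-submit at the mounted JAR, for example (a sketch reusing the class name, master URL, and JAR name from the question, and the /home/jobs mount target from the command above):

/spark/bin/spark-submit --class KafkaConsumer --master spark://spark-master:7077 /home/jobs/spark-streaming-assembly-1.0.jar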