I'm trying to run an Apache Beam application on a Flink cluster, but it fails with an error translating the Kafka UnboundedSource, saying that [partitions type:ARRAY pos:0] is not serializable. The application is a word count example reading from a Kafka topic and publishing to a Kafka topic, and it works fine using Beam's direct runner.
I created a pom.xml by following Beam's QuickStart Java and then added the KafkaIO sdk. I'm running a single-node local Flink 1.8.1 cluster and Kafka 2.3.0.
pom.xml snippets
<properties>
  <beam.version>2.14.0</beam.version>
  <flink.artifact.name>beam-runners-flink-1.8</flink.artifact.name>
  <flink.version>1.8.1</flink.version>
</properties>
...
<profile>
  <id>flink-runner</id>
  <!-- Makes the FlinkRunner available when running a pipeline. -->
  <dependencies>
    <dependency>
      <groupId>org.apache.beam</groupId>
      <!-- Please see the Flink Runner page for an up-to-date list
           of supported Flink versions and their artifact names:
           https://beam.apache.org/documentation/runners/flink/ -->
      <artifactId>${flink.artifact.name}</artifactId>
      <version>${beam.version}</version>
      <scope>runtime</scope>
    </dependency>
    <!-- Tried with and without this flink-avro dependency -->
    <dependency>
      <groupId>org.apache.flink</groupId>
      <artifactId>flink-avro</artifactId>
      <version>${flink.version}</version>
    </dependency>
  </dependencies>
</profile>
...
<dependency>
  <groupId>org.apache.beam</groupId>
  <artifactId>beam-sdks-java-io-kafka</artifactId>
  <version>${beam.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>2.3.0</version>
</dependency>
KafkaWordCount.java snippet
// Create the Pipeline object with the options we defined above.
Pipeline p = Pipeline.create(options);
PCollection<KV<String, Long>> counts = p.apply(KafkaIO.<String, String>read()
    .withBootstrapServers(options.getBootstrapServer())
    .withTopics(Collections.singletonList(options.getInputTopic()))
    .withKeyDeserializer(StringDeserializer.class)
    .withValueDeserializer(StringDeserializer.class)
    .updateConsumerProperties(ImmutableMap.of("auto.offset.reset", (Object) "latest"))
    .withoutMetadata() // PCollection<KV<String, String>> instead of KafkaRecord type
)
The full error message below is the result of submitting the Beam jar to Flink via /opt/flink/bin/flink run -c org.apache.beam.examples.KafkaWordCount target/word-count-beam-bundled-0.1.jar --runner=FlinkRunner --bootstrapServer=localhost:9092
The program finished with the following exception:
org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Error while translating UnboundedSource: org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@65be88ae
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:546)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:423)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
at org.apache.flink.runtime.security.NoOpSecurityContext.runSecured(NoOpSecurityContext.java:30)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
Caused by: java.lang.RuntimeException: Error while translating UnboundedSource: org.apache.beam.sdk.io.kafka.KafkaUnboundedSource@65be88ae
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:233)
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$ReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:281)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.applyStreamingTransform(FlinkStreamingPipelineTranslator.java:157)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.visitPrimitiveTransform(FlinkStreamingPipelineTranslator.java:136)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:665)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.visit(TransformHierarchy.java:657)
at org.apache.beam.sdk.runners.TransformHierarchy$Node.access$600(TransformHierarchy.java:317)
at org.apache.beam.sdk.runners.TransformHierarchy.visit(TransformHierarchy.java:251)
at org.apache.beam.sdk.Pipeline.traverseTopologically(Pipeline.java:458)
at org.apache.beam.runners.flink.FlinkPipelineTranslator.translate(FlinkPipelineTranslator.java:38)
at org.apache.beam.runners.flink.FlinkStreamingPipelineTranslator.translate(FlinkStreamingPipelineTranslator.java:88)
at org.apache.beam.runners.flink.FlinkPipelineExecutionEnvironment.translate(FlinkPipelineExecutionEnvironment.java:116)
at org.apache.beam.runners.flink.FlinkRunner.run(FlinkRunner.java:108)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:313)
at org.apache.beam.sdk.Pipeline.run(Pipeline.java:299)
at org.apache.beam.examples.KafkaWordCount.runWordCount(KafkaWordCount.java:99)
at org.apache.beam.examples.KafkaWordCount.main(KafkaWordCount.java:106)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
... 9 more
Caused by: org.apache.flink.api.common.InvalidProgramException: [partitions type:ARRAY pos:0] is not serializable. The object probably contains or references non serializable fields.
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:140)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:115)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.clean(StreamExecutionEnvironment.java:1558)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1470)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1414)
at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.addSource(StreamExecutionEnvironment.java:1396)
at org.apache.beam.runners.flink.FlinkStreamingTransformTranslators$UnboundedReadSourceTranslator.translateNode(FlinkStreamingTransformTranslators.java:218)
... 32 more
Caused by: java.io.NotSerializableException: org.apache.avro.Schema$Field
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1185)
at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
at java.base/java.util.ArrayList.writeObject(ArrayList.java:896)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at java.base/java.io.ObjectStreamClass.invokeWriteObject(ObjectStreamClass.java:1130)
at java.base/java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1497)
at java.base/java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1433)
at java.base/java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1179)
at java.base/java.io.ObjectOutputStream.writeObject(ObjectOutputStream.java:349)
at org.apache.flink.util.InstantiationUtil.serializeObject(InstantiationUtil.java:576)
at org.apache.flink.api.java.ClosureCleaner.clean(ClosureCleaner.java:122)
... 42 more
UPDATE
It turns out there is a known issue in Beam about running on Flink that seems related: https://issues.apache.org/jira/browse/BEAM-7478. One of the comments on it specifically mentions that using flink run with KafkaIO doesn't work because Avro's Schema.Field is not serializable: https://issues.apache.org/jira/browse/BEAM-7478?focusedCommentId=16902419&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16902419
UPDATE 2
As mentioned in the comments, a workaround is to downgrade Flink to 1.8.0.
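Concretely, that means running a Flink 1.8.0 cluster and pinning the Flink version property in the pom.xml (the Beam runner artifact name stays beam-runners-flink-1.8); a minimal sketch based on the properties shown above:

<properties>
  <beam.version>2.14.0</beam.version>
  <flink.artifact.name>beam-runners-flink-1.8</flink.artifact.name>
  <!-- downgraded from 1.8.1 as a workaround for BEAM-7478 -->
  <flink.version>1.8.0</flink.version>
</properties>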
Related
I introduced spring-boot-starter-web, which brings in its own spring-boot-starter-logging framework, specified the logging configuration in YAML, and got errors at startup.
yaml:
logging:
  level:
    root: info
    com.felix.flink.tutorial.api: debug
  config: classpath:logback-spring.xml
maven:
<dependencies>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
  </dependency>
  <dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-test</artifactId>
    <scope>test</scope>
  </dependency>
  <dependency>
    <groupId>com.felix</groupId>
    <artifactId>flink-tutorial-component</artifactId>
    <version>${revision}</version>
  </dependency>
</dependencies>
exception:
23:45:33.009 [Thread-0] DEBUG org.springframework.boot.devtools.restart.classloader.RestartClassLoader - Created RestartClassLoader org.springframework.boot.devtools.restart.classloader.RestartClassLoader@7abaedae
Exception in thread "restartedMain" java.lang.reflect.InvocationTargetException
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.boot.devtools.restart.RestartLauncher.run(RestartLauncher.java:49)
Caused by: java.lang.NoClassDefFoundError: org/slf4j/impl/StaticLoggerBinder
at org.springframework.boot.logging.logback.LogbackLoggingSystem.getLoggerContext(LogbackLoggingSystem.java:293)
at org.springframework.boot.logging.logback.LogbackLoggingSystem.beforeInitialize(LogbackLoggingSystem.java:118)
at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationStartingEvent(LoggingApplicationListener.java:238)
at org.springframework.boot.context.logging.LoggingApplicationListener.onApplicationEvent(LoggingApplicationListener.java:220)
at org.springframework.context.event.SimpleApplicationEventMulticaster.doInvokeListener(SimpleApplicationEventMulticaster.java:176)
at org.springframework.context.event.SimpleApplicationEventMulticaster.invokeListener(SimpleApplicationEventMulticaster.java:169)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:143)
at org.springframework.context.event.SimpleApplicationEventMulticaster.multicastEvent(SimpleApplicationEventMulticaster.java:131)
at org.springframework.boot.context.event.EventPublishingRunListener.starting(EventPublishingRunListener.java:79)
at org.springframework.boot.SpringApplicationRunListeners.lambda$starting$0(SpringApplicationRunListeners.java:56)
at java.base/java.util.ArrayList.forEach(ArrayList.java:1540)
at org.springframework.boot.SpringApplicationRunListeners.doWithListeners(SpringApplicationRunListeners.java:120)
at org.springframework.boot.SpringApplicationRunListeners.starting(SpringApplicationRunListeners.java:56)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:299)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1306)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1295)
at com.felix.flink.tutorial.api.FlinkTutorialApiApplication.main(FlinkTutorialApiApplication.java:15)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.slf4j.impl.StaticLoggerBinder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:583)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)
... 22 more
Process finished with exit code 0
I have imported SLF4J 2.0.3 directly in the pom, but it doesn't work.
SLF4J drastically changed the way its implementations are found between versions 1.x and 2.x. In 1.x, the binding needed to provide a class named org.slf4j.impl.StaticLoggerBinder - the class that's missing here. In 2.x it uses the ServiceLoader mechanism instead.
Spring Boot currently still uses SLF4J 1.7.36, through spring-boot-starter-web -> spring-boot-starter -> spring-boot-starter-logging. The latter depends on some SLF4J bridges, as well as logback-classic which in turn depends on SLF4J 1.7.32. I think that the 1.7.36 "wins" over the 1.7.32.
Unless one of your other dependencies has a transitive dependency on SLF4J 2.x, everything should work just fine. If one does, then you have a mix of SLF4J 1.x and 2.x, and that's simply not going to work. Replace the 2.x dependency with a 1.x dependency and you should be fine (unless you use the fluent API that was added in 2.x).
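If another dependency does drag in SLF4J 2.x transitively, one option is to pin slf4j-api to the 1.7.x line that Spring Boot manages, so the API matches the Logback 1.2.x binding; a minimal Maven sketch (dependencyManagement forces the version for transitive dependencies as well):

<dependencyManagement>
  <dependencies>
    <!-- force a single SLF4J 1.x API that lines up with Spring Boot's Logback binding -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.7.36</version>
    </dependency>
  </dependencies>
</dependencyManagement>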
env: HDP 3.1.5 (Hadoop 3.1.1, Hive 3.1.0), Flink 1.12.2
Java code:
public static void main(String[] args) {
    EnvironmentSettings settings = EnvironmentSettings.newInstance().useBlinkPlanner().build();
    TableEnvironment tblEnv = TableEnvironment.create(settings);

    String name = "myhive";
    String defaultDatabase = "default";
    String hiveConfDir = "/etc/hive/conf";

    HiveCatalog hive = new HiveCatalog(name, defaultDatabase, hiveConfDir);
    tblEnv.registerCatalog("myhive", hive);
    tblEnv.useCatalog("myhive");
    //tblEnv.getConfig().setSqlDialect(SqlDialect.HIVE);

    tblEnv.sqlQuery("SELECT * FROM users").execute().print();
}
Dependency:
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-table-api-java-bridge_2.12</artifactId>
  <version>${flink.version}</version>
</dependency>
<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-connector-hive_2.12</artifactId>
  <version>${flink.version}</version>
</dependency>
error 1:
org.apache.flink.util.FlinkException: JobMaster for job 35afe414e1dd861c86130ddd031312f2 failed.
at org.apache.flink.runtime.dispatcher.Dispatcher.jobMasterFailed(Dispatcher.java:887) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.dispatcherJobFailed(Dispatcher.java:465) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.dispatcher.Dispatcher.handleDispatcherJobResult(Dispatcher.java:444) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
...
Caused by: org.apache.flink.runtime.client.JobInitializationException: Could not instantiate JobManager.
at org.apache.flink.runtime.dispatcher.Dispatcher.lambda$createJobManagerRunner$5(Dispatcher.java:494) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1604) ~[?:1.8.0_292]
...
Caused by: org.apache.flink.runtime.JobException: Cannot instantiate the coordinator for operator Source: HiveSource-zjdev_xiangliang.users -> SinkConversionToTuple2
at org.apache.flink.runtime.executiongraph.ExecutionJobVertex.<init>(ExecutionJobVertex.java:231) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
at org.apache.flink.runtime.executiongraph.ExecutionGraph.attachJobGraph(ExecutionGraph.java:866) ~[flink-dist_2.12-1.12.2.jar:1.12.2]
...
Caused by: java.lang.NoClassDefFoundError: Lorg/apache/hadoop/mapred/JobConf;
at java.lang.Class.getDeclaredFields0(Native Method) ~[?:1.8.0_292]
at java.lang.Class.privateGetDeclaredFields(Class.java:2583) ~[?:1.8.0_292]
at java.lang.Class.getDeclaredField(Class.java:2068) ~[?:1.8.0_292]
at java.io.ObjectStreamClass.getDeclaredSUID(ObjectStreamClass.java:1871) ~[?:1.8.0_292]
...
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
at java.net.URLClassLoader.findClass(URLClassLoader.java:382) ~[?:1.8.0_292]
at java.lang.ClassLoader.loadClass(ClassLoader.java:418) ~[?:1.8.0_292]
...
I tried adding this dependency:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-mapreduce-client-core</artifactId>
  <version>${hadoop.version}</version>
  <scope>provided</scope>
</dependency>
and got another error:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.commons.cli.Option.builder(Ljava/lang/String;)Lorg/apache/commons/cli/Option$Builder;
at org.apache.flink.runtime.entrypoint.parser.CommandLineOptions.<clinit>(CommandLineOptions.java:27)
at org.apache.flink.runtime.entrypoint.DynamicParametersConfigurationParserFactory.options(DynamicParametersConfigurationParserFactory.java:43)
at org.apache.flink.runtime.entrypoint.DynamicParametersConfigurationParserFactory.getOptions(DynamicParametersConfigurationParserFactory.java:50)
at org.apache.flink.runtime.entrypoint.parser.CommandLineParser.parse(CommandLineParser.java:42)
at org.apache.flink.runtime.entrypoint.ClusterEntrypointUtils.parseParametersOrExit(ClusterEntrypointUtils.java:63)
at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:89)
I tried to fix the conflict between commons-cli 1.3.1 and 1.2:
choosing 1.3.1 gives error 1;
choosing 1.2 gives error 2;
adding a commons-cli 1.4 dependency gives error 1.
1. Choose commons-cli 1.3.1 or 1.4 (see the pom sketch below).
2. Add $hadoop_home/../hadoop_mapreduce/* to yarn.application.classpath.
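A sketch of step 1 as a Maven dependencyManagement entry (shown with 1.4; 1.3.1 works the same way, and whether the conflict lives in your application or on the YARN classpath depends on your deployment):

<dependencyManagement>
  <dependencies>
    <!-- Option.builder(String) exists from commons-cli 1.3 onwards -->
    <dependency>
      <groupId>commons-cli</groupId>
      <artifactId>commons-cli</artifactId>
      <version>1.4</version>
    </dependency>
  </dependencies>
</dependencyManagement>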
I got this error too. I think it may be a version conflict, because the Hive 3.1.2 I used is too new for Hadoop 2.7.6; I kept getting errors such as Guava version conflicts. So when I used Flink 1.15.0 and connected to Hive with the jar flink-sql-connector-hive-3.1.2_2.12-1.15.0.jar, I got: Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapred.JobConf
Resolution: this error occurs because the jars are not found. Move your Hadoop jars (hadoop-mapreduce-client-core, hadoop-common, hadoop-mapreduce-client-common, hadoop-mapreduce-client-jobclient) and hive-exec-3.1.2.jar to Flink's lib path.
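A sketch of the copy step (jar locations and exact versions are assumptions; adjust them to your installation):

# copy the Hadoop MapReduce and Hive jars into Flink's classpath
cp hadoop-mapreduce-client-core-*.jar \
   hadoop-common-*.jar \
   hadoop-mapreduce-client-common-*.jar \
   hadoop-mapreduce-client-jobclient-*.jar \
   hive-exec-3.1.2.jar \
   $FLINK_HOME/lib/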
I was referring to a post here:
Connecting to Zookeeper in a Apache Kafka Multi Node cluster
It's mentioned there that from Kafka 0.9 onwards, the producer and consumer no longer have to use the zookeeper.connect property; bootstrap.servers alone is enough to produce and consume data.
My pom.xml on the consumer side looks like this:
<properties>
  <java.version>1.7</java.version>
  <kafka.version>0.9.0.1-cp1</kafka.version>
  <kafka.scala.version>2.11</kafka.scala.version>
  <confluent.version>2.0.1</confluent.version>
  <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka_2.10</artifactId>
  <version>0.10.0.0</version>
</dependency>
I run into the following issue on the consumer side without the zookeeper.connect property. Does anyone have the consumer part working without it?
[WARNING]
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.mojo.exec.ExecJavaMojo$1.run(ExecJavaMojo.java:297)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalArgumentException: requirement failed: Missing required property 'zookeeper.connect'
at scala.Predef$.require(Predef.scala:233)
at kafka.utils.VerifiableProperties.getString(VerifiableProperties.scala:177)
at kafka.utils.ZKConfig.<init>(ZkUtils.scala:902)
at kafka.consumer.ConsumerConfig.<init>(ConsumerConfig.scala:101)
at kafka.consumer.ConsumerConfig.<init>(ConsumerConfig.scala:105)
at io.confluent.examples.consumer.ConsumerGroup.<init>(ConsumerGroup.java:30)
at io.confluent.examples.consumer.ConsumerGroup.main(ConsumerGroup.java:113)
... 6 more
Only the new consumer works without connecting to Zookeeper, and it is available in the kafka-clients artifact. You have to add the dependency:
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>0.10.0.0</version>
</dependency>
and use the implementation from the org.apache.kafka.clients.consumer package.
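A minimal sketch of the new consumer (broker address, group id, and topic name are placeholders):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092"); // brokers only, no zookeeper.connect
props.put("group.id", "my-consumer-group");
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(Collections.singletonList("my-topic"));
while (true) {
    ConsumerRecords<String, String> records = consumer.poll(100);
    for (ConsumerRecord<String, String> record : records) {
        System.out.printf("offset=%d key=%s value=%s%n",
                record.offset(), record.key(), record.value());
    }
}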
Use Case
Simple message fetching and printing from a Kafka topic using Spark, with Java as the programming language.
Background
Experience with Kafka-Storm integration; developed and maintained a Kafka cluster and Storm topologies for more than a year.
No experience with Apache Spark or Scala.
A simple word count application was built and tested successfully using a standalone Spark cluster.
Problem
Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:64)
at org.apache.spark.streaming.kafka.KafkaUtils$.createStream(KafkaUtils.scala:110)
at org.apache.spark.streaming.kafka.KafkaUtils.createStream(KafkaUtils.scala)
at com.random.spark.EventsToFileAggregator.main(EventsToFileAggregator.java:54)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
At EventsToFileAggregator.java:54
JavaPairReceiverInputDStream<String, String> messages =
KafkaUtils.createStream(jsc, args[0], args[1], topicMap,
StorageLevel.MEMORY_AND_DISK_SER());
pom.xml
<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>1.6.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>1.6.1</version>
    <scope>provided</scope>
  </dependency>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming-kafka_2.11</artifactId>
    <version>1.6.1</version>
  </dependency>
</dependencies>
Build
Successful without any warnings
Command
./bin/spark-submit --class com.random.spark.EventsToFileAggregator --master spark://host:7077 /usr/local/spark/stats/target/stats-1.0-SNAPSHOT-jar-with-dependencies.jar localhost:2181 test topic 2
NoSuchMethodError is almost always an indication that two libraries are at incompatible versions. In this case spark-streaming-kafka is trying to call a Scala method that doesn't exist in the Scala runtime on the classpath. Check that the Scala version spark-streaming-kafka was built for (the _2.xx artifact suffix) matches the Scala version of the Spark installation you're actually submitting to.
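For example, the pre-built Spark 1.6.x binaries are compiled against Scala 2.10 (an assumption about this cluster); in that case the Kafka artifact would need the matching _2.10 suffix. A sketch under that assumption:

<dependency>
  <groupId>org.apache.spark</groupId>
  <artifactId>spark-streaming-kafka_2.10</artifactId>
  <version>1.6.1</version>
</dependency>
<!-- spark-core and spark-streaming would then need the _2.10 suffix as well -->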
I'm getting the following errors:
Caused by: javax.persistence.PersistenceException: Failed to load provider from META-INF/services
at javax.persistence.spi.PersistenceProviderResolverHolder$DefaultPersistenceProviderResolver.getPersistenceProviders(PersistenceProviderResolverHolder.java:115)
at javax.persistence.Persistence$PersistenceUtilImpl.isLoaded(Persistence.java:278)
at org.hibernate.validator.engine.resolver.JPATraversableResolver.isReachable(JPATraversableResolver.java:62)
at org.hibernate.validator.engine.resolver.DefaultTraversableResolver.isReachable(DefaultTraversableResolver.java:94)
at org.hibernate.validator.engine.resolver.SingleThreadCachedTraversableResolver.isReachable(SingleThreadCachedTraversableResolver.java:47)
at org.hibernate.validator.engine.ValidatorImpl.isValidationRequired(ValidatorImpl.java:757)
... 96 more
Caused by: java.lang.ClassNotFoundException: me.prettyprint.hom.CassandraPersistenceProvider
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1858)
at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1709)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:274)
at org.apache.geronimo.osgi.locator.ProviderLocator.loadClass(ProviderLocator.java:195)
at org.apache.geronimo.osgi.locator.ProviderLocator.locateServiceClasses(ProviderLocator.java:524)
at org.apache.geronimo.osgi.locator.ProviderLocator.getServices(ProviderLocator.java:315)
at javax.persistence.spi.PersistenceProviderResolverHolder$DefaultPersistenceProviderResolver.getPersistenceProviders(PersistenceProviderResolverHolder.java:108)
... 101 more
I have imported a pom dependency in my project; the new dependency in turn has some Cassandra-related dependencies, shown below:
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-core</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>com.datastax.cassandra</groupId>
  <artifactId>cassandra-driver-mapping</artifactId>
  <version>3.0.0</version>
</dependency>
The Cassandra project works fine standalone. Can someone help me with this?
Your project is complaining about ClassNotFoundException: me.prettyprint.hom.CassandraPersistenceProvider, which belongs to the Cassandra Hector client.
I am guessing your project was using hector-core, which is no longer active (see the Hector client GitHub page). You have to migrate all dependencies to DataStax's Cassandra drivers and remove all Hector-related dependencies. Check it here.
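A minimal sketch using the DataStax Java driver 3.x already declared in the pom above (contact point and keyspace are placeholders):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.Row;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder()
        .addContactPoint("127.0.0.1")      // placeholder contact point
        .build();
Session session = cluster.connect("my_keyspace"); // placeholder keyspace

ResultSet rs = session.execute("SELECT release_version FROM system.local");
Row row = rs.one();
System.out.println(row.getString("release_version"));

cluster.close();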