Kafka + Storm - satisfying dependencies - java

I'm attempting to deploy my first topology to a Storm cluster as part of an assessment for my company. The topology just reads values from Kafka and writes them into Cassandra and Redis.
After copying over scads of .jar files to try to satisfy the various dependencies, I've run into an issue where Storm claims a dependency is missing even though the startup class list in the logs shows the class as available.
Here's the exception:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.<init>(Pool.scala:28) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<init>(FetchRequestAndResponseStats.scala) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.<clinit>(FetchRequestAndResponseStats.scala) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:39) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.javaapi.consumer.SimpleConsumer.<init>(SimpleConsumer.scala:34) ~[kafka_2.10-0.8.1.1.jar:na]
at storm.kafka.DynamicPartitionConnections.register(DynamicPartitionConnections.java:60) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.PartitionManager.<init>(PartitionManager.java:64) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:135) ~[storm-kafka-0.9.4.jar:0.9.4]
at backtype.storm.daemon.executor$fn__4654$fn__4669$fn__4698.invoke(executor.clj:565) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463) ~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_45]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[na:1.8.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_45]
When I look at the startup info for the supervisor thread I see this:
2015-06-07T07:55:19.941-0700 o.a.z.ZooKeeper [INFO] Client environment:java.class.path= ... /usr/local/src/apache-storm-0.9.4/lib/scala-library-2.11.6.jar: ...
When I open that jar (scala-library-2.11.6.jar) I see this entry:
-rwxrwxrwx 0 0 0 0 Mar 18 2014 scala/collection/GenTraversableOnce.class
So something else is amiss. What step(s) have I missed here?
NOTE: I see similar issues with org/jboss/netty/channel/ChannelFactory.

The Kafka artifact name specifies which Scala version it was built against:
Scala 2.10 - kafka_2.10-0.9.0.1.tgz (asc, md5)
Scala 2.11 - kafka_2.11-0.9.0.1.tgz (asc, md5)
I made the mistake of pairing a Kafka build for Scala 2.10 with Scala 2.11.
I was able to resolve this by correcting my Maven dependencies to a matching Scala and Kafka combination.
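As a minimal sketch (versions here are illustrative; match them to whatever your cluster actually runs), a consistent pom.xml pairing looks like:
<!-- Scala runtime and Kafka client built for the SAME Scala line (2.10) -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.10.4</version>
</dependency>
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.10</artifactId>
    <version>0.8.1.1</version>
</dependency>
The _2.10 suffix in the artifactId is what has to agree with the scala-library version.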

Release Notes
Source download: kafka-0.11.0.1-src.tgz (asc, md5)
Binary downloads:
Scala 2.11 - kafka_2.11-0.11.0.1.tgz (asc, md5)
Scala 2.12 - kafka_2.12-0.11.0.1.tgz (asc, md5)
We build for multiple versions of Scala. This only matters if you are using Scala and you want a version built for the same Scala version you use. Otherwise any version should work (2.11 is recommended). The artifact name encodes the Scala version first and the Kafka version second: kafka_2.11-0.11.0.1 is Kafka 0.11.0.1 built for Scala 2.11.

Related

How do I troubleshoot the installation of Apache Accumulo on Linux?

I am trying to install open source Accumulo on RHEL 7.x. I have two GB of swap space. I have installed Java 1.8, Hadoop 3, and Zookeeper. I have run the bootstrap_config.sh script for Accumulo 1.9.2.
I ran this (and expected it to work):
/bin/accumulo-1.9.2/bin/accumulo init
But I get this error:
[start.Main] ERROR: Uncaught exception
java.util.ServiceConfigurationError: org.apache.accumulo.start.spi.KeywordExecutable: Provider org.apache.accumulo.proxy.Proxy could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at org.apache.accumulo.start.Main.checkDuplicates(Main.java:237)
at org.apache.accumulo.start.Main.getExecutables(Main.java:228)
at org.apache.accumulo.start.Main.main(Main.java:84)
Caused by: java.lang.NoClassDefFoundError: org/apache/commons/configuration/Configuration
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2671)
at java.lang.Class.getConstructor0(Class.java:3075)
at java.lang.Class.newInstance(Class.java:412)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.commons.configuration.Configuration
at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at org.apache.accumulo.start.classloader.AccumuloClassLoader$2.loadClass(AccumuloClassLoader.java:294)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 10 more
I used the Accumulo bootstrap_config.sh script to configure Hadoop version 3. How do I get "/bin/accumulo-1.9.2/bin/accumulo init" to work?
Accumulo 1.9.2 expects Hadoop 2 out of the box, but does have a build profile to rebuild a tarball specifically for use with Hadoop 3. You can build Accumulo with the Hadoop 3 profile by downloading the source tarball and doing:
mvn clean package -Dhadoop.profile=3 -DskipTests
If you're not interested in rebuilding from source, it may be possible to fix the classpath issue simply by reading the error message and adjusting your classpath accordingly. In this case, you're missing a commons-configuration jar.
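For example (a sketch only — the jar version and source path are assumptions; the missing class org.apache.commons.configuration.Configuration lives in commons-configuration 1.x, not commons-configuration2):
# Copy a commons-configuration 1.x jar into Accumulo's lib directory
cp /path/to/commons-configuration-1.10.jar /bin/accumulo-1.9.2/lib/
Hadoop 3 dropped commons-configuration 1.x in favour of commons-configuration2, which is why a setup that worked against Hadoop 2 no longer finds this class.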

Kafka Structured Streaming KafkaSourceProvider could not be instantiated

I am working on a streaming project where I have a Kafka stream of ping statistics, like so:
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=1 ttl=62 time=0.913 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=2 ttl=62 time=0.936 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=3 ttl=62 time=0.980 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=4 ttl=62 time=0.889 ms
I am trying to read this as a structured stream in pyspark. I start pyspark with the following command:
pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
Pyspark version is 2.4, Python version is 2.7 (tried with 3.6 as well).
I get an error as soon as I run this piece of code (following the Structured Streaming + Kafka Integration Guide):
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "172.18.2.21:2181").option("subscribe", "ping-stats").load()
I run into the following error:
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.sql.kafka010.KafkaSourceProvider.<init>(KafkaSourceProvider.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 23 more
Can someone help me out with this?
I managed to solve this by ensuring that the spark-sql-kafka package's version matches the Spark version.
In my case, I am now using --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1 for my Spark version 2.4.1; after that, the .format("kafka") part of the code resolves.
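As a sketch, assuming a Spark 2.4.1 installation built against Scala 2.11 (the default for Spark 2.x), the matched launch command is:
# Scala suffix (_2.11) and version (2.4.1) both match the Spark build
pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1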
Also, the _2.12 build of the package (i.e., org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.1) did not seem stable at the time of writing, and using it will also cause the above error.
EDIT: the _2.12 spark-sql-kafka packages seem to only work with Spark built with Scala 2.12. Hence, for Spark 2.x versions (pre-built with Scala 2.11 by default), you need to use Spark binaries built with Scala 2.12 (e.g. spark-2.4.1-bin-without-hadoop-scala-2.12.tgz) if you really want to use the _2.12 package. Spark 3.x versions are pre-built with Scala 2.12 by default, so there you'll only see/use the _2.12 package.
The issue was solved for me when I used Spark version 2.3.2 with Scala version 2.11.11 and the dependency org.apache.spark:spark-sql-kafka-0-10_2.10:2.0.2:
# spark-shell -i Process.scala --master local[2] --packages org.apache.spark:spark-sql-kafka-0-10_2.10:2.0.2 ...

Starting HBase, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder

I am trying to start HBase with start-hbase.sh, but I get the error java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder.
I have tried adding various .jars to various folders (as suggested in other threads) but nothing works. I am using Hadoop 3.1.1 and HBase 2.1.0. Here is the (end of the) error log:
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2972)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:691)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:600)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:484)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2965)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
... 25 more
The HBase 2.1.0 release uses HTrace, an incubating Apache project.
There is a folder for third-party libraries inside the HBase lib folder, client-facing-thirdparty. You need to copy htrace-core-3.1.0-incubating.jar from there into the HBase lib directory itself (see reference).
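Concretely (paths assume a default HBase 2.1.x layout; adjust HBASE_HOME to your installation):
# Make the htrace jar visible on HBase's main classpath
cp $HBASE_HOME/lib/client-facing-thirdparty/htrace-core-3.1.0-incubating.jar $HBASE_HOME/lib/
Restart HBase afterwards so the master picks up the jar.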
There is also another solution at Cloudera Community that changes a configuration instead of adding the library manually.
There seems to have been a compatibility issue between HBase and Hadoop; I reverted to using Hadoop 2.9.1 and HBase 1.2.6 together with JDK 1.8.0.
HBase 2.1.4 has no htrace-core-3.1.0-incubating.jar in client-facing-thirdparty.
You need to copy it from 2.1.3 or download it from https://mvnrepository.com/artifact/org.apache.htrace/htrace-core
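For example (the URL follows the standard Maven Central layout; verify it before relying on it):
# Fetch the jar straight from Maven Central into HBase's lib directory
wget https://repo1.maven.org/maven2/org/apache/htrace/htrace-core/3.1.0-incubating/htrace-core-3.1.0-incubating.jar -P $HBASE_HOME/lib/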

Play Framework 1.4.2 throws ClassNotFoundException when precompiling an app with modules

We have an application which is built with Play Framework 1.4.2 and uses the play-hazelcast module; the Java version is "1.8.0_91".
My application uses a Hazelcast distributed topic for message exchange between nodes. There are four nodes in total. When I start my application in normal mode with sources, everything works fine, but if I try to run it in precompiled mode it throws a ClassNotFoundException. I can't work out whether the root of the problem is in the Play framework or in the Hazelcast module.
The play-hazelcast plugin:
https://github.com/braman/hazelcast
dependencies.yml:
# Application dependencies
require:
    - play
    - hazelcast -> hazelcast latest.integration
    - org.hibernate -> hibernate-search 4.3.0.Final
repositories:
    - hazelcast:
        type: local
        artifact: ${application.path}/../hazelcast
        contains:
            - hazelcast -> *
Here is the exception:
WARNING: [10.0.0.167]:5799 [prod] [3.7.1] Error while logging processing event
com.hazelcast.nio.serialization.HazelcastSerializationException: java.lang.ClassNotFoundException: pojo.system.events.StatusChangedEvent
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:224)
at com.hazelcast.internal.serialization.impl.StreamSerializerAdapter.read(StreamSerializerAdapter.java:46)
at com.hazelcast.internal.serialization.impl.AbstractSerializationService.toObject(AbstractSerializationService.java:172)
at com.hazelcast.topic.impl.DataAwareMessage.getMessageObject(DataAwareMessage.java:44)
at jobs.StartupJob$1.onMessage(StartupJob.java:72)
at com.hazelcast.topic.impl.TopicService.dispatchEvent(TopicService.java:134)
at com.hazelcast.spi.impl.eventservice.impl.EventProcessor.process(EventProcessor.java:48)
at com.hazelcast.spi.impl.eventservice.impl.RemoteEventProcessor.run(RemoteEventProcessor.java:36)
at com.hazelcast.util.executor.StripedExecutor$Worker.process(StripedExecutor.java:187)
at com.hazelcast.util.executor.StripedExecutor$Worker.run(StripedExecutor.java:171)
Caused by: java.lang.ClassNotFoundException: pojo.system.events.StatusChangedEvent
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at com.hazelcast.nio.ClassLoaderUtil.tryLoadClass(ClassLoaderUtil.java:151)
at com.hazelcast.nio.ClassLoaderUtil.loadClass(ClassLoaderUtil.java:120)
at com.hazelcast.nio.IOUtil$ClassLoaderAwareObjectInputStream.resolveClass(IOUtil.java:358)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at com.hazelcast.internal.serialization.impl.JavaDefaultSerializers$JavaSerializer.read(JavaDefaultSerializers.java:219)
... 9 more
This exception means that a message was received by the Hazelcast module, but it couldn't be deserialized because the receiving node could not find the class. In precompiled mode, make sure the application's compiled classes (including pojo.system.events.StatusChangedEvent) end up on the classpath of every node; see the sketch below.
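As a starting point for checking this (these are the standard Play 1.x commands for precompiled mode, not a confirmed fix for this particular setup):
# Precompile the app, then start it explicitly from the precompiled classes
play precompile
play start --%prod -Dprecompiled=true
If the precompiled/ directory isn't shipped to (or regenerated on) each node, classes such as the event POJOs will be missing at runtime, which matches the stack trace above.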

java.lang.ClassNotFoundException in Apache Storm after migration to 1.0.0

I tried to migrate my simple Trident DRPC topology to Apache Storm 1.0.0 (from 0.10.0) and test it in local in-memory mode. It's a really simple topology with one bolt, so the migration only required replacing backtype.storm and storm.trident with org.apache.storm and org.apache.storm.trident.
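For illustration (the class here is just one example of the rename), the change amounts to swapping imports like:
// Storm 0.10.x
import backtype.storm.topology.TopologyBuilder;
// Storm 1.0.0
import org.apache.storm.topology.TopologyBuilder;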
Unfortunately, I started getting errors like this (with possibly different missing classes for different topologies):
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.storm.trident.partition.IdentityGrouping
It's too weird to report an issue for my specific topology, so I reproduced the problem with the storm-starter topology. Since I'm interested in testing on an in-memory cluster, I chose the ExclamationTopology.
> cd storm/examples/storm-starter
> mvn clean install -DskipTests=true
> mvn exec:java -Dstorm.topology=storm.starter.ExclamationTopology -DskipTests=true
The commands above started the local cluster, but after the topology was submitted it also ended with a RuntimeException:
8632 [Thread-11] ERROR o.a.s.d.worker - Error on initialization of server mk-worker
java.lang.RuntimeException: java.lang.ClassNotFoundException: org.apache.storm.testing.TestWordSpout
at org.apache.storm.utils.Utils.javaDeserialize(Utils.java:181) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.utils.Utils.getSetComponentObject(Utils.java:430) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.daemon.task$get_task_object.invoke(task.clj:74) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.daemon.task$mk_task_data$fn__7593.invoke(task.clj:177) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.util$assoc_apply_self.invoke(util.clj:930) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.daemon.task$mk_task_data.invoke(task.clj:170) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.daemon.task$mk_task.invoke(task.clj:181) ~[storm-core-1.0.0.jar:1.0.0]
at org.apache.storm.daemon.executor$mk_executor$fn__7812.invoke(executor.clj:371) ~[storm-core-1.0.0.jar:1.0.0]
at clojure.core$map$fn__4553.invoke(core.clj:2622) ~[clojure-1.7.0.jar:?]
at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.7.0.jar:?]
....
Caused by: java.lang.ClassNotFoundException: org.apache.storm.testing.TestWordSpout
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[?:1.8.0_77]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[?:1.8.0_77]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[?:1.8.0_77]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[?:1.8.0_77]
at java.lang.Class.forName0(Native Method) ~[?:1.8.0_77]
at java.lang.Class.forName(Class.java:348) ~[?:1.8.0_77]
at org.apache.storm.shade.org.apache.commons.io.input.ClassLoaderObjectInputStream.resolveClass(ClassLoaderObjectInputStream.java:68) ~[storm-core-1.0.0.jar:1.0.0]
....
There are a few similar issues already reported for older versions, but there was no clear solution, or the issue was too generic.
Of course, I've also checked the dependency tree and storm-core is included, so I think it's somehow related to serialization and the way I execute the topology in local mode.
Any suggestions? I use Apache Maven 3.3.9 and Java 1.8.0_77 on Ubuntu 14.04. Here is my pom.xml
