Kafka Structured Streaming KafkaSourceProvider could not be instantiated - java

I am working on a streaming project where I have a kafka stream of ping statistics like so :
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=1 ttl=62 time=0.913 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=2 ttl=62 time=0.936 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=3 ttl=62 time=0.980 ms
64 bytes from vas.fractalanalytics.com (192.168.30.26): icmp_seq=4 ttl=62 time=0.889 ms
I am trying to read this as a structured stream in pyspark. I start pyspark with the following command :
pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0
Pyspark version is 2.4, python version is 2.7 (tried with 3.6 as well)
And I get an error as soon as I send this piece of code (followed from Structured Streaming + Kafka Integration Guide):
df = spark.readStream.format("kafka").option("kafka.bootstrap.servers", "172.18.2.21:2181").option("subscribe", "ping-stats").load()
I run into the following error :
py4j.protocol.Py4JJavaError: An error occurred while calling o37.load.
: java.util.ServiceConfigurationError: org.apache.spark.sql.sources.DataSourceRegister: Provider org.apache.spark.sql.kafka010.KafkaSourceProvider could not be instantiated
at java.util.ServiceLoader.fail(ServiceLoader.java:232)
at java.util.ServiceLoader.access$100(ServiceLoader.java:185)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:384)
at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:404)
at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:43)
at scala.collection.Iterator$class.foreach(Iterator.scala:891)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1334)
at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
at scala.collection.AbstractTraversable.filter(Traversable.scala:104)
at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:630)
at org.apache.spark.sql.streaming.DataStreamReader.load(DataStreamReader.scala:161)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NoSuchMethodError: org.apache.spark.internal.Logging.$init$(Lorg/apache/spark/internal/Logging;)V
at org.apache.spark.sql.kafka010.KafkaSourceProvider.<init>(KafkaSourceProvider.scala:44)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:380)
... 23 more
Can someone help me out with this?

I managed to solve this by ensuring that the spark-sql-kafka package's version matches the spark version.
In my case, I am now using --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.1 for my spark version 2.4.1, thereafter the .format("kafka") part of the code can be resolved.
Also, v2.12 of the package (i.e., org.apache.spark:spark-sql-kafka-0-10_2.12:2.4.1) does not seem stable at the time of writing, and using it will also cause the above error.
*EDIT: v2.12 spark-sql-kafka packages seem to only work with Spark built with Scala v2.12. Hence, for Spark v2.X versions (pre-built with Scala v2.11 by default), there's a need to instead use Spark binaries built with Scala v2.12 (e.g. spark-2.4.1-bin-without-hadoop-scala-2.12.tgz) if you really want to use spark-sql-kafka v2.12 package. For Spark v3.X, they are pre-built with Scala v2.12 by default, hence you'll only see/use v2.12 of the package.

Issue solved when used the Spark version 2.3.2 with Scala version 2.11.11 and dependency org.apache.spark:sl-kafka-0-10_2.10:2.0.2
#spark-shell -i Process.scala --master local[2] --packages org.apache.spark:sl-kafka-0-10_2.10:2.0.2 ...

Related

Starting HBASE, java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder

I am trying to start HBASE with start-hbase.sh, however, I get the error: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder.
I have tried to add various .jar's to various folders (as suggested in other threads) but nothing works. I am using Hadoop 3.11 and HBase 2.10 Here is the (end of the) error log.
java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMaster.
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2972)
at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:236)
at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:140)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2983)
Caused by: java.lang.NoClassDefFoundError: org/apache/htrace/SamplerBuilder
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:635)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:619)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:149)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.hadoop.hbase.util.CommonFSUtils.getRootDir(CommonFSUtils.java:358)
at org.apache.hadoop.hbase.util.CommonFSUtils.isValidWALRootDir(CommonFSUtils.java:407)
at org.apache.hadoop.hbase.util.CommonFSUtils.getWALRootDir(CommonFSUtils.java:383)
at org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:691)
at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:600)
at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:484)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:488)
at org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2965)
... 5 more
Caused by: java.lang.ClassNotFoundException: org.apache.htrace.SamplerBuilder
at java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:582)
at java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:190)
at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:499)
... 25 more
HBase 2.1.0 release uses HTrace, that is an incubating Apache Foundation project.
There is a folder for 3rd-party libraries in HBase lib folder, client-facing-thirdparty. You need to copy htrace-core-3.1.0-incubating.jar from there to the HBase lib directory. (see reference)
There is also another solution at Cloudera Community that changes a configuration instead of adding the library manually.
There seems to have been a compatibility issue between Hbase and Hadoop, I reverted to using Hadoop 2.9.1 and Hbase 1.2.6 together with JDK 1.8.0
HBase 2.1.4 has no htrace-core-3.1.0-incubating.jar in client-facing-thirdparty
You need copy from 2.1.3 or download from https://mvnrepository.com/artifact/org.apache.htrace/htrace-core

Cannot setup Apache Spark 2.1.1 on Windows 10

I have installed Apache Spark 2.1.1 on Windows 10, with Java 1.8 and Python version 3.6 Anaconda 4.3.1. I have also downloaded the winutils.exe and setup environment avriables for JAVA_HOME, HADOOP_HOME and SPARK_HOME as well as updated the path variable. I have also run winutils.exe chmod -R 777 \tmp\hive. But I am getting the below error when running pyspark in cmd prompt.
Please please can someone help, let me know if I missed out any important detail
Thanks in advance!
c:\Spark>bin\pyspark
Python 3.6.0 |Anaconda 4.3.1 (64-bit)| (default, Dec 23 2016, 11:57:41) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Traceback (most recent call last):
File "c:\Spark\python\pyspark\sql\utils.py", line 63, in deco
return f(*a, **kw)
File "c:\Spark\python\lib\py4j-0.10.4-src.zip\py4j\protocol.py", line 319, in get_return_value
py4j.protocol.Py4JJavaError: **An error occurred while calling o22.sessionState.
: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':**
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
I still get errors when launching [spark-shell], but it looks like Spark launches since I get the 'Welcome to Spark' piece. The error I get is
C:\Spark>bin\spark-shell
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/06/23 12:20:15 WARN General: Plugin (Bundle) "org.datanucleus.api.jdo" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/jars/datanucleus-api-jdo-3.2.6.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../jars/datanucleus-api-jdo-3.2.6.jar."
17/06/23 12:20:15 WARN General: Plugin (Bundle) "org.datanucleus.store.rdbms" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/jars/datanucleus-rdbms-3.2.9.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/bin/../jars/datanucleus-rdbms-3.2.9.jar."
17/06/23 12:20:15 WARN General: Plugin (Bundle) "org.datanucleus" is already registered. Ensure you dont have multiple JAR versions of the same plugin in the classpath. The URL "file:/C:/Spark/bin/../jars/datanucleus-core-3.2.10.jar" is already registered, and you are trying to register an identical plugin located at URL "file:/C:/Spark/jars/datanucleus-core-3.2.10.jar."
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveSessionState':
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:981)
at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:110)
at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:109)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:878)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:878)
at org.apache.spark.repl.Main$.createSparkSession(Main.scala:96)
... 47 elided
Caused by: java.lang.reflect.InvocationTargetException:
java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:978)
... 58 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.hive.HiveExternalCatalog':
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:169)
at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:86)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:101)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:101)
at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:100)
at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:157)
at org.apache.spark.sql.hive.HiveSessionState.<init>(HiveSessionState.scala:32)
... 63 more
Caused by: java.lang.reflect.InvocationTargetException: java.lang.reflect.InvocationTargetException: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.internal.SharedState$.org$apache$spark$sql$internal$SharedState$$reflect(SharedState.scala:166)
... 71 more
Caused by: java.lang.reflect.InvocationTargetException:
java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:264)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:358)
at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:262)
at org.apache.spark.sql.hive.HiveExternalCatalog.<init>(HiveExternalCatalog.scala:66)
... 76 more
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Ljava/lang/String;I)V
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode0(Native Method)
at org.apache.hadoop.io.nativeio.NativeIO$Windows.createDirectoryWithMode(NativeIO.java:524)
at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:478)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:532)
at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:509)
at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:305)
at org.apache.hadoop.hive.ql.session.SessionState.createPath(SessionState.java:639)
at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:561)
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:508)
at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)
... 84 more
14: error: not found: value spark
import spark.implicits._
^
14: error: not found: value spark
import spark.sql
^
Welcome to
Setup that worked for me is as follows "i didn't use winutils.exe":-
install pyspark and findspark using "Anaconda Command Prompt" as
pip3 install pyspark
and
pip3 install findspark
as you have already downloaded spark setup. unzip it and keep it in "C" drive i.e. "C:\spark-2.2.0-bin-hadoop2.7" and create new environment variable "SPARK_HOME" and set it to "C:\spark-2.2.0-bin-hadoop2.7\bin" and open "path" variable in system variables and add the same there as well.
now open your command prompt and come from "C:\User*" to "C:\" by doing cd.. twice and run following command
set SPARK_HOME='spark-2.2.0-bin-hadoop2.7'
and you are good to go.
Now you just have to import the file location before importing pyspark in your jupyter notebook. use following code:-
import findspark
findspark.init('C:\spark-2.2.0-bin-hadoop2.7')
import pyspark

Kafka + Storm - satisfying dependencies

I'm attempting to deploy my first topology to a storm cluster as part of an assessment for my company. The topology is just to get values from kafka and put them into cassandra and redis.
After copying over scads of .jar files to try to satisfy the various dependencies I've run into a issue where storm claims a dependency is missing but the startup class list in the logs shows the class as available.
Here's the exception:
java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
at kafka.utils.Pool.(Pool.scala:28) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.
~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.FetchRequestAndResponseStatsRegistry$.(FetchRequestAndResponseStats.scala) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.consumer.SimpleConsumer.(SimpleConsumer.scala:39) ~[kafka_2.10-0.8.1.1.jar:na]
at kafka.javaapi.consumer.SimpleConsumer.(SimpleConsumer.scala:34) ~[kafka_2.10-0.8.1.1.jar:na]
at storm.kafka.DynamicPartitionConnections.register(DynamicPartitionConnections.java:60) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.PartitionManager.(PartitionManager.java:64) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.ZkCoordinator.refresh(ZkCoordinator.java:98) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.ZkCoordinator.getMyManagedPartitions(ZkCoordinator.java:69) ~[storm-kafka-0.9.4.jar:0.9.4]
at storm.kafka.KafkaSpout.nextTuple(KafkaSpout.java:135) ~[storm-kafka-0.9.4.jar:0.9.4]
at backtype.storm.daemon.executor$fn__4654$fn__4669$fn__4698.invoke(executor.clj:565) ~[storm-core-0.9.4.jar:0.9.4]
at backtype.storm.util$async_loop$fn__458.invoke(util.clj:463) ~[storm-core-0.9.4.jar:0.9.4]
at clojure.lang.AFn.run(AFn.java:24) [clojure-1.5.1.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: java.lang.ClassNotFoundException: scala.collection.GenTraversableOnce$class
at java.net.URLClassLoader.findClass(URLClassLoader.java:381) ~[na:1.8.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:424) ~[na:1.8.0_45]
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331) ~[na:1.8.0_45]
at java.lang.ClassLoader.loadClass(ClassLoader.java:357) ~[na:1.8.0_45]
When I look at the startup info for the supervisor thread I see this:
2015-06-07T07:55:19.941-0700 o.a.z.ZooKeeper [INFO] Client environment:java.class.path= ... /usr/local/src/apache-storm-0.9.4/lib/scala-library-2.11.6.jar: ...
When I open this file I see this entry:
-rwxrwxrwx 0 0 0 0 Mar 18 2014 scala/collection/GenTraversableOnce.class
So something else is amiss. What step(s) have I missed here?
NOTE: similar issues from org/jboss/netty/channel/ChannelFactory..
The Kafka version specifies which Scala version it is built against.
Scala 2.10 - kafka_2.10-0.9.0.1.tgz (asc, md5)
Scala 2.11 - kafka_2.11-0.9.0.1.tgz (asc, md5)
I made the mistake of using Scala 2.10 with kafka 2.11.
I was able to resolve this by correcting my maven dependancies to the correct scala and kafka combination.
Release Notes Source download: kafka-0.11.0.1-src.tgz (asc, md5)
Binary downloads: Scala 2.11 - kafka_2.11-0.11.0.1.tgz (asc, md5)
Scala 2.12 - kafka_2.12-0.11.0.1.tgz (asc, md5) We build for multiple
This only matters if you are using Scala and you want a version built for the same Scala version you use. Otherwise any version should work (2.11 is recommended).

Google Cloud Bigtable and Java 8

I'm connecting to Bigtable from local machine, following this example.
When I'm running my project using JDK 7 everything works perfect, but when running it with JDK 8 I have following:
Exception in thread "main" java.io.IOException: Failed to create table 'testTable'
at org.apache.hadoop.hbase.client.BigtableAdmin.createTable(BigtableAdmin.java:306)
at com.zoomdata.thrift.provider.BigTableTestConnector.testConnection(BigTableTestConnector.java:62)
at com.zoomdata.thrift.provider.BigTableTestConnector.main(BigTableTestConnector.java:91)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:134)
Caused by: java.lang.NoSuchMethodError: sun.security.ssl.ClientHandshaker.getRawHostnameSE()Ljava/lang/String;
at sun.security.ssl.ClientHandshaker.getKickstartMessage(ClientHandshaker.java:1294)
at sun.security.ssl.Handshaker.kickstart(Handshaker.java:1014)
at sun.security.ssl.SSLSocketImpl.kickstartHandshake(SSLSocketImpl.java:1475)
at sun.security.ssl.SSLSocketImpl.performInitialHandshake(SSLSocketImpl.java:1339)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1391)
at sun.security.ssl.SSLSocketImpl.startHandshake(SSLSocketImpl.java:1375)
at sun.net.www.protocol.https.HttpsClient.afterConnect(HttpsClient.java:563)
at sun.net.www.protocol.https.AbstractDelegateHttpsURLConnection.connect(AbstractDelegateHttpsURLConnection.java:185)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1282)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1257)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getOutputStream(HttpsURLConnectionImpl.java:250)
at com.google.bigtable.repackaged.com.google.api.client.http.javanet.NetHttpRequest.execute(NetHttpRequest.java:77)
at com.google.bigtable.repackaged.com.google.api.client.http.HttpRequest.execute(HttpRequest.java:965)
at com.google.auth.oauth2.ServiceAccountCredentials.refreshAccessToken(ServiceAccountCredentials.java:222)
at com.google.auth.oauth2.OAuth2Credentials.refresh(OAuth2Credentials.java:76)
at com.google.auth.oauth2.OAuth2Credentials.getRequestMetadata(OAuth2Credentials.java:53)
at io.grpc.auth.ClientAuthInterceptor$1.checkedStart(ClientAuthInterceptor.java:82)
at io.grpc.ClientInterceptors$CheckedForwardingCall.start(ClientInterceptors.java:185)
at io.grpc.stub.Calls.asyncServerStreamingCall(Calls.java:174)
at io.grpc.stub.Calls.unaryFutureCall(Calls.java:86)
at io.grpc.stub.Calls.blockingUnaryCall(Calls.java:129)
at com.google.bigtable.admin.table.v1.BigtableTableServiceGrpc$BigtableTableServiceBlockingStub.createTable(BigtableTableServiceGrpc.java:332)
at com.google.cloud.bigtable.grpc.BigtableAdminGrpcClient.createTable(BigtableAdminGrpcClient.java:78)
at org.apache.hadoop.hbase.client.BigtableAdmin.createTable(BigtableAdmin.java:304)
... 7 more
Seems there is changes in sun libraries in java 8. Is there a workaround to run this code under java 8?
HBase hasn't yet been certified to run with Java 8. For the time being, stick with Java 7. Here are two HBase / Hadoop tracking bugs for this:
https://issues.apache.org/jira/browse/HADOOP-11090
https://issues.apache.org/jira/browse/HBASE-7608
Stumbled upon this while looking for the answer for it myself.
I was able to get it to work with using right version of ALPN in my bootclasspath.
For Java 8 I used ALPN version 8.1.4.v20150727.

Troubles running Mahout 0.9 text-processing examples

TL/DR : are mahout 0.9 examples compatible with hadoop 2.4 ?
My problem:
I would like to classify a bunch of documents using Mahout 0.9. To do so, I'm following the example described here.
I'm on windows and trying to go full native (ie, no cygwin). I already dispose of a local hadoop 2.4.1 cluster.
I downloaded the mahout sources and compiled it according to the wiki :
mvn "-Dhadoop2.version=2.4.1" -DskipTests clean install
I then tried to execute the example with the following example :
hadoop jar $Env:mahout_home/examples/target/mahout-examples-0.9-job.jar org.apache.mahout.driver.MahoutDriver seqdirectory -i Decomposition -o output
It all seems to work : I'm getting logs showing the mapreduce job begins to run. However, I quickly get the following errors :
Error: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.ja
va:166)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.<init>(CombineFileRecordReader.java:126)
at org.apache.mahout.text.MultipleTextFileInputFormat.createRecordReader(MultipleTextFileInputFormat.java:43)
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.<init>(MapTask.java:492)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:735)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:167)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:162)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.mapreduce.lib.input.CombineFileRecordReader.initNextRecordReader(CombineFileRecordReader.ja
va:157)
... 10 more
Caused by: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.TaskAttemptContext, but c
lass was expected
at org.apache.mahout.text.WholeFileRecordReader.<init>(WholeFileRecordReader.java:59)
... 15 more
According to the various links I found, it seems to come from code intented for Hadoop 1.0.
Am I missing something, or are mahout provided examples not suited for a Hadoop 2.4 cluster ?
The problem is because the object TaskAttemptContext is a Interface in the version 2.4 of Hadoop and the job was expecting a class (version 1.1.2 Hadoop). TaskAttemptContext has been changed in version 2.0.

Categories

Resources