Spark on HBase exception class not found (Java)

I'm trying to communicate with HBase using Spark. I'm using the code below:
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
Scan scan = new Scan();
scan.setCaching(100);
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan);
System.out.println("Number of Records found : " + hbaseRdd.count());
If I execute this, I get the following error:
Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
...
I did not find any solution via Google. Does anyone have an idea?
--------edit--------
I'm using Maven. My POM looks like:
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.sharegov</groupId>
<artifactId>mjson</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.10</artifactId>
<version>0.3.5</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-spark</artifactId>
<version>2.0.0-SNAPSHOT</version>
</dependency>
</dependencies>
I'm building my application with dependencies using the maven-assembly-plugin.

You are getting the NoClassDefFoundError because Spark is not able to find the HBase jars on the classpath. You need to supply the required jars to spark-submit explicitly using the --jars parameter when launching the job:
${SPARK_HOME}/bin/spark-submit \
--jars ${..add hbase jars comma separated...}
--class ....
.........
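If you prefer to keep this out of the launch command, the same jars can also be listed on the spark.jars configuration property, which is the configuration-level equivalent of --jars. A minimal sketch follows; the jar paths are placeholders for the HBase jars installed on your cluster:
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class HBaseReadWithJars {
    public static void main(String[] args) {
        // spark.jars takes a comma-separated list of jars to ship with the application.
        // The paths below are placeholders; point them at the HBase jars on your machine.
        SparkConf sparkConf = new SparkConf()
                .setAppName("HBaseRead")
                .set("spark.jars",
                        "/path/to/hbase-client.jar,"
                      + "/path/to/hbase-common.jar,"
                      + "/path/to/hbase-server.jar,"
                      + "/path/to/hbase-spark.jar");
        JavaSparkContext jsc = new JavaSparkContext(sparkConf);
        // ... build the HBase configuration and JavaHBaseContext as in the question ...
        jsc.stop();
    }
}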

Related

Spark- Exception in thread java.lang.NoSuchMethodError

I am trying to score a model from pmml file using pmml4s library. Every time I submit the job in Spark I get the following error:
20/05/13 23:30:10 ERROR SparkSubmit: org.apache.spark.sql.types.StructType.names()[Ljava/lang/String;
java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.names()[Ljava/lang/String;
at org.pmml4s.spark.ScoreModel.transform(ScoreModel.scala:56)
at com.aexp.JavaPMML.main(JavaPMML.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
Following is my code sample:
ScoreModel model = ScoreModel.fromFile(args[0]);
SparkConf conf = new SparkConf();
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
Dataset<?> df = spark.read().format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load(args[1]);
Dataset<?> scoreDf = model.transform(df);
Following is the pom file that I am using:
<dependencies>
<dependency>
<groupId>org.pmml4s</groupId>
<artifactId>pmml4s-spark_2.11</artifactId>
<version>0.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.3.2</version>
</dependency>
</dependencies>
I have edited my pom file and made the Spark versions consistent, but I still face the same issue. I face the same problem when using Scala. Is there any dependency that I am missing?
Try to use the same version for all Spark libraries. If the Spark versions do not match, you will get NoSuchMethodError in many places, because those methods might have been modified or removed in newer versions.
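A quick way to confirm that the Spark version running on the cluster matches what is declared in the pom is to print it from the job itself (a minimal sketch using the SparkSession API shown in the question):
import org.apache.spark.sql.SparkSession;

public class SparkVersionCheck {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("SparkVersionCheck").getOrCreate();
        // version() returns the Spark version the application is actually running on;
        // it should match the spark-core/spark-sql versions declared in pom.xml.
        System.out.println("Runtime Spark version: " + spark.version());
        spark.stop();
    }
}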
The error is caused by PMML4S-Spark using the names method of StructType, which was only introduced in Spark 2.4. This has been fixed in the latest PMML4S-Spark, 0.9.5. Please update your pom file to use the new version:
<dependency>
<groupId>org.pmml4s</groupId>
<artifactId>pmml4s-spark_2.11</artifactId>
<version>0.9.5</version>
</dependency>

Flink Application throws Class Not Found Exception in Java

I have a Flink cluster on YARN and used the flink-quickstart-java archetype to build a demo project. After building a fat jar with the 'mvn clean package -Pbuild-jar' command and submitting the program with 'flink run -m yarn-cluster -yn 2 ./flink-SNAPSHOT-1.0.jar', the program throws the following exception:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.setDeserializer(FlinkKafkaConsumer09.java:290)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.<init>(FlinkKafkaConsumer09.java:216)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.<init>(FlinkKafkaConsumer09.java:154)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:128)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:112)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:79)
at stream.TransferKafka.main(TransferKafka.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:828)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:283)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1080)
at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1127)
at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1124)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1124)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
And here is my demo:
public static void main(String[] args) {
final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
Properties props = new Properties();
props.setProperty("bootstrap.servers", "ip:port");
props.setProperty("group.id", "NewFlinkTest");
DataStreamSource<String> stream = env.addSource(new FlinkKafkaConsumer010<>("kafka_test", new SimpleStringSchema(), props));
stream.addSink(new FlinkKafkaProducer010<>("kafka_test_out", new SimpleStringSchema(), props));
try {
env.execute("Flink Jar Test");
} catch (Exception e) {
e.printStackTrace();
}
}
And some version information:
Flink Version: 1.4.0
Hadoop Version: 2.7.2
Kafka Version: 0.10.2.1
JDK Version: 1.8
Pom dependencies
Edit1:
<?xml version="1.0" encoding="UTF-8"?>
<dependencies>
<!-- Apache Flink dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<!-- This dependency is required to actually execute jobs. It is currently pulled in by flink-streaming-java, but we explicitly depend on it to safeguard against future changes. -->
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- explicitly add a standard logging framework, as Flink does not have a hard dependency on one specific framework by default -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-rabbitmq_2.11</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>1.4.0</version>
</dependency>
</dependencies>
After some attempts, I found that the class throwing the exception does not come from the jar I packed into my uber jar. I think the main reason is that the client server has an older version of the flink-connector-kafka jar, but no matter how I set the config yaml property 'yarn.per-job-cluster.include-user-jar', the program always throws the same exception.
Edit2:
After adding kafka-clients:0.10.2.1 to flink_home/lib/, it works. But I still don't know why it doesn't read the class file from the uber jar.
First, you may verify whether the missing class is in your jar file by running grep 'ByteArrayDeserializer' ./flink-SNAPSHOT-1.0.jar.
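A complementary check from inside the application is to ask the JVM which jar the class was actually loaded from (a minimal sketch; the class name is the one from the stack trace above):
public class WhichJar {
    public static void main(String[] args) throws ClassNotFoundException {
        // Prints the jar (or directory) the class was loaded from;
        // throws ClassNotFoundException if it is missing from the classpath entirely.
        Class<?> clazz = Class.forName("org.apache.kafka.common.serialization.ByteArrayDeserializer");
        System.out.println(clazz.getProtectionDomain().getCodeSource().getLocation());
    }
}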
You probably want to add <scope>provided</scope> to flink-streaming-scala, flink-clients, flink-table-api-scala-bridge and flink-table-planner-blink - that solved my problem.

Unable to connect to HBase table using Java

I am trying to connect to HBase (version 1.2.0) deployed on a different machine. I am using Eclipse, and below is the code that I have written:
HTable table = null;
Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "192.168.0.191");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
table = new HTable(configuration, hbaseTable);
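As a side note, the HTable constructor used above is deprecated in HBase 1.x; a minimal sketch of the same lookup with the ConnectionFactory API (the table name string below stands in for the hbaseTable variable from the question) would be:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;

public class HBaseConnect {
    public static void main(String[] args) throws Exception {
        Configuration configuration = HBaseConfiguration.create();
        configuration.set("hbase.zookeeper.quorum", "192.168.0.191");
        configuration.set("hbase.zookeeper.property.clientPort", "2181");
        // ConnectionFactory replaces the deprecated new HTable(conf, name) constructor.
        try (Connection connection = ConnectionFactory.createConnection(configuration);
             Table table = connection.getTable(TableName.valueOf("hbaseTable"))) {
            System.out.println("Connected to table: " + table.getName());
        }
    }
}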
Below are the dependencies I have added in pom.xml:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.0</version>
<scope>provided</scope>
</dependency>
On running the code, I am getting the below exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:119)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.security.Groups.<init>(Groups.java:48)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:140)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:134)
at org.apache.hadoop.hbase.security.UserProvider.<clinit>(UserProvider.java:56)
at org.apache.hadoop.hbase.client.HConnectionKey.<init>(HConnectionKey.java:71)
at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:298)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:184)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:150)
at HbaseConnection.createConnection(HbaseConnection.java:34)
at HbaseConnection.main(HbaseConnection.java:22)
I have gone through various links related to similar problems, but I could not find any working solution. Can somebody help with the issue in my configuration or in my code?

Getting Spark Logging class not found when using Spark SQL

I am trying to do a simple Spark SQL programming in Java. In the program, I am getting data from a Cassandra table, converting the RDD into a Dataset and displaying the data. When I run the spark-submit command, I am getting the error: java.lang.ClassNotFoundException: org.apache.spark.internal.Logging.
My program is:
SparkConf sparkConf = new SparkConf().setAppName("DataFrameTest")
.set("spark.cassandra.connection.host", "abc")
.set("spark.cassandra.auth.username", "def")
.set("spark.cassandra.auth.password", "ghi");
SparkContext sparkContext = new SparkContext(sparkConf);
JavaRDD<EventLog> logsRDD = javaFunctions(sparkContext).cassandraTable("test", "log",
mapRowTo(Log.class));
SparkSession sparkSession = SparkSession.builder().appName("Java Spark SQL").getOrCreate();
Dataset<Row> logsDF = sparkSession.createDataFrame(logsRDD, Log.class);
logsDF.show();
My POM dependencies are:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.11</artifactId>
<version>1.6.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.2</version>
</dependency>
</dependencies>
My spark-submit command is: /home/ubuntu/spark-2.0.2-bin-hadoop2.7/bin/spark-submit --class "com.jtv.spark.dataframes.App" --master local[4] spark.dataframes-0.1-jar-with-dependencies.jar
How do I solve this error? Downgrading to 1.5.2 does not work as 1.5.2 does not have org.apache.spark.sql.Dataset and org.apache.spark.sql.SparkSession.
This may be a problem with your IDE. Because some of these packages are written in Scala rather than Java, the IDE is sometimes unable to understand what is going on. I am using IntelliJ and it keeps displaying this message to me, but when I run "mvn test" or "mvn package" everything is fine. Please check whether this is really a package error or just the IDE getting lost.
Spark Logging is available in Spark version 1.5.2 and lower, but not in higher versions. So your dependencies in pom.xml should look like this:
<dependencies>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-streaming_2.10</artifactId>
<version>1.5.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.datastax.spark</groupId>
<artifactId>spark-cassandra-connector_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
</dependencies>
Please let me know if it works or not.
The dependency below worked fine in my case.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.2.0</version>
<scope>provided</scope>
</dependency>
Pretty late to the party here, but I added
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.1.1</version>
<scope>provided</scope>
</dependency>
to solve this issue. It seems to work in my case.
Make sure you have the correct Spark version in the pom.xml.
Previously, I had a different version of Spark locally, and that is why I was getting the error in the IntelliJ IDE: "Can not have access Spark.logging class".
In my case, I changed it from 2.4.2 to 2.4.3, and that solved it.
You can get the Spark version and Scala version from the spark-shell command.
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.4.3</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.4.3</version>
</dependency>

Remove debug log from Console using Spark 1.6.2

Using Java, I start a basic Spark app using:
SparkConf conf = new SparkConf().setAppName("myApp").setMaster("local");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.setLogLevel("INFO");
SQLContext sqlContext = new SQLContext(javaSparkContext);
I try to make the output a little less verbose by adding the setLogLevel call, but it does not take effect. I still get a lot of debug information.
Ideally, I would like to shut off all org.apache.spark.* except errors...
Update #1:
Here is my pom.xml:
<dependencies>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.6</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>5.2.0.Final</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.6.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.4.0</version>
</dependency>
</dependencies>
There is a file conf/log4j.properties.template; copy it and modify it according to your logging needs.
cd spark/conf
cp log4j.properties.template log4j.properties
Adding the following row to log4j.properties should work:
log4j.logger.org.apache.spark=ERROR
[Edit]
If it is a Maven Java project running standalone Spark, copy log4j.properties to src/main/resources, or to src/test/resources if it is for test cases, and modify it accordingly.
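If you would rather not touch the properties file at all, the same effect can be achieved programmatically through the log4j 1.x API that Spark 1.6 ships with (a minimal sketch; set the levels before creating the context):
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class QuietSpark {
    public static void main(String[] args) {
        // Raise the threshold for Spark's (and Akka's) loggers so only errors get through.
        Logger.getLogger("org.apache.spark").setLevel(Level.ERROR);
        Logger.getLogger("akka").setLevel(Level.ERROR);

        SparkConf conf = new SparkConf().setAppName("myApp").setMaster("local");
        JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
        // ... run the job ...
        javaSparkContext.stop();
    }
}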
