I am trying to score a model from a PMML file using the pmml4s library. Every time I submit the job to Spark I get the following error:
20/05/13 23:30:10 ERROR SparkSubmit: org.apache.spark.sql.types.StructType.names()[Ljava/lang/String;
java.lang.NoSuchMethodError: org.apache.spark.sql.types.StructType.names()[Ljava/lang/String;
at org.pmml4s.spark.ScoreModel.transform(ScoreModel.scala:56)
at com.aexp.JavaPMML.main(JavaPMML.java:24)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
Following is my code sample:
ScoreModel model = ScoreModel.fromFile(args[0]);
SparkConf conf = new SparkConf();
SparkSession spark = SparkSession.builder().config(conf).getOrCreate();
Dataset<?> df = spark.read().format("csv")
.option("header", "true")
.option("inferSchema", "true")
.load(args[1]);
Dataset<?> scoreDf = model.transform(df);
Following is the pom file that I am using:
<dependencies>
<dependency>
<groupId>org.pmml4s</groupId>
<artifactId>pmml4s-spark_2.11</artifactId>
<version>0.9.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.3.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.3.2</version>
</dependency>
</dependencies>
I have edited my pom file so that all the Spark artifacts use the same version, but I still face the same issue. I face the same problem when I use Scala. Is there any dependency that I am missing?
Try to use the same version for all Spark libraries. If the Spark versions do not match, you will get NoSuchMethodError in many places, because methods may have been modified or removed in newer versions.
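For example, one common way to keep the versions aligned is to define them once as Maven properties and reference them from every Spark artifact. A minimal sketch, reusing the versions from the question:
<properties>
    <spark.version>2.3.2</spark.version>
    <scala.binary.version>2.11</scala.binary.version>
</properties>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
</dependency>
This way a single edit to spark.version upgrades every Spark module at once.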
The error is caused by PMML4S-Spark calling the names method of StructType, which was only introduced in Spark 2.4. It has now been fixed in the latest PMML4S-Spark, 0.9.5. Please update your pom file to use the new version:
<dependency>
<groupId>org.pmml4s</groupId>
<artifactId>pmml4s-spark_2.11</artifactId>
<version>0.9.5</version>
</dependency>
I ran the following commands:
1) sls create --template aws-java-maven
2) mvn clean install
3) sls invoke local -f hello
I got this error:
Serverless: In order to get human-readable output, please implement "toString()" method of your "ApiGatewayResponse" object.
java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.serverless.InvokeBridge.invoke(InvokeBridge.java:95)
at com.serverless.InvokeBridge.<init>(InvokeBridge.java:39)
at com.serverless.InvokeBridge.main(InvokeBridge.java:150)
Caused by: java.lang.NoSuchMethodError: com.amazonaws.services.lambda.runtime.LambdaLogger.log([B)V
at com.amazonaws.services.lambda.runtime.log4j2.LambdaAppender.append(LambdaAppender.java:74)
at org.apache.logging.log4j.core.config.AppenderControl.tryCallAppender(AppenderControl.java:156)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender0(AppenderControl.java:129)
at org.apache.logging.log4j.core.config.AppenderControl.callAppenderPreventRecursion(AppenderControl.java:120)
at org.apache.logging.log4j.core.config.AppenderControl.callAppender(AppenderControl.java:84)
at org.apache.logging.log4j.core.config.LoggerConfig.callAppenders(LoggerConfig.java:448)
at org.apache.logging.log4j.core.config.LoggerConfig.processLogEvent(LoggerConfig.java:433)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:417)
at org.apache.logging.log4j.core.config.LoggerConfig.log(LoggerConfig.java:403)
at org.apache.logging.log4j.core.config.AwaitCompletionReliabilityStrategy.log(AwaitCompletionReliabilityStrategy.java:63)
at org.apache.logging.log4j.core.Logger.logMessage(Logger.java:146)
at org.apache.logging.log4j.spi.AbstractLogger.logMessageSafely(AbstractLogger.java:2091)
at org.apache.logging.log4j.spi.AbstractLogger.logMessage(AbstractLogger.java:2005)
at org.apache.logging.log4j.spi.AbstractLogger.logIfEnabled(AbstractLogger.java:1876)
at org.apache.logging.log4j.spi.AbstractLogger.info(AbstractLogger.java:1421)
at com.serverless.Handler.handleRequest(Handler.java:18)
... 7 more
Adding both aws-lambda-java-log4j and aws-lambda-java-log4j2, with version 1.0.0 (decreased from 1.1.0), helped:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-lambda-java-log4j</artifactId>
<version>1.0.0</version>
</dependency>
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-lambda-java-log4j2</artifactId>
<version>1.0.0</version>
</dependency>
It seems you are missing some runtime logging dependencies. Try adding this to your pom file:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-lambda-java-log4j2</artifactId>
<version>1.1.0</version>
</dependency>
Also, you need to make sure that you are using a compatible version of the AWS SDK for Java, which generally means matching version numbers. Something like this:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-lambda-java-log4j</artifactId>
<version>1.0.0</version>
</dependency>
This also worked for me:
<dependency>
<groupId>com.amazonaws</groupId>
<artifactId>aws-lambda-java-log4j2</artifactId>
<version>1.1.0</version>
</dependency>
In your Java handler, add the lines below.
LambdaLogger logger = context.getLogger();
logger.log("received: " + input.toString());
I have a Flink cluster on YARN and used the flink-quickstart-java archetype to build a demo project. After building a fat jar with the 'mvn clean package -Pbuild-jar' command and submitting the program with 'flink run -m yarn-cluster -yn 2 ./flink-SNAPSHOT-1.0.jar', the program throws the following exception:
java.lang.NoClassDefFoundError: org/apache/kafka/common/serialization/ByteArrayDeserializer
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.setDeserializer(FlinkKafkaConsumer09.java:290)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.<init>(FlinkKafkaConsumer09.java:216)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer09.<init>(FlinkKafkaConsumer09.java:154)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:128)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:112)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer010.<init>(FlinkKafkaConsumer010.java:79)
at stream.TransferKafka.main(TransferKafka.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:525)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:417)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:395)
at org.apache.flink.client.CliFrontend.executeProgram(CliFrontend.java:828)
at org.apache.flink.client.CliFrontend.run(CliFrontend.java:283)
at org.apache.flink.client.CliFrontend.parseParameters(CliFrontend.java:1080)
at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1127)
at org.apache.flink.client.CliFrontend$1.call(CliFrontend.java:1124)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1781)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.CliFrontend.main(CliFrontend.java:1124)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 24 more
And here is my demo:
public static void main(String[] args) {
    final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    // Kafka connection properties
    Properties props = new Properties();
    props.setProperty("bootstrap.servers", "ip:port");
    props.setProperty("group.id", "NewFlinkTest");
    // Read from one topic and write straight back out to another
    DataStreamSource<String> stream = env.addSource(new FlinkKafkaConsumer010<>("kafka_test", new SimpleStringSchema(), props));
    stream.addSink(new FlinkKafkaProducer010<>("kafka_test_out", new SimpleStringSchema(), props));
    try {
        env.execute("Flink Jar Test");
    } catch (Exception e) {
        e.printStackTrace();
    }
}
And some version information:
Flink Version: 1.4.0
Hadoop Version: 2.7.2
Kafka Version: 0.10.2.1
JDK Version: 1.8
Pom dependencies
Edit1:
<?xml version="1.0" encoding="UTF-8"?>
<dependencies>
<!-- Apache Flink dependencies -->
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-core</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-java</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<!-- This dependency is required to actually execute jobs. It is currently pulled in by flink-streaming-java, but we explicitly depend on it to safeguard against future changes. -->
<groupId>org.apache.flink</groupId>
<artifactId>flink-clients_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
<version>${flink.version}</version>
</dependency>
<!-- explicitly add a standard logging framework, as Flink does not have a hard dependency on one specific framework by default -->
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>${slf4j.version}</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>${log4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-rabbitmq_2.11</artifactId>
<version>1.4.0</version>
</dependency>
<dependency>
<groupId>org.apache.flink</groupId>
<artifactId>flink-connector-kafka-0.10_${scala.binary.version}</artifactId>
<version>1.4.0</version>
</dependency>
</dependencies>
After some attempts, I found that the code throwing the exception does not come from the jar that I packed into my uber jar. I think the main reason is that the client server has an older version of the flink-connector-kafka jar, but no matter how I set the config yaml property 'yarn.per-job-cluster.include-user-jar', the program always throws the same exception.
Edit2:
After adding kafka-clients:0.10.2.1 to flink_home/lib/, it works, but I still don't know why it does not read the class file from the uber jar.
First, you can verify whether the missing class is in your jar file by running grep 'ByteArrayDeserializer' ./flink-SNAPSHOT-1.0.jar.
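If the class is not there, one option is to declare kafka-clients explicitly in the pom so that it can be packaged into the uber jar (version taken from the question); whether it actually lands in the fat jar also depends on your shade/assembly configuration, so treat this as a sketch:
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.1</version>
</dependency>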
You probably want to add <scope>provided</scope> to flink-streaming-scala, flink-clients, flink-table-api-scala-bridge and flink-table-planner-blink; that solved my problem.
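As a sketch of what that looks like for one of the artifacts from the pom above (flink-clients; the same pattern applies to the other artifacts named in the previous answer):
<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-clients_${scala.binary.version}</artifactId>
    <version>${flink.version}</version>
    <scope>provided</scope>
</dependency>
With provided scope the artifact is available at compile time but is not packaged into the fat jar, so the version already present on the cluster is used at runtime.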
I am trying to connect to HBase (version 1.2.0) deployed on a different machine. I am using Eclipse, and below is the code that I have written:
HTable table = null;
Configuration configuration = HBaseConfiguration.create();
configuration.set("hbase.zookeeper.quorum", "192.168.0.191");
configuration.set("hbase.zookeeper.property.clientPort", "2181");
FilterList list = new FilterList(FilterList.Operator.MUST_PASS_ALL);
table = new HTable(configuration, hbaseTable);
Below are the dependencies I have added in pom.xml:
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-client</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<version>1.2.0</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>3.1.0</version>
<scope>provided</scope>
</dependency>
On running the code, I am getting the below exception:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.addDeprecations([Lorg/apache/hadoop/conf/Configuration$DeprecationDelta;)V
at org.apache.hadoop.mapreduce.util.ConfigUtil.addDeprecatedKeys(ConfigUtil.java:54)
at org.apache.hadoop.mapreduce.util.ConfigUtil.loadResources(ConfigUtil.java:42)
at org.apache.hadoop.mapred.JobConf.<clinit>(JobConf.java:119)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:810)
at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:80)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.security.Groups.<init>(Groups.java:48)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:140)
at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:134)
at org.apache.hadoop.hbase.security.UserProvider.<clinit>(UserProvider.java:56)
at org.apache.hadoop.hbase.client.HConnectionKey.<init>(HConnectionKey.java:71)
at org.apache.hadoop.hbase.client.ConnectionManager.getConnectionInternal(ConnectionManager.java:298)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:184)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:150)
at HbaseConnection.createConnection(HbaseConnection.java:34)
at HbaseConnection.main(HbaseConnection.java:22)
I have gone through various links related to similar problems, but I could not find a working solution. Can somebody help identify the issue in my configuration or in my code?
I am trying to run a Spark job using spark-submit. When I run it in Eclipse the job runs without any issue, but when I copy the same jar file to a remote machine and run the job there, I get the issue below:
17/08/09 10:19:15 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, ip-10-50-70-180.ec2.internal): java.io.InvalidClassException: org.apache.spark.executor.TaskMetrics; local class incompatible: stream classdesc serialVersionUID = -2231953621568687904, local class serialVersionUID = -6966587383730940799
at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:616)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1829)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1713)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1986)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2231)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2155)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2013)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I saw some other links on SO and tried the following.
I changed the version of the Spark jars from 2.10, which I was using before, to 2.11. The dependencies in the pom now look like this:
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-yarn_2.10 -->
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-yarn_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-mllib_2.11</artifactId>
<version>2.0.2</version>
<scope>provided</scope>
</dependency>
I also checked that version 2.11-2.0.2 exists in the jars folder of Spark, as suggested in a few links.
I also added provided scope to the dependencies, as suggested in a few links.
None of the above helped. Any help would be greatly appreciated, as I am stuck on this issue. Thanks in advance.
Edit 1: This is the spark-submit command
spark-submit --deploy-mode cluster --class "com.abc.ingestion.GenericDeviceIngestionSpark" /home/hadoop/sathiya/spark_driven_ingestion-0.0.1-SNAPSHOT-jar-with-dependencies.jar "s3n://input-bucket/input-file.csv" "SIT" "accessToken" "UNKNOWN" "bundleId" "[{"idType":"D_ID","idOrder":1,"isPrimary":true},{"idType":"HASH_DEVICE_ID","idOrder":2,"isPrimary":false}]"
Edit 2:
I also tried adding serialVersionUID = -2231953621568687904L; to the related class, but that did not resolve the issue.
I finally resolved the issue. I commented out all the dependencies and then uncommented them one at a time. First I uncommented the spark-core dependency and the issue went away; then I uncommented another dependency in my project, which brought the issue back. On investigation I found that this second dependency in turn depended on a different version (2.10) of spark-core, which was causing the issue. I added an exclusion to the dependency as below:
<dependency>
<groupId>com.data.utils</groupId>
<artifactId>data-utils</artifactId>
<version>1.0-SNAPSHOT</version>
<exclusions>
<exclusion>
<groupId>javax.ws.rs</groupId>
<artifactId>javax.ws.rs-api</artifactId>
</exclusion>
<exclusion>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
</exclusion>
</exclusions>
</dependency>
This resolved the issue. I am noting it here in case someone else gets stuck on this issue. Thanks @JosePraveen for your valuable comment, which gave me the hint.
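If you run into the same situation, inspecting the dependency tree is usually the quickest way to find which dependency is pulling in the conflicting artifact; for example (standard Maven dependency plugin, the filter pattern is only illustrative):
mvn dependency:tree -Dincludes=org.apache.spark:spark-core_2.10
Any path in the output ending in spark-core_2.10 points at the dependency that needs the exclusion.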
We see this issue when slightly different jar versions are used on the Spark master and one or more of the Spark slaves.
I was facing this issue because I had only copied my jar to the master node. Once I copied the jar to all the slave nodes, my application started working just fine.
I'm trying to communicate with HBase using Spark. I'm using the code below:
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create();
conf.addResource(new Path("/etc/hbase/conf/core-site.xml"));
conf.addResource(new Path("/etc/hbase/conf/hbase-site.xml"));
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
Scan scan = new Scan();
scan.setCaching(100);
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> hbaseRdd = hbaseContext.hbaseRDD(TableName.valueOf("climate"), scan);
System.out.println("Number of Records found : " + hbaseRdd.count());
If I execute this, I get the following error:
Exception in thread "dag-scheduler-event-loop" java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/regionserver/StoreFileWriter
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethod(Class.java:2128)
at java.io.ObjectStreamClass.getPrivateMethod(ObjectStreamClass.java:1475)
at java.io.ObjectStreamClass.access$1700(ObjectStreamClass.java:72)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:498)
at java.io.ObjectStreamClass$2.run(ObjectStreamClass.java:472)
at java.security.AccessController.doPrivileged(Native Method)
at java.io.ObjectStreamClass.<init>(ObjectStreamClass.java:472)
at java.io.ObjectStreamClass.lookup(ObjectStreamClass.java:369)
...
I did not find any solution via Google. Does anyone have an idea?
--------edit--------
I'm using Maven. My pom looks like this:
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-server</artifactId>
<version>1.3.0</version>
</dependency>
<dependency>
<groupId>org.sharegov</groupId>
<artifactId>mjson</artifactId>
<version>1.4.1</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-core_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<artifactId>spark-sql_2.10</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-csv_2.10</artifactId>
<version>1.5.0</version>
</dependency>
<dependency>
<groupId>com.databricks</groupId>
<artifactId>spark-xml_2.10</artifactId>
<version>0.3.5</version>
</dependency>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-spark</artifactId>
<version>2.0.0-SNAPSHOT</version>
</dependency>
</dependencies>
I'm building my application with dependencies using the maven-assembly-plugin.
You are getting the NoClassDefFoundError because Spark is not able to find the HBase jars on the classpath. You need to supply the required jars to spark-submit explicitly, using the --jars parameter when launching the job:
${SPARK_HOME}/bin/spark-submit \
--jars ${..add hbase jars comma separated...}
--class ....
.........
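For illustration, with hypothetical paths and class name (the actual set of HBase jars and their locations depend on your installation), the launch might look like:
${SPARK_HOME}/bin/spark-submit \
  --jars /opt/hbase/lib/hbase-client-1.3.0.jar,/opt/hbase/lib/hbase-common-1.3.0.jar,/opt/hbase/lib/hbase-server-1.3.0.jar,/path/to/hbase-spark.jar \
  --class com.example.HBaseRead \
  my-app-with-dependencies.jar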