TApplicationException exception when running a mapreduce job on an Accumulo Table - java

I am running a map reduce job taking data from a table in Accumulo as the input and storing the result in another table in Accumulo. To do this, I am using the AccumuloInputFormat and AccumuloOutputFormat classes. Here is the code
public int run(String[] args) throws Exception {
Opts opts = new Opts();
opts.parseArgs(PivotTable.class.getName(), args);
Configuration conf = getConf();
conf.set("formula", opts.formula);
Job job = Job.getInstance(conf);
job.setJobName("Pivot Table Generation");
job.setJarByClass(PivotTable.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setMapperClass(PivotTableMapper.class);
job.setCombinerClass(PivotTableCombiber.class);
job.setReducerClass(PivotTableReducer.class);
job.setInputFormatClass(AccumuloInputFormat.class);
ClientConfiguration zkConfig = new ClientConfiguration().withInstance(opts.getInstance().getInstanceName()).withZkHosts(opts.getInstance().getZooKeepers());
AccumuloInputFormat.setInputTableName(job, opts.dataTable);
AccumuloInputFormat.setZooKeeperInstance(job, zkConfig);
AccumuloInputFormat.setConnectorInfo(job, opts.getPrincipal(), new PasswordToken(opts.getPassword().value));
job.setOutputFormatClass(AccumuloOutputFormat.class);
BatchWriterConfig bwConfig = new BatchWriterConfig();
AccumuloOutputFormat.setBatchWriterOptions(job, bwConfig);
AccumuloOutputFormat.setZooKeeperInstance(job, zkConfig);
AccumuloOutputFormat.setConnectorInfo(job, opts.getPrincipal(), new PasswordToken(opts.getPassword().value));
AccumuloOutputFormat.setDefaultTableName(job, opts.pivotTable);
AccumuloOutputFormat.setCreateTables(job, true);
return job.waitForCompletion(true) ? 0 : 1;
}
PivotTable is the name of the class that contains the main method (and this one too). I have made the mapper, combiner and reducer classes as well. But when I try to exectute this job, I get an error
Exception in thread "main" java.io.IOException: org.apache.accumulo.core.client.AccumuloException: org.apache.thrift.TApplicationException: Internal error processing hasTablePermission
at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validatePermissions(InputConfigurator.java:707)
at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.validateOptions(AbstractInputFormat.java:397)
at org.apache.accumulo.core.client.mapreduce.AbstractInputFormat.getSplits(AbstractInputFormat.java:668)
at org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:301)
at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:318)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1308)
at com.latize.ulysses.accumulo.postprocess.PivotTable.run(PivotTable.java:247)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.latize.ulysses.accumulo.postprocess.PivotTable.main(PivotTable.java:251)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: org.apache.accumulo.core.client.AccumuloException: org.apache.thrift.TApplicationException: Internal error processing hasTablePermission
at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:87)
at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.hasTablePermission(SecurityOperationsImpl.java:220)
at org.apache.accumulo.core.client.mapreduce.lib.impl.InputConfigurator.validatePermissions(InputConfigurator.java:692)
... 21 more
Caused by: org.apache.thrift.TApplicationException: Internal error processing hasTablePermission
at org.apache.thrift.TApplicationException.read(TApplicationException.java:111)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:71)
at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.recv_hasTablePermission(ClientService.java:641)
at org.apache.accumulo.core.client.impl.thrift.ClientService$Client.hasTablePermission(ClientService.java:624)
at org.apache.accumulo.core.client.impl.SecurityOperationsImpl$8.execute(SecurityOperationsImpl.java:223)
at org.apache.accumulo.core.client.impl.SecurityOperationsImpl$8.execute(SecurityOperationsImpl.java:220)
at org.apache.accumulo.core.client.impl.ServerClient.executeRaw(ServerClient.java:79)
at org.apache.accumulo.core.client.impl.SecurityOperationsImpl.execute(SecurityOperationsImpl.java:73)
Can someone tell me what am I doing wrong here? Any help would be appreciated.
EDIT : I am running Accumulo 1.7.0

A TApplicationException indicates the error occurred on the Accumulo tablet server, rather than in your client (MapReduce) code. You'll need to examine your tablet server logs to get more information about the particular error wherever you see TApplicationException.
Table permissions are usually retrieved from ZooKeeper, so it may indicate a problem with the tserver connecting to ZooKeeper.
Unfortunately, I don't see the hostname or the IP in the stack trace, so you may have to check all the tserver logs to find it.

Related

Using drools in a Spark job

I am trying to use drools in spark job submitted to a cluster. THe job will start by getting the drools jar from a drools server then initialize the base and sessions.
My code work when executed in Spark but when submitting to spark cluster a NPE happens.
This is how I am doing
String url = "{my server address}/drools-wb/maven2/com/myspace/Project1/1.0.0/Project1-1.0.0.jar";
KieServices ks = KieServices.Factory.get();
//ERROR is in the below line
ReleaseId releaseId = ks.newReleaseId("com.myspace", "Project1", "1.0.0");
KieRepository kr = ks.getRepository();
UrlResource urlResource = (UrlResource) ks.getResources().newUrlResource(url);
The error shown after submitting the code:
Exception in thread "main" java.lang.NullPointerException
at org.opencell.spark.jobs.TestWithDrools.main(TestWithDrools.java:47)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:879)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:197)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:227)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:136)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2018-08-29 10:08:09 INFO ShutdownHookManager:54 - Shutdown hook called
Do you have an idea about solving this issue ?
The issue is caused because ks is null.
So to resolve this issue please refer to this post:
Drools 7.4.1 kieservices.factory.get() returns null

Spark possible race condition in driver

I have a Spark job that processes several folders on S3 per run and stores its state on DynamoDB. In other words, we're running the job once per day, it looks for new folders added by another job, transforms them one-by-one and writes state to DynamoDB. Here's rough pseudocode:
object App {
val allFolders = S3Folders.list()
val foldersToProcess = DynamoDBState.getFoldersToProcess(allFolders)
Transformer.run(foldersToProcess)
}
object Transformer {
def run(folders: List[String]): Unit = {
val sc = new SparkContext()
folders.foreach(process(sc, _))
}
def process(sc: SparkContext, folder: String): Unit = ??? // transform and write to S3
}
This approach works well if S3Folders.list() returns relatively small amount of folders (up to few thousands), if it returns more (4-8K) very often we see following error (that in first glance has nothing to do with Spark):
17/10/31 08:38:20 ERROR ApplicationMaster: User class threw exception: shadeaws.SdkClientException: Failed to sanitize XML document destined for handler class shadeaws.services.s3.model.transform.XmlResponses
SaxParser$ListObjectsV2Handler
shadeaws.SdkClientException: Failed to sanitize XML document destined for handler class shadeaws.services.s3.model.transform.XmlResponsesSaxParser$ListObjectsV2Handler
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:214)
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.parseListObjectsV2Response(XmlResponsesSaxParser.java:315)
at shadeaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:88)
at shadeaws.services.s3.model.transform.Unmarshallers$ListObjectsV2Unmarshaller.unmarshall(Unmarshallers.java:77)
at shadeaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
at shadeaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:31)
at shadeaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at shadeaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1553)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1271)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1055)
at shadeaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:743)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:717)
at shadeaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at shadeaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at shadeaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at shadeaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4247)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4194)
at shadeaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4188)
at shadeaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:865)
at me.chuwy.transform.S3Folders$.com$chuwy$transform$S3Folders$$isGlacierified(S3Folders.scala:136)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at me.chuwy.transform.S3Folders$.list(S3Folders.scala:112)
at me.chuwy.transform.Main$.main(Main.scala:22)
at me.chuwy.transform.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
Caused by: shadeaws.AbortedException:
at shadeaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:53)
at shadeaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:81)
at shadeaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.read1(BufferedReader.java:210)
at java.io.BufferedReader.read(BufferedReader.java:286)
at java.io.Reader.read(Reader.java:140)
at shadeaws.services.s3.model.transform.XmlResponsesSaxParser.sanitizeXmlDocument(XmlResponsesSaxParser.java:186)
... 36 more
For big amount of folders (~20K) this happens all the time and job cannot start.
Previously we had very similar, but much more frequent error when getFoldersToProcess did GetItem for every folder from allFolders and therefore took much longer:
17/09/30 14:46:07 ERROR ApplicationMaster: User class threw exception: shadeaws.AbortedException:
shadeaws.AbortedException:
at shadeaws.internal.SdkFilterInputStream.abortIfNeeded(SdkFilterInputStream.java:51)
at shadeaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:71)
at shadeaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.ensureLoaded(ByteSourceJsonBootstrapper.java:489)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.detectEncoding(ByteSourceJsonBootstrapper.java:126)
at com.fasterxml.jackson.core.json.ByteSourceJsonBootstrapper.constructParser(ByteSourceJsonBootstrapper.java:215)
at com.fasterxml.jackson.core.JsonFactory._createParser(JsonFactory.java:1240)
at com.fasterxml.jackson.core.JsonFactory.createParser(JsonFactory.java:802)
at shadeaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:109)
at shadeaws.http.JsonResponseHandler.handle(JsonResponseHandler.java:43)
at shadeaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
at shadeaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1503)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1226)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1030)
at shadeaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:742)
at shadeaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:716)
at shadeaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:699)
at shadeaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:667)
at shadeaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:649)
at shadeaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:513)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.doInvoke(AmazonDynamoDBClient.java:2089)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.invoke(AmazonDynamoDBClient.java:2065)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.executeGetItem(AmazonDynamoDBClient.java:1173)
at shadeaws.services.dynamodbv2.AmazonDynamoDBClient.getItem(AmazonDynamoDBClient.java:1149)
at me.chuwy.tranform.sdk.Manifest$.contains(Manifest.scala:179)
at me.chuwy.tranform.DynamoDBState$$anonfun$getUnprocessed$1.apply(ProcessManifest.scala:44)
at scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
at scala.collection.TraversableLike$class.filterNot(TraversableLike.scala:267)
at scala.collection.AbstractTraversable.filterNot(Traversable.scala:104)
at me.chuwy.transform.DynamoDBState$.getFoldersToProcess(DynamoDBState.scala:44)
at me.chuwy.transform.Main$.main(Main.scala:19)
at me.chuwy.transform.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)
I believe that current error has nothing to do with XML parsing or invalid response, but originate from some race condition inside Spark, because:
There's clear connection between amount of time "state-fetching" takes and chance of failure
Tracebacks have underlying AbortedException, which AFAIK caused by swallowed InterruptedException, which can mean something inside JVM (spark-submit or even YARN) calls Thread.sleep for main thread.
Right now I'm using EMR AMI 5.5.0, Spark 2.1.0 and shaded AWS SDK 1.11.208, but had similar error with AWS SDK 1.10.75.
I'm deploying this job on EMR via command-runner.jar spark-submit --deploy-mode cluster --class ....
Does anyone have any idea where does this exception originate from and how to fix it?
foreach does not guarantee orderly computations and it applies the operation(s) to each element of an RDD, meaning that it will instantiate for every element which, in turn, may overwhelm the executor.
The problem was that getFoldersToProcess is a blocking (and very long) operation, which prevents SparkContext from being instantiated. SpackContext itself should signal about own instantiation to YARN and if it doesn't help in a certain amount of time - YARN assumes that driver node has fallen off and kills the whole cluster.

Mapreduce job spitting java.io.IOException: com.mysql.jdbc.Driver

hadoop - 2.7.3
I am creating a mapreduce job that reads data from HDFS input file and writes data to mysql.
It is throwing an error while initiating connection. There is no additional information like connection refused or classNotFound exception.Simple IO exception and it is not making any sense to me.
Error: java.io.IOException: com.mysql.jdbc.Driver
at org.apache.hadoop.mapreduce.lib.db.DBOutputFormat.getRecordWriter(DBOutputFormat.java:185)
at org.apache.hadoop.mapred.ReduceTask$NewTrackingRecordWriter.<init>(ReduceTask.java:540)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:614)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
My mapreduce code:
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
"jdbc:mysql://localhost:3306/db",
"user",
"password");
Job job = Job.getInstance(conf, "test");
job.setJar(DBMapReduce.class);
job.setMapperClass(DbMapper.class);
job.setReducerClass(DbSQLReducer.class);
job.setMapOutputKeyClass(DBKeyWritable.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(DBOutputWritable.class);
job.setOutputValueClass(NullWritable.class);
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(DBOutputFormat.class);
FileInputFormat.addInputPath(job, new Path(args[1]));
DBOutputFormat.setOutput(
job,
"table_name", // output table name
new String[] { "dummy",
"code",
"code_type"
} //table columns
);
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
I have mysql-connector added to classpath,HADOOP_CLASSPATH, -libjars, referenced libraries and lib folder. None of these seem to work.
any help would be greatly appreciated.
This looks like a permissions issue when looking at this particular stack trace:
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
I guess there isn't sufficient write access for the yarn users to the temp directory.
Hope this url help out to track the issue :
https://examples.javacodegeeks.com/core-java/io/ioexception/java-io-ioexception-how-to-solve-ioexception/
Track the issue by using try and catch then might be possible to fix.

java.lang.NoSuchMethodError in netty-socketio server

I'm trying to start netty-socketio server, and I can't trace origins of this exception. I have marked in stacktrace place where it may lead to the answer, so if anyone could provide explanation on this it will be much appreciated.
public class SocketIoServer {
private Configuration cnf = new Configuration();
private SocketIOServer server;
public SocketIoServer() {
Configuration config = new Configuration();
config.setHostname("localhost");
config.setPort(8081);
server = new SocketIOServer(config);
server.start();
}
}
When I initialize socket an Exception gets thrown. Here's context:
Exception in thread "main" java.lang.IllegalArgumentException:
java.lang.reflect.InvocationTargetException
at com.corundumstudio.socketio.Configuration.<init>(Configuration.java:112)
at com.corundumstudio.socketio.SocketIOServer.<init>(SocketIOServer.java:66)
at SocketIoServer.<init>(SocketIoServer.java:17)
at Server.main(Server.java:19)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at com.corundumstudio.socketio.Configuration.<init>(Configuration.java:109)
... 8 more
This line in particular
Caused by: java.lang.NoSuchMethodError:
com.fasterxml.jackson.databind.module.SimpleModule.setSerializerModifier(Lcom/fasterxml/jackson/databind/ser/BeanSerializerModifier;)Lcom/fasterxml/jackson/databind/module/SimpleModule;
at com.corundumstudio.socketio.protocol.JacksonJsonSupport.init(JacksonJsonSupport.java:316)
at com.corundumstudio.socketio.protocol.JacksonJsonSupport.<init>(JacksonJsonSupport.java:311)
at com.corundumstudio.socketio.protocol.JacksonJsonSupport.<init>(JacksonJsonSupport.java:304)
... 13 more
You have obviously a conflict of version of jackson-databind in your project, indeed the class com.corundumstudio.socketio.protocol.JacksonJsonSupport relies on the method SimpleModule#setSerializerModifier(BeanSerializerModifier mod) which has been added since the version 2.2 so as it cannot find this method, it means that you have a version of jackson-databind older than 2.2 in your classpath such that the method cannot be found.
Check all the jars that you have in your classpath and make sure that you have only one version of jackson-databind corresponding to the version expected by netty-socketio. For example assuming that you use the version 1.7.12 of netty-socketio, the expected version of jackson-databind is 2.7.4 as you can see here.

Spark RDD create on s3 file

I'm trying to create JAVARDD on s3 file but not able to create rdd.Can someone help me to solve this problem.
Code :
SparkConf conf = new SparkConf().setAppName(appName).setMaster("local");
JavaSparkContext javaSparkContext = new JavaSparkContext(conf);
javaSparkContext.hadoopConfiguration().set("fs.s3.awsAccessKeyId",
accessKey);
javaSparkContext.hadoopConfiguration().set("fs.s3.awsSecretAccessKey",
secretKey);
javaSparkContext.hadoopConfiguration().set("fs.s3.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem");
JavaRDD<String> rawData = sparkContext
.textFile("s3://mybucket/sample.txt");
This code throwing exception
2015-05-06 18:58:57 WARN LoadSnappy:46 - Snappy native library not loaded
java.lang.IllegalArgumentException: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
at org.apache.hadoop.fs.Path.initialize(Path.java:148)
at org.apache.hadoop.fs.Path.<init>(Path.java:126)
at org.apache.hadoop.fs.Path.<init>(Path.java:50)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1084)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globPathsLevel(FileSystem.java:1087)
at org.apache.hadoop.fs.FileSystem.globStatusInternal(FileSystem.java:1023)
at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:987)
at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:177)
at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:208)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:203)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:32)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:219)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:217)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:217)
at org.apache.spark.rdd.RDD.take(RDD.scala:1156)
at org.apache.spark.rdd.RDD.first(RDD.scala:1189)
at org.apache.spark.api.java.JavaRDDLike$class.first(JavaRDDLike.scala:477)
at org.apache.spark.api.java.JavaRDD.first(JavaRDD.scala:32)
at com.cignifi.DataExplorationValidation.processFile(DataExplorationValidation.java:148)
at com.cignifi.DataExplorationValidation.main(DataExplorationValidation.java:104)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.net.URISyntaxException: Expected scheme-specific part at index 3: s3:
at java.net.URI$Parser.fail(URI.java:2829)
at java.net.URI$Parser.failExpecting(URI.java:2835)
at java.net.URI$Parser.parse(URI.java:3038)
at java.net.URI.<init>(URI.java:753)
at org.apache.hadoop.fs.Path.initialize(Path.java:145)
... 36 more
Some more details
Spark version 1.3.0.
Running in local mode using spark-submit.
I tried this thing on local and EC2 instance ,In both case I'm getting same error.
It should be s3n:// instead of s3://
See External Datasets in Spark Programming Guide

Categories

Resources