I have a question.
I need to pass two files as input to my MapReduce program.
@Override
public int run(String[] args) throws Exception {
    // (argument handling skipped)
    Job job1 = new Job();
    job1.setJarByClass(CFRecommenderDriver.class);
    job1.setMapperClass(CFRecommenderMapper.class);
    //job1.setReducerClass(CFRecommenderReducer.class);
    job1.setMapOutputKeyClass(Text.class);
    job1.setMapOutputValueClass(TextDoublePairWritableComparable.class);
    //job1.setOutputKeyClass(TextTwoWritableComparable.class);
    //job1.setOutputValueClass(TextDoubleTwoPairsWritableComparable.class);
    MultipleInputs.addInputPath(job1, new Path(args[0]), FileInputFormat.class);
    MultipleInputs.addInputPath(job1, new Path(args[1]), FileInputFormat.class);
    job1.setNumReduceTasks(0);
    boolean step1 = job1.waitForCompletion(true);
    if (!step1) return -1;
When I run the program with the following command:
hadoop jar mapreduce-0.1.jar cf /input/cf-re/data1 /input/cf-re/data2 /output/cf-r/data1
I get the following error:
2013-07-01 13:13:44.822 java[45783:1603] Unable to load realm info from SCDynamicStore
13/07/01 13:13:45 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
13/07/01 13:13:45 INFO mapred.JobClient: Cleaning up the staging area hdfs://127.0.0.1:9000/tmp/hadoop-suhyunjeon/mapred/staging/suhyunjeon/.staging/job_201306191432_0218
java.lang.RuntimeException: java.lang.InstantiationException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:115)
at org.apache.hadoop.mapreduce.lib.input.MultipleInputs.getInputFormatMap(MultipleInputs.java:109)
at org.apache.hadoop.mapreduce.lib.input.DelegatingInputFormat.getSplits(DelegatingInputFormat.java:58)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1024)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1041)
at org.apache.hadoop.mapred.JobClient.access$700(JobClient.java:179)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:959)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:912)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:912)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:500)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
at org.ankus.hadoop.mapreduce.algorithm.cf.recommender.CFRecommenderDriver.run(CFRecommenderDriver.java:86)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.ankus.hadoop.mapreduce.algorithm.cf.recommender.CFRecommenderDriver.main(CFRecommenderDriver.java:49)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.ankus.hadoop.mapreduce.MapReduceDriver.main(MapReduceDriver.java:64)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.InstantiationException
at sun.reflect.InstantiationExceptionConstructorAccessorImpl.newInstance(InstantiationExceptionConstructorAccessorImpl.java:30)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
... 29 more
I don't know exactly what the problem is. Please help me.
You cannot use the abstract class FileInputFormat directly: it is abstract, so ReflectionUtils.newInstance cannot instantiate it, which is exactly the InstantiationException in your stack trace. If your inputs are text, use org.apache.hadoop.mapreduce.lib.input.TextInputFormat instead. For example,
MultipleInputs.addInputPath(job1, new Path(args[0]), TextInputFormat.class);
MultipleInputs.addInputPath(job1, new Path(args[1]), TextInputFormat.class);
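If the two files need different processing, MultipleInputs also has an overload that takes a mapper class per path. A minimal sketch, where FirstFileMapper and SecondFileMapper are hypothetical names for illustration:
// each input path gets its own InputFormat and Mapper (mapper names are placeholders)
MultipleInputs.addInputPath(job1, new Path(args[0]), TextInputFormat.class, FirstFileMapper.class);
MultipleInputs.addInputPath(job1, new Path(args[1]), TextInputFormat.class, SecondFileMapper.class);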
This is how you can use multiple input files in your job:
job1.setInputFormatClass(TextInputFormat.class);
FileInputFormat.setInputPaths(job1, input_1 + "," + input_2);
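This works because FileInputFormat.setInputPaths also has an overload that takes a single comma-separated String. A minimal sketch of the same idea, assuming args[0] and args[1] hold the two input directories:
job1.setInputFormatClass(TextInputFormat.class);
// String overload: FileInputFormat.setInputPaths(Job job, String commaSeparatedPaths)
FileInputFormat.setInputPaths(job1, args[0] + "," + args[1]);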
I think it's worth mentioning another method that can be used to add multiple input paths, and in my opinion the cleanest and simplest: FileInputFormat.setInputPaths(Job job, Path... inputPaths)
The Path... signature tells you that you can give any number of Path objects to this call. Example:
FileInputFormat.setInputPaths(job, new Path(args[0]), new Path(args[1]), new Path(args[2]));
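If the number of inputs is only known at runtime, the same varargs call also accepts a Path array; a small sketch, assuming the last command-line argument is the output path and the rest are inputs:
// build a Path[] from the command-line arguments, reserving the last one for the output path
Path[] inputs = new Path[args.length - 1];
for (int i = 0; i < inputs.length; i++) {
    inputs[i] = new Path(args[i]);
}
FileInputFormat.setInputPaths(job, inputs);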
See the quickstart guide on installing sampleclean:
http://sampleclean.org/release.html
I have followed the steps provided in the quick start and have the correct versions:
$ javac -version
javac 1.7.0_95
$ scala
Welcome to Scala version 2.10.5 (OpenJDK 64-Bit Server VM, Java 1.7.0_95)
I wasn't able to install spark-1.2.2, so I got spark-1.6.1 instead.
After building Spark, updating the Hive conf file, and running the Scala commands, I get a Java error:
~/sampleclean/spark-1.6.1$ ./bin/spark-shell --jars sampleclean-v0.1.jar
scala> import sampleclean.api.SampleCleanContext
import sampleclean.api.SampleCleanContext
scala> val scc = new SampleCleanContext(sc)
scc: sampleclean.api.SampleCleanContext = sampleclean.api.SampleCleanContext@2d15f9f9
scala> scc.hql("CREATE TABLE restaurant(id String, entity String, name String, category String, city String) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\\n'")
java.lang.NoSuchMethodError: sampleclean.api.SampleCleanContext.hql(Ljava/lang/String;)Lorg/apache/spark/sql/DataFrame;
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:40)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:42)
at $iwC$$iwC$$iwC.<init>(<console>:44)
at $iwC$$iwC.<init>(<console>:46)
at $iwC.<init>(<console>:48)
at <init>(<console>:50)
at .<init>(<console>:54)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1346)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
I think I've followed all of the steps correctly. In addition, I can see the hql function defined in sampleclean/api/SampleCleanContext.scala.
Can somebody help me resolve this error?
Thanks!
I have a Java project containing a class MyFileLoader (among others) that successfully loads a file from resources using:
public static List<String> loadFile() throws IOException {
    Path path = new File(System.class.getResource("/my/path/model.bin").getFile()).toPath();
    return Files.readAllLines(path, UTF_8);
}
and then does some processing.
After adding this project/jar as a dependency in my Scala project, I tried to call MyFileLoader.loadFile. Unfortunately, this gives a java.lang.NullPointerException, as the resource isn't found.
To debug, I ran this command in spark-shell, showing that this resource indeed exists:
scala> getClass.getResource("/my/path/model.bin").getFile
res32: String = file:/some-local-path/my-jar-with-dependencies.jar!/my/path/model.bin
I then tried:
scala> Files.readAllLines(new File(getClass.getResource("/my/path/model.bin").getPath).toPath)
java.nio.file.NoSuchFileException: file:/some-local-path/my-jar-with-dependencies.jar!/my/path/model.bin
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
at java.nio.file.Files.newByteChannel(Files.java:361)
at java.nio.file.Files.newByteChannel(Files.java:407)
at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384)
at java.nio.file.Files.newInputStream(Files.java:152)
at java.nio.file.Files.newBufferedReader(Files.java:2784)
at java.nio.file.Files.readAllLines(Files.java:3202)
at java.nio.file.Files.readAllLines(Files.java:3242)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:20)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:25)
at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:29)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:31)
at $iwC$$iwC$$iwC.<init>(<console>:33)
at $iwC$$iwC.<init>(<console>:35)
at $iwC.<init>(<console>:37)
at <init>(<console>:39)
at .<init>(<console>:43)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:856)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:901)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:813)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:656)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:664)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:669)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:996)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:944)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:944)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1058)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:569)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:166)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:189)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:110)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Why am I not able to load the resource in these ways?
Since your file is now packaged inside a jar, you will need to use Class.getResourceAsStream(). You are trying to read the URL as a regular file, which isn't supported; it likely worked before because the resource wasn't packaged inside a jar and could be opened as a regular file.
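A minimal sketch of reading the packaged resource through a stream; loading relative to MyFileLoader rather than System is my assumption here, and the loop sticks to Java 7 APIs:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public static List<String> loadFile() throws IOException {
    // getResourceAsStream works for entries inside a jar, where a file-system Path does not
    try (InputStream in = MyFileLoader.class.getResourceAsStream("/my/path/model.bin");
         BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = reader.readLine()) != null) {
            lines.add(line);
        }
        return lines;
    }
}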
I am trying to create a workflow using the Oozie dashboard provided by the Hue interface. To take it step by step, my workflow has only one Java step. The relevant piece of code of this Java step is as follows:
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.List;

public class InputPathsCalculator {

    private static final Logger LOGGER = LoggerFactory.getLogger(InputPathsCalculator.class);

    public static void main(String[] args) throws IOException {
        System.out.println("sout-ing");
        LOGGER.info("putting something in the log");

        JobConf jobConf = new JobConf();
        jobConf.addResource(new Path("/etc/hadoop/conf/hdfs-site.xml"));

        Path outputPath = new Path(args[1]);
        // calculateInputPaths(...) and fileSystem are defined elsewhere in the class (omitted here)
        List<Path> inputPaths = calculateInputPaths(args[0], jobConf);

        FileUtil.copy(fileSystem,
                      inputPaths.toArray(new Path[0]),
                      fileSystem,
                      outputPath,
                      false,
                      true,
                      jobConf);
    }
}
calculateInputPaths(...) is a method that has been tested in isolation and works just fine. The arguments I pass to it are a config file and a String with the value /usr/myUser/outputs/.
I have two problems here:
1. I can't see anything in any logs: neither what I print to the console nor what I write to the logger.
2. The outputs directory exists, but I get the following stack trace:
org.apache.oozie.action.hadoop.JavaMainException: java.io.IOException: `/user/eliasg/outputs/output': specified destination directory does not exist
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:58)
at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:39)
at org.apache.oozie.action.hadoop.JavaMain.main(JavaMain.java:36)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:450)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:163)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: `/user/eliasg/outputs/output': specified destination directory does not exist
at org.apache.hadoop.fs.FileUtil.copy(FileUtil.java:306)
at com.ig.hadoop.jsonextractor.InputPathsCalculator.main(InputPathsCalculator.java:37)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.oozie.action.hadoop.JavaMain.run(JavaMain.java:55)
... 15 more
I have the feeling that for point 2 my jobConf is missing something that will let it work with HDFS, but I don't know what. About point 1, I am completely lost.
I was looking in the wrong place. The logs were there, but I needed to look at the map task container logs instead of the application list logs.
I have discovered that adding hdfs-site.xml to the jobConf is not enough. There is a property that is not enabled by default and needs to be set. That property is:
jobConf.set("fs.default.name", String.format("hdfs://%1$s", jobConf.get("dfs.nameservices")));
I am trying to run a MapReduce job: I pull from Mongo and then write to HDFS, but I cannot seem to get the job to run. I could not find an example, but the issue I am having is that if I set a Mongo input path, it also looks for a Mongo output path. And now I am getting an authentication error even though my MongoDB instance does not have authentication enabled.
final Configuration conf = getConf();
final Job job = new Job(conf, "sort");
MongoConfig config = new MongoConfig(conf);
MongoConfigUtil.setInputFormat(getConf(), MongoInputFormat.class);
FileOutputFormat.setOutputPath(job, new Path("/trythisdir"));
MongoConfigUtil.setInputURI(conf,"mongodb://localhost:27017/fake_data.file");
//conf.set("mongo.output.uri", "mongodb://localhost:27017/fake_data.file");
job.setJarByClass(imageExtractor.class);
job.setMapperClass(imageExtractorMapper.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setInputFormatClass( MongoInputFormat.class );
// Execute job and return status
return job.waitForCompletion(true) ? 0 : 1;
Edit: This is the current error I am having:
Exception in thread "main" java.lang.IllegalArgumentException: Couldn't connect and authenticate to get collection
at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:353)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitterByStats(MongoSplitterFactory.java:71)
at com.mongodb.hadoop.splitter.MongoSplitterFactory.getSplitter(MongoSplitterFactory.java:107)
at com.mongodb.hadoop.MongoInputFormat.getSplits(MongoInputFormat.java:56)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1079)
at org.apache.hadoop.mapred.JobClient.writeSplits(JobClient.java:1096)
at org.apache.hadoop.mapred.JobClient.access$600(JobClient.java:177)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:995)
at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:948)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:948)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:566)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:596)
at com.orbis.image.extractor.mongo.imageExtractor.run(imageExtractor.java:103)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at com.orbis.image.extractor.mongo.imageExtractor.main(imageExtractor.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.NullPointerException
at com.mongodb.MongoURI.<init>(MongoURI.java:148)
at com.mongodb.MongoClient.<init>(MongoClient.java:268)
at com.mongodb.hadoop.util.MongoConfigUtil.getCollection(MongoConfigUtil.java:351)
... 22 more
Late answer, but it may be helpful for others. I encountered the same problem while playing with Apache Spark.
I think you should correctly set mongo.input.uri and mongo.output.uri, which are used by Hadoop, and also the input and output formats.
/*Correct input and output uri setting on spark(hadoop)*/
conf.set("mongo.input.uri", "mongodb://localhost:27017/dbName.inputColName");
conf.set("mongo.output.uri", "mongodb://localhost:27017/dbName.outputColName");
/*Set input and output formats*/
job.setInputFormatClass( MongoInputFormat.class );
job.setOutputFormatClass( MongoOutputFormat.class );
By the way, if the "mongo.input.uri" or "mongo.output.uri" strings have typos, they cause the same error.
Replace:
MongoConfigUtil.setInputURI(conf, "mongodb://localhost:27017/fake_data.file");
by:
MongoConfigUtil.setInputURI(job.getConfiguration(), "mongodb://localhost:27017/fake_data.file");
The conf object is already 'consumed' by your job, so you need to set it directly on the configuration of the job.
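In other words, new Job(conf, ...) takes a copy of the configuration at construction time, so anything set on conf afterwards is not seen by the job. A minimal sketch of the ordering, reusing the URI from the question:
final Configuration conf = getConf();
final Job job = new Job(conf, "sort"); // the job takes its own copy of conf here
// configure the job's copy, not the original conf
MongoConfigUtil.setInputURI(job.getConfiguration(), "mongodb://localhost:27017/fake_data.file");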
You haven't shared the complete code so it's hard to tell, but what you've got there does not look consistent with typical usage of the MongoDB Connector for Hadoop.
I would suggest that you start with the examples on GitHub.
I'm working with a project that contains Java classes and Clojure files. The objective is to test the Clojure files from Java.
I'm using Cljunit for this purpose: https://github.com/mikera/cljunit
The code I use is as follows:
public class DemoClojureTest extends ClojureTest {
    @Override
    public List<String> namespaces() {
        @SuppressWarnings("unused") ArrayList<String> ns = new ArrayList<String>();
        ns.add("com.example.demo.helloWorld");
        return ns;
    }
}
And the Clojure file (helloWorld.clj) is:
(ns com.example.demo.helloWorld
(:use clojure.test))
(deftest test1
(is (= 1 3)))
(deftest test2
(is (= 2 2)))
When I try to execute the DemoClojureTest I get this error:
Error attempting to get var names for namespace [com.example.demo.helloWorld]
java.io.FileNotFoundException: Could not locate com/example/demo/helloWorld__init.class or com/example/demo/helloWorld.clj on classpath:
at clojure.lang.RT.load(RT.java:432)
at clojure.lang.RT.load(RT.java:400)
at clojure.core$load$fn__4890.invoke(core.clj:5415)
at clojure.core$load.doInvoke(core.clj:5414)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at clojure.core$load_one.invoke(core.clj:5227)
at clojure.core$load_lib.doInvoke(core.clj:5264)
at clojure.lang.RestFn.applyTo(RestFn.java:142)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$load_libs.doInvoke(core.clj:5298)
at clojure.lang.RestFn.applyTo(RestFn.java:137)
at clojure.core$apply.invoke(core.clj:603)
at clojure.core$require.doInvoke(core.clj:5381)
at clojure.lang.RestFn.invoke(RestFn.java:408)
at mikera.cljunit.core$get_test_var_names.invoke(core.clj:67)
at clojure.lang.Var.invoke(Var.java:415)
at mikera.cljunit.Clojure.getTestVars(Clojure.java:29)
at mikera.cljunit.NamespaceTester.<init>(NamespaceTester.java:19)
at mikera.cljunit.ClojureTester.<init>(ClojureTester.java:21)
at mikera.cljunit.ClojureRunner.<init>(ClojureRunner.java:16)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at org.junit.internal.builders.AnnotatedBuilder.buildRunner(AnnotatedBuilder.java:29)
at org.junit.internal.builders.AnnotatedBuilder.runnerForClass(AnnotatedBuilder.java:21)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.builders.AllDefaultPossibilitiesBuilder.runnerForClass(AllDefaultPossibilitiesBuilder.java:26)
at org.junit.runners.model.RunnerBuilder.safeRunnerForClass(RunnerBuilder.java:59)
at org.junit.internal.requests.ClassRequest.getRunner(ClassRequest.java:26)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:44)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:195)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:63)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
What am I doing wrong?
From the stack trace, it looks like the IntelliJ test runner is either running your tests with a classpath that does not include your Clojure source files, or not including them in the build.
Make sure the folder your Clojure files are in is under a "Content Root" and marked as a "Sources" or "Test Sources" folder in the module settings.
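As a quick sanity check (an illustrative snippet, not part of cljunit), you can verify from Java that the .clj file is actually visible on the test classpath before running the suite:
// prints a URL ending in .../com/example/demo/helloWorld.clj when the file is on the classpath, or null when it is not
System.out.println(DemoClojureTest.class.getResource("/com/example/demo/helloWorld.clj"));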