I am trying to calculate the average of numbers in a Hadoop standalone setup. The program compiles without any error and the jar file is created, but I am not able to run it. I think I am using the correct commands to execute the program in the Hadoop setup. Could somebody please review my code and tell me if there is any problem? Here is my code:
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
class sum_count{
int sum;
int count;
}
public class Average {
public static class TokenizerMapper
extends Mapper<Object, Text, Text, Object>{
private final static IntWritable valueofkey = new IntWritable();
private Text word = new Text();
sum_count sc=new sum_count();
public void map(Object key, Text value, Context context
) throws IOException, InterruptedException {
StringTokenizer itr = new StringTokenizer(value.toString());
int sum=0;
int count=0;
int v;
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
v=Integer.parseInt(word.toString());
count=count+1;
sum=sum+v;
}
//valueofkey.set(sum);
word.set("average");
sc.sum=sum;
sc.count=count;
// context.write(word, valueofkey);
// valueofkey.set(count);
// word.set("count");
context.write(word,sc);
}
}
public static class IntSumReducer
extends Reducer<Text,Object,Text,IntWritable> {
private IntWritable result = new IntWritable();
private IntWritable test=new IntWritable();
public void reduce(Text key, Iterable<sum_count> values,Context context) throws IOException, InterruptedException {
int sum = 0;
int count=0;
int wholesum=0;
int wholecount=0;
for (sum_count val : values) {
//value=val.get();
wholesum=wholesum+val.sum;
wholecount=wholecount+val.count;
}
int res=wholesum/wholecount;
result.set(res);
context.write(key, result );
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "");
job.setJarByClass(Average.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(IntWritable.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Here is my output:
manu#manu-Latitude-E5430-vPro:~/hadoop-2.7.2$ ./bin/hadoop jar av.jar Average bin/user/hduser/input bin/user/hduser/out12
Picked up JAVA_TOOL_OPTIONS: -javaagent:/usr/share/java/jayatanaag.jar
16/07/01 11:19:05 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
16/07/01 11:19:05 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
16/07/01 11:19:05 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
16/07/01 11:19:05 INFO input.FileInputFormat: Total input paths to process : 2
16/07/01 11:19:05 INFO mapreduce.JobSubmitter: number of splits:2
16/07/01 11:19:05 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local276107063_0001
16/07/01 11:19:05 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
16/07/01 11:19:05 INFO mapreduce.Job: Running job: job_local276107063_0001
16/07/01 11:19:05 INFO mapred.LocalJobRunner: OutputCommitter set in config null
16/07/01 11:19:05 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/07/01 11:19:05 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
16/07/01 11:19:05 INFO mapred.LocalJobRunner: Waiting for map tasks
16/07/01 11:19:05 INFO mapred.LocalJobRunner: Starting task: attempt_local276107063_0001_m_000000_0
16/07/01 11:19:06 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/07/01 11:19:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/07/01 11:19:06 INFO mapred.LocalJobRunner: Starting task: attempt_local276107063_0001_m_000001_0
16/07/01 11:19:06 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/07/01 11:19:06 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/07/01 11:19:06 INFO mapred.LocalJobRunner: map task executor complete.
16/07/01 11:19:06 WARN mapred.LocalJobRunner: job_local276107063_0001
java.lang.Exception: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:134)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:745)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:132)
... 8 more
Caused by: java.lang.NoClassDefFoundError: sum_count
at Average$TokenizerMapper.<init>(Average.java:24)
... 13 more
Caused by: java.lang.ClassNotFoundException: sum_count
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 14 more
16/07/01 11:19:06 INFO mapreduce.Job: Job job_local276107063_0001 running in uber mode : false
16/07/01 11:19:06 INFO mapreduce.Job: map 0% reduce 0%
16/07/01 11:19:06 INFO mapreduce.Job: Job job_local276107063_0001 failed with state FAILED due to: NA
16/07/01 11:19:06 INFO mapreduce.Job: Counters: 0
You're getting a ClassNotFoundException on sum_count. Having two classes declared at the top level of a file isn't really a good way to structure your code. It looks like when the TokenizerMapper tries to create that class, it can't find it on the classpath.
I would just put that class in a file of its own. It will need changing anyway: your job won't work as you have it, since sum_count doesn't implement the Writable interface. It should look more like:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;
public class SumCount implements Writable {
public int sum;
public int count;
@Override
public void write(DataOutput out) throws IOException {
out.writeInt(sum);
out.writeInt(count);
}
@Override
public void readFields(DataInput in) throws IOException {
sum = in.readInt();
count = in.readInt();
}
}
In your main() you also need to tell the job what key/value types the mapper will write out:
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(SumCount.class);
Note the change in class name. See the Java naming convention docs here.
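As a quick sanity check outside Hadoop, the write/readFields pair round-trips correctly over plain java.io data streams. This is a sketch: the Writable interface is omitted here so the snippet compiles without Hadoop on the classpath, and the class name is made up.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

public class SumCountRoundTrip {
    // Same shape as the SumCount above, minus the Hadoop Writable interface,
    // so it can be exercised with plain DataOutput/DataInput streams.
    static class SumCount {
        int sum;
        int count;

        void write(DataOutput out) throws IOException {
            out.writeInt(sum);
            out.writeInt(count);
        }

        void readFields(DataInput in) throws IOException {
            sum = in.readInt();
            count = in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        SumCount original = new SumCount();
        original.sum = 42;
        original.count = 7;

        // Serialize the way Hadoop would during the shuffle.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance, as the reducer side would.
        SumCount copy = new SumCount();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.sum + " " + copy.count); // prints "42 7"
    }
}
```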
Related
I want to use MapReduce to get the max value and min value for each year in a txt file. The contents of the file look like this:
1979 23 23 2 43 24 25 26 26 26 26 25 26
1980 26 27 28 28 28 30 31 31 31 30 30 30
1981 31 32 32 32 33 34 35 36 36 34 34 34
1984 39 38 39 39 39 41 42 43 40 39 38 38
1985 38 39 39 39 39 41 41 41 00 40 39 39
The first column represents years.
I want MapReduce to give me a final output like this:
1979 2, 26
1980 26, 31
...
so I wrote the code in Java like this:
public class MaxValue_MinValue {
public static class E_Mappter extends Mapper<Object, Text, Text, IntWritable> {
public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
String line = value.toString();
String[] elements = line.split("\\s");
Text outputKey = new Text(elements[0]);
for(int i = 1; i<elements.length;i++) {
context.write(outputKey, new IntWritable(Integer.parseInt(elements[i])));
}
}
}
public static class E_Reducer extends Reducer<Text,IntWritable, Text, Text> {
public void reduce(Text inKey,Iterable<IntWritable> inValues, Context context) throws IOException, InterruptedException {
int maxTemp = 0;
int minTemp = 0;
for(IntWritable ele : inValues) {
if (ele.get() > maxTemp) {
maxTemp = ele.get();
}
if (ele.get() < minTemp) {
minTemp = ele.get();
}
}
context.write(inKey, new Text("Max is " + maxTemp + ", Min is " + minTemp));
}
}
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf,"Max value, min value for each year");
job.setJarByClass(MaxValue_MinValue.class);
job.setMapperClass(E_Mappter.class);
job.setReducerClass(E_Reducer.class);
job.setCombinerClass(E_Reducer.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0:1);
}
}
but when I run it, I get the error messages below:
hadoop#steven81-HP:/usr/local/hadoop277$ ./bin/hadoop jar ./myApp/MinValue_MaxValue.jar /user/hadoop/input/Electrical__Consumption.txt /user/hadoop/output7
19/04/10 16:59:21 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
19/04/10 16:59:21 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
19/04/10 16:59:21 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
19/04/10 16:59:22 INFO input.FileInputFormat: Total input paths to process : 1
19/04/10 16:59:22 INFO mapreduce.JobSubmitter: number of splits:1
19/04/10 16:59:22 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1076320101_0001
19/04/10 16:59:23 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
19/04/10 16:59:23 INFO mapreduce.Job: Running job: job_local1076320101_0001
19/04/10 16:59:23 INFO mapred.LocalJobRunner: OutputCommitter set in config null
19/04/10 16:59:23 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/04/10 16:59:23 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
19/04/10 16:59:23 INFO mapred.LocalJobRunner: Waiting for map tasks
19/04/10 16:59:23 INFO mapred.LocalJobRunner: Starting task: attempt_local1076320101_0001_m_000000_0
19/04/10 16:59:23 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
19/04/10 16:59:23 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
19/04/10 16:59:23 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/user/hadoop/input/Electrical__Consumption.txt:0+204
19/04/10 16:59:23 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
19/04/10 16:59:23 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
19/04/10 16:59:23 INFO mapred.MapTask: soft limit at 83886080
19/04/10 16:59:23 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
19/04/10 16:59:23 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
19/04/10 16:59:23 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
19/04/10 16:59:24 INFO mapred.MapTask: Starting flush of map output
19/04/10 16:59:24 INFO mapred.LocalJobRunner: map task executor complete.
19/04/10 16:59:24 INFO mapreduce.Job: Job job_local1076320101_0001 running in uber mode : false
19/04/10 16:59:24 INFO mapreduce.Job: map 0% reduce 0%
19/04/10 16:59:24 WARN mapred.LocalJobRunner: job_local1076320101_0001
java.lang.Exception: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:1077)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.write(MapTask.java:715)
at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
at test.map.reduce.MaxValue_MinValue$E_Mappter.map(MaxValue_MinValue.java:23)
at test.map.reduce.MaxValue_MinValue$E_Mappter.map(MaxValue_MinValue.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
19/04/10 16:59:25 INFO mapreduce.Job: Job job_local1076320101_0001 failed with state FAILED due to: NA
19/04/10 16:59:25 INFO mapreduce.Job: Counters: 0
I was confused by this error, "Type mismatch in value from map: expected org.apache.hadoop.io.Text, received org.apache.hadoop.io.IntWritable", because the map's output is (Text, IntWritable) and the input for the reduce is also (Text, IntWritable). I don't know why this happens; can anyone help me?
The Combiner must be able to accept data from the Mapper, and must output data that can be used as input for the Reducer. In your case, the Combiner output type is <Text, Text>, but the Reducer input type is <Text, IntWritable> and so they don't match.
You don't actually need MapReduce for this problem, because you have all the data for each year available on each line, and you don't need to compare between lines.
String line = value.toString();
String[] elements = line.split("\\s");
Text year = new Text(elements[0]);
int maxTemp = Integer.MIN_VALUE;
int minTemp = Integer.MAX_VALUE;
int temp;
for (int i = 1; i < elements.length; i++) {
temp = Integer.parseInt(elements[i]);
if (temp < minTemp) {
minTemp = temp;
}
if (temp > maxTemp) { // checked independently, so a single value can update both min and max
maxTemp = temp;
}
}
System.out.println("For year " + year + ", the minimum temperature was " + minTemp + " and the maximum temperature was " + maxTemp);
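The fragment above is not a complete program on its own. Wrapped in a small class (the class and method names here are made up for illustration), it can be run directly against a line of the sample input:

```java
public class YearMinMax {
    // Returns {min, max} for the temperature columns of one input line,
    // skipping the year in column 0.
    static int[] minMax(String line) {
        String[] elements = line.split("\\s+");
        int maxTemp = Integer.MIN_VALUE;
        int minTemp = Integer.MAX_VALUE;
        for (int i = 1; i < elements.length; i++) {
            int temp = Integer.parseInt(elements[i]);
            if (temp < minTemp) minTemp = temp;
            if (temp > maxTemp) maxTemp = temp; // independent checks, so one value can set both
        }
        return new int[] { minTemp, maxTemp };
    }

    public static void main(String[] args) {
        String line = "1979 23 23 2 43 24 25 26 26 26 26 25 26";
        int[] mm = minMax(line);
        // prints "For year 1979, min 2, max 43"
        System.out.println("For year " + line.split("\\s+")[0]
                + ", min " + mm[0] + ", max " + mm[1]);
    }
}
```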
I have a context.write(...) call in my reducer function, but it doesn't write anything. The weird thing is that the System.out.println(...) just above it works fine and prints the desired result (as you can see on the following screen):
Image of the System.out.println trace
Here is the complete code :
public class Jointure {
public static class TokenizerMapper extends Mapper<Object, Text, IntWritable, Text> {
private boolean tab2 = false; // true once the line iteration has reached the second table
public void map(Object key, org.apache.hadoop.io.Text value, Context context)
throws IOException, InterruptedException {
Arrays.stream(value.toString().split("\\r?\\n")).forEach(line -> { // iterate over each line of the input file
if ((!tab2) && (!line.equals(""))) { // line belongs to table 1
String[] parts = line.split(";");
int idtoWrite = Integer.parseInt(parts[0]);
String valueToWrite = parts[1] + ";Table1";
try {
context.write(new IntWritable(idtoWrite), new Text(valueToWrite)); // emit an output key/value pair
} catch (Exception e) {
}
} else if (line.equals("")) { // blank line separating the two tables
tab2 = true;
} else if (tab2 && (!line.equals(""))) { // line belongs to table 2
String[] parts = line.split(";");
int idtoWrite = Integer.parseInt(parts[0]);
String valueToWrite = parts[1] + ";Table2";
try {
context.write(new IntWritable(idtoWrite), new Text(valueToWrite)); // emit an output key/value pair
} catch (Exception e) {
}
}
});
}
}
public static class IntSumReducer extends Reducer<IntWritable, Text, IntWritable, Text> {
public void reduce(IntWritable key, Iterable<Text> values, Context context)
throws IOException, InterruptedException {
ArrayList<String> listPrenom = new ArrayList<String>();
ArrayList<String> listPays = new ArrayList<String>();
for (Text val : values) {
String[] parts = val.toString().split(";");
String nomOuPays = parts[0];
String table = "";
try {
table = parts[1];
} catch (Exception e) {
}
if (table.equals("Table1")) {
listPrenom.add(nomOuPays);
} else if (table.equals("Table2")) {
listPays.add(nomOuPays);
}
}
for (int i = 0; i < listPrenom.size(); i++) {
for (int j = 0; j < listPays.size(); j++) {
String toWrite = listPrenom.get(i) + " " + listPays.get(j);
System.out.println("=====================WRITE=======================");
System.out.println(toWrite);
context.write(key, new Text(toWrite));
}
}
}
}
public static void main(String[] args) throws Exception {
Configuration conf = new Configuration();
Job job = Job.getInstance(conf, "jointure");
job.setJarByClass(Jointure.class);
job.setMapperClass(TokenizerMapper.class);
job.setCombinerClass(IntSumReducer.class);
job.setReducerClass(IntSumReducer.class);
job.setOutputKeyClass(IntWritable.class);
job.setOutputValueClass(Text.class);
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
System.exit(job.waitForCompletion(true) ? 0 : 1);
}
}
Have you got any idea? Thanks for your time.
EDIT :
Here is the complete trace of the log when I launch the program:
2019-03-14 20:05:03,049 INFO impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
2019-03-14 20:05:03,116 INFO impl.MetricsSystemImpl: JobTracker metrics system started
2019-03-14 20:05:03,475 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2019-03-14 20:05:03,542 INFO input.FileInputFormat: Total input files to process : 1
2019-03-14 20:05:03,564 INFO mapreduce.JobSubmitter: number of splits:1
2019-03-14 20:05:03,674 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local1184033728_0001
2019-03-14 20:05:03,675 INFO mapreduce.JobSubmitter: Executing with tokens: []
2019-03-14 20:05:03,803 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
2019-03-14 20:05:03,803 INFO mapreduce.Job: Running job: job_local1184033728_0001
2019-03-14 20:05:03,804 INFO mapred.LocalJobRunner: OutputCommitter set in config null
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,808 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,809 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
2019-03-14 20:05:03,845 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:03,848 INFO mapred.LocalJobRunner: Waiting for map tasks
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:03,867 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:03,918 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:03,934 INFO mapred.MapTask: Processing split: file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/inputTab/file-tab:0+56
2019-03-14 20:05:04,046 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
2019-03-14 20:05:04,046 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
2019-03-14 20:05:04,046 INFO mapred.MapTask: soft limit at 83886080
2019-03-14 20:05:04,046 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
2019-03-14 20:05:04,046 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
2019-03-14 20:05:04,049 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2019-03-14 20:05:04,059 INFO mapred.LocalJobRunner:
2019-03-14 20:05:04,059 INFO mapred.MapTask: Starting flush of map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: Spilling map output
2019-03-14 20:05:04,059 INFO mapred.MapTask: bufstart = 0; bufend = 110; bufvoid = 104857600
2019-03-14 20:05:04,059 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214376(104857504); length = 21/6553600
=====================WRITE=======================
Pierre Allemagne
=====================WRITE=======================
Pierre France
=====================WRITE=======================
Jacques France
2019-03-14 20:05:04,184 INFO mapred.MapTask: Finished spill 0
2019-03-14 20:05:04,234 INFO mapred.Task: Task:attempt_local1184033728_0001_m_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,237 INFO mapred.LocalJobRunner: map
2019-03-14 20:05:04,238 INFO mapred.Task: Task 'attempt_local1184033728_0001_m_000000_0' done.
2019-03-14 20:05:04,250 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_m_000000_0: Counters: 18
File System Counters
FILE: Number of bytes read=4319
FILE: Number of bytes written=502994
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=7
Map output records=6
Map output bytes=110
Map output materialized bytes=70
Input split bytes=158
Combine input records=6
Combine output records=3
Spilled Records=3
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=0
Total committed heap usage (bytes)=212860928
File Input Format Counters
Bytes Read=56
2019-03-14 20:05:04,251 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,252 INFO mapred.LocalJobRunner: map task executor complete.
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Waiting for reduce tasks
2019-03-14 20:05:04,256 INFO mapred.LocalJobRunner: Starting task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 2
2019-03-14 20:05:04,269 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false
2019-03-14 20:05:04,270 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
2019-03-14 20:05:04,274 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle#721f3077
2019-03-14 20:05:04,276 WARN impl.MetricsSystemImpl: JobTracker metrics system already initialized!
2019-03-14 20:05:04,300 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=625370688, maxSingleShuffleLimit=156342672, mergeThreshold=412744672, ioSortFactor=10, memToMemMergeOutputsThreshold=10
2019-03-14 20:05:04,301 INFO reduce.EventFetcher: attempt_local1184033728_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
2019-03-14 20:05:04,321 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local1184033728_0001_m_000000_0 decomp: 66 len: 70 to MEMORY
2019-03-14 20:05:04,325 INFO reduce.InMemoryMapOutput: Read 66 bytes from map-output for attempt_local1184033728_0001_m_000000_0
2019-03-14 20:05:04,326 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 66, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->66
2019-03-14 20:05:04,327 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
2019-03-14 20:05:04,327 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,327 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
2019-03-14 20:05:04,433 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,433 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,436 INFO reduce.MergeManagerImpl: Merged 1 segments, 66 bytes to disk to satisfy reduce memory limit
2019-03-14 20:05:04,438 INFO reduce.MergeManagerImpl: Merging 1 files, 70 bytes from disk
2019-03-14 20:05:04,440 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
2019-03-14 20:05:04,440 INFO mapred.Merger: Merging 1 sorted segments
2019-03-14 20:05:04,443 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 60 bytes
2019-03-14 20:05:04,445 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,493 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
2019-03-14 20:05:04,498 INFO mapred.Task: Task:attempt_local1184033728_0001_r_000000_0 is done. And is in the process of committing
2019-03-14 20:05:04,504 INFO mapred.LocalJobRunner: 1 / 1 copied.
2019-03-14 20:05:04,505 INFO mapred.Task: Task attempt_local1184033728_0001_r_000000_0 is allowed to commit now
2019-03-14 20:05:04,541 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1184033728_0001_r_000000_0' to file:/media/mathis/OS/Cours/Semestre4/Cloud-Internet-objet/Hadoop-MapReduce/output
2019-03-14 20:05:04,542 INFO mapred.LocalJobRunner: reduce > reduce
2019-03-14 20:05:04,542 INFO mapred.Task: Task 'attempt_local1184033728_0001_r_000000_0' done.
2019-03-14 20:05:04,544 INFO mapred.Task: Final Counters for attempt_local1184033728_0001_r_000000_0: Counters: 24
File System Counters
FILE: Number of bytes read=4491
FILE: Number of bytes written=503072
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=70
Reduce input records=3
Reduce output records=0
Spilled Records=3
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=6
Total committed heap usage (bytes)=212860928
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Output Format Counters
Bytes Written=8
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: Finishing task: attempt_local1184033728_0001_r_000000_0
2019-03-14 20:05:04,544 INFO mapred.LocalJobRunner: reduce task executor complete.
2019-03-14 20:05:04,807 INFO mapreduce.Job: Job job_local1184033728_0001 running in uber mode : false
2019-03-14 20:05:04,811 INFO mapreduce.Job: map 100% reduce 100%
2019-03-14 20:05:04,816 INFO mapreduce.Job: Job job_local1184033728_0001 completed successfully
2019-03-14 20:05:04,846 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=8810
FILE: Number of bytes written=1006066
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
Map-Reduce Framework
Map input records=7
Map output records=6
Map output bytes=110
Map output materialized bytes=70
Input split bytes=158
Combine input records=6
Combine output records=3
Reduce input groups=2
Reduce shuffle bytes=70
Reduce input records=3
Reduce output records=0
Spilled Records=6
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=6
Total committed heap usage (bytes)=425721856
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=56
File Output Format Counters
Bytes Written=8
I am trying my hand at a MapReduce program in Hadoop 2.6 using Java. I tried to refer to other posts on Stack Overflow but failed to debug my code.
First, let me describe the type of records:
subId=00001111911128052627towerid=11232w34532543456345623453456984756894756bytes=122112212212212218.4621702216543667E17
subId=00001111911128052639towerid=11232w34532543456345623453456984756894756bytes=122112212212212219.6726312167218586E17
subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212216.9431647633139046E17
subId=00001111911128052615towerid=11232w34532543456345623453456984756894756bytes=122112212212212214.7836041833447418E17
Now the Mapper Class: AircelMapper.class
import java.io.IOException;
import java.lang.String;
import java.lang.Long;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.io.*;
public class AircelMapper extends Mapper<LongWritable,Text,Text, LongWritable>
{
public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException
{
String acquire=value.toString();
String st=acquire.substring(81, 84);
LongWritable bytes=new LongWritable(Long.parseLong(st));
context.write(new Text(acquire.substring(6, 26)), bytes);
}
}
Now the Driver Class: AircelDriver.class
import java.io.IOException;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
public class AircelDriver
{
public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException
{
if(args.length<2)
{ System.out.println(" type ip and op file correctly");
System.exit(-1);
}
Job job = Job.getInstance();
job.setJobName(" ############### MY FIRST PROGRAM ###############");
job.setJarByClass(AircelDriver.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(LongWritable.class);
FileInputFormat.setInputPaths(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
job.setInputFormatClass(TextInputFormat.class);
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapperClass(AircelMapper.class);
job.setReducerClass(AircelReducer.class);
job.submit();
job.waitForCompletion(true);
}
}
I am not posting the Reducer class, since the problem is in the mapper code at runtime. The output of the Hadoop runtime is as follows (which is essentially an indication of job failure):
16/12/18 04:11:00 INFO mapred.LocalJobRunner: Starting task: attempt_local1618565735_0001_m_000000_0
16/12/18 04:11:01 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1
16/12/18 04:11:01 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
16/12/18 04:11:01 INFO mapred.MapTask: Processing split: hdfs://quickstart.cloudera:8020/practice/Data_File.txt:0+1198702
16/12/18 04:11:01 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
16/12/18 04:11:01 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
16/12/18 04:11:01 INFO mapred.MapTask: soft limit at 83886080
16/12/18 04:11:01 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
16/12/18 04:11:01 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
16/12/18 04:11:01 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
16/12/18 04:11:01 INFO mapreduce.Job: Job job_local1618565735_0001 running in uber mode : false
16/12/18 04:11:01 INFO mapreduce.Job: map 0% reduce 0%
16/12/18 04:11:02 INFO mapred.MapTask: Starting flush of map output
16/12/18 04:11:02 INFO mapred.MapTask: Spilling map output
16/12/18 04:11:02 INFO mapred.MapTask: bufstart = 0; bufend = 290000; bufvoid = 104857600
16/12/18 04:11:02 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26174400(104697600); length = 39997/6553600
16/12/18 04:11:03 INFO mapred.MapTask: Finished spill 0
16/12/18 04:11:03 INFO mapred.LocalJobRunner: map task executor complete.
16/12/18 04:11:03 WARN mapred.LocalJobRunner: job_local1618565735_0001
java.lang.Exception: java.lang.StringIndexOutOfBoundsException: String index out of range: 84
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 84
at java.lang.String.substring(String.java:1907)
at AircelMapper.map(AircelMapper.java:13)
at AircelMapper.map(AircelMapper.java:1)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(Fut
Why is it throwing a StringIndexOutOfBoundsException? Does the String class internally limit the size of a string? I do not understand what the problem is on lines 13-15 of the Mapper class.
From the String.substring Javadoc:
IndexOutOfBoundsException - if the beginIndex is negative, or endIndex is larger than the length of this String object, or beginIndex is larger than endIndex.
public StringIndexOutOfBoundsException(int index)
Constructs a new StringIndexOutOfBoundsException with an argument indicating the illegal index - 84 in your case.
public StringIndexOutOfBoundsException(String s)
Constructs a StringIndexOutOfBoundsException with the specified detail message - "String index out of range" in your case.
So no, String has no internal size limit. Somewhere on lines 13-15 the mapper calls substring with an end index of 84 on a line that is shorter than 84 characters. Check your input around index 84.
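Since the mapper source is not posted, here is a hypothetical sketch of the usual fix: check the line length before calling substring. The helper name and the offsets 0/84 are assumptions for illustration, not taken from the question:

```java
public class SafeSubstring {
    // Returns the substring [begin, end) of line, or "" if the line is too short.
    // This guards against StringIndexOutOfBoundsException on short or blank records.
    static String field(String line, int begin, int end) {
        if (line == null || line.length() < end) {
            return "";
        }
        return line.substring(begin, end);
    }

    public static void main(String[] args) {
        System.out.println(field("0123456789", 2, 5)); // "234"
        System.out.println(field("short", 0, 84));     // "" instead of an exception
    }
}
```

In a mapper you would typically skip (or count) such malformed records rather than let one short line fail the whole task.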
I want to create a MapReduce job of my own.
The Map class's output is: Text (key), Text (value).
The Reduce class's output is: Text, IntWritable.
I tried to implement it in the following way:
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;
public class artistandTrack {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, Text> {
        public void map(LongWritable key, Text value, OutputCollector<Text, Text> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            String[] names = line.split(" ");
            Text artist_name = new Text(names[2]);
            Text track_name = new Text(names[3]);
            output.collect(artist_name, track_name);
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, Text, Text, IntWritable> {
        public void reduce(Text key, Iterator<Text> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += 1;
                values.next(); // advance the iterator; the value itself is not used
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(artistandTrack.class);
        conf.setJobName("artisttrack");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapOutputKeyClass(Text.class);
        conf.setMapOutputValueClass(Text.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
    }
}
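As a side note, names[2] and names[3] assume every input line has at least four space-separated fields; a short or blank line would throw ArrayIndexOutOfBoundsException in the mapper. A minimal guard can be sketched in plain Java (the helper below is hypothetical, not part of the question's code):

```java
public class FieldGuard {
    // Returns the field at index idx of a space-separated line, or null if absent.
    // A mapper could skip records where either needed field comes back null.
    static String fieldOrNull(String line, int idx) {
        String[] parts = line.split(" ");
        return idx < parts.length ? parts[idx] : null;
    }

    public static void main(String[] args) {
        System.out.println(fieldOrNull("a b artist track", 2)); // artist
        System.out.println(fieldOrNull("a b", 3));              // null
    }
}
```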
When I try to run it, it shows the following output and terminates:
WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
14/10/17 06:09:15 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/10/17 06:09:15 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/10/17 06:09:16 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
14/10/17 06:09:18 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
14/10/17 06:09:19 INFO mapred.FileInputFormat: Total input paths to process : 1
14/10/17 06:09:19 INFO mapreduce.JobSubmitter: number of splits:1
14/10/17 06:09:19 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local803195645_0001
14/10/17 06:09:20 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/userloki803195645/.staging/job_local803195645_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/17 06:09:20 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/staging/userloki803195645/.staging/job_local803195645_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/10/17 06:09:20 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/userloki/job_local803195645_0001/job_local803195645_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/10/17 06:09:20 WARN conf.Configuration: file:/app/hadoop/tmp/mapred/local/localRunner/userloki/job_local803195645_0001/job_local803195645_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/10/17 06:09:20 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/10/17 06:09:20 INFO mapreduce.Job: Running job: job_local803195645_0001
14/10/17 06:09:20 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/10/17 06:09:20 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
14/10/17 06:09:20 INFO mapred.LocalJobRunner: Waiting for map tasks
14/10/17 06:09:20 INFO mapred.LocalJobRunner: Starting task: attempt_local803195645_0001_m_000000_0
14/10/17 06:09:20 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/10/17 06:09:20 INFO mapred.MapTask: Processing split: hdfs://localhost:54310/project5/input/sad.txt:0+272
14/10/17 06:09:21 INFO mapred.MapTask: numReduceTasks: 1
14/10/17 06:09:21 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/10/17 06:09:21 INFO mapreduce.Job: Job job_local803195645_0001 running in uber mode : false
14/10/17 06:09:21 INFO mapreduce.Job: map 0% reduce 0%
14/10/17 06:09:22 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/10/17 06:09:22 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/10/17 06:09:22 INFO mapred.MapTask: soft limit at 83886080
14/10/17 06:09:22 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/10/17 06:09:22 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/10/17 06:09:25 INFO mapred.LocalJobRunner:
14/10/17 06:09:25 INFO mapred.MapTask: Starting flush of map output
14/10/17 06:09:25 INFO mapred.MapTask: Spilling map output
14/10/17 06:09:25 INFO mapred.MapTask: bufstart = 0; bufend = 120; bufvoid = 104857600
14/10/17 06:09:25 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214384(104857536); length = 13/6553600
14/10/17 06:09:25 INFO mapred.LocalJobRunner: map task executor complete.
14/10/17 06:09:25 WARN mapred.LocalJobRunner: job_local803195645_0001
java.lang.Exception: java.io.IOException: wrong value class: class org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: wrong value class: class org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.Text
at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:199)
at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:1307)
at artistandTrack$Reduce.reduce(artistandTrack.java:44)
at artistandTrack$Reduce.reduce(artistandTrack.java:37)
at org.apache.hadoop.mapred.Task$OldCombinerRunner.combine(Task.java:1572)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:1611)
at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:1462)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:437)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
14/10/17 06:09:26 INFO mapreduce.Job: Job job_local803195645_0001 failed with state FAILED due to: NA
14/10/17 06:09:26 INFO mapreduce.Job: Counters: 11
Map-Reduce Framework
Map input records=4
Map output records=4
Map output bytes=120
Map output materialized bytes=0
Input split bytes=97
Combine input records=0
Combine output records=0
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
File Input Format Counters
Bytes Read=272
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at artistandTrack.main(artistandTrack.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Where is the wrong value class coming from:
java.lang.Exception: java.io.IOException: wrong value class: class org.apache.hadoop.io.IntWritable is not class org.apache.hadoop.io.Text
and why does the job fail:
Exception in thread "main" java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:836)
at artistandTrack.main(artistandTrack.java:68)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:483)
at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
I don't understand where it's going wrong. Any help is appreciated.
I think the problem is in this line:
conf.setCombinerClass(Reduce.class);
A combiner runs on the map side, and its output is written into the intermediate (shuffle) data, so it must produce exactly the map output types. Your Map emits (Text, Text) pairs, but Reduce, used here as the combiner, turns them into (Text, IntWritable) pairs; the framework rejects the IntWritable values because it expects Text, which is the exception you see. Try removing the line that sets the combiner.
Use the main below:
public static void main(String[] args) throws Exception {
JobConf conf = new JobConf(artistandTrack.class);
conf.setJobName("artisttrack");
FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
conf.setMapperClass(Map.class);
//conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
//conf.setOutputKeyClass(Text.class);
//conf.setOutputValueClass(IntWritable.class);
conf.setMapOutputKeyClass(Text.class);
conf.setMapOutputValueClass(Text.class);
conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);
JobClient.runJob(conf);
}
I have developed a MapReduce application to determine the first and last time a user commented, and the total number of comments from that user, based on the book written by Donald Miner.
But the problem with my algorithm is the reducer. I have grouped the comments by user id. My test data contains two user ids, each posting 3 comments on different dates, hence a total of 6 rows.
So my reducer output should print two records, each showing the first and last time a user commented and the total comments for that user id.
But my reducer is printing six records. Can someone point out what's wrong with the following code?
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;
import org.arjun.mapreduce.patterns.mapreducepatterns.MRDPUtils;
public class MinMaxCount {

    public static class MinMaxCountMapper extends Mapper<Object, Text, Text, MinMaxCountTuple> {
        private Text outuserId = new Text();
        private MinMaxCountTuple outTuple = new MinMaxCountTuple();
        private final static SimpleDateFormat sdf =
                new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSS");

        @Override
        protected void map(Object key, Text value,
                org.apache.hadoop.mapreduce.Mapper.Context context)
                throws IOException, InterruptedException {
            Map<String, String> parsed = MRDPUtils.transformXMLtoMap(value.toString());
            String date = parsed.get("CreationDate");
            String userId = parsed.get("UserId");
            try {
                Date creationDate = sdf.parse(date);
                outTuple.setMin(creationDate);
                outTuple.setMax(creationDate);
            } catch (java.text.ParseException e) {
                System.err.println("Unable to parse Date in XML");
                System.exit(3);
            }
            outTuple.setCount(1);
            outuserId.set(userId);
            context.write(outuserId, outTuple);
        }
    }

    public static class MinMaxCountReducer extends Reducer<Text, MinMaxCountTuple, Text, MinMaxCountTuple> {
        private MinMaxCountTuple result = new MinMaxCountTuple();

        protected void reduce(Text userId, Iterable<MinMaxCountTuple> values,
                org.apache.hadoop.mapreduce.Reducer.Context context)
                throws IOException, InterruptedException {
            result.setMin(null);
            result.setMax(null);
            result.setCount(0);
            int sum = 0;
            int count = 0;
            for (MinMaxCountTuple tuple : values) {
                if (result.getMin() == null ||
                        tuple.getMin().compareTo(result.getMin()) < 0) {
                    result.setMin(tuple.getMin());
                }
                if (result.getMax() == null ||
                        tuple.getMax().compareTo(result.getMax()) > 0) {
                    result.setMax(tuple.getMax());
                }
                System.err.println(count++); // debug: number of tuples seen for this key
                sum += tuple.getCount();
            }
            result.setCount(sum);
            context.write(userId, result);
        }
    }

    /**
     * @param args
     */
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: MinMaxCount <input> <output>");
            System.exit(2);
        }
        Job job = new Job(conf, "Summarization min max count");
        job.setJarByClass(MinMaxCount.class);
        job.setMapperClass(MinMaxCountMapper.class);
        //job.setCombinerClass(MinMaxCountReducer.class);
        job.setReducerClass(MinMaxCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(MinMaxCountTuple.class);
        FileInputFormat.setInputPaths(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        boolean result = job.waitForCompletion(true);
        if (result) {
            System.exit(0);
        } else {
            System.exit(1);
        }
    }
}
Input:
<row Id="8189677" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-07-30T07:29:33.343" UserId="831878" />
<row Id="8189677" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-08-01T07:29:33.343" UserId="831878" />
<row Id="8189677" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-08-02T07:29:33.343" UserId="831878" />
<row Id="8189678" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-06-30T07:29:33.343" UserId="931878" />
<row Id="8189678" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-07-01T07:29:33.343" UserId="931878" />
<row Id="8189678" PostId="6881722" Text="Have you looked at Hadoop?" CreationDate="2011-08-02T07:29:33.343" UserId="931878" />
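For readers without the book's MRDPUtils helper, a minimal stand-in for transformXMLtoMap over rows like the ones above can be sketched with a regex (a hypothetical substitute; the real helper may differ, and this assumes simple double-quoted attributes):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class XmlRowParser {
    // Matches attribute="value" pairs inside a single <row ... /> line.
    private static final Pattern ATTR = Pattern.compile("(\\w+)=\"([^\"]*)\"");

    // Extracts attribute name/value pairs from one row of the comments dump.
    static Map<String, String> transformXMLtoMap(String xml) {
        Map<String, String> map = new HashMap<>();
        Matcher m = ATTR.matcher(xml);
        while (m.find()) {
            map.put(m.group(1), m.group(2));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> parsed = transformXMLtoMap(
            "<row Id=\"8189677\" CreationDate=\"2011-07-30T07:29:33.343\" UserId=\"831878\" />");
        System.out.println(parsed.get("UserId"));       // 831878
        System.out.println(parsed.get("CreationDate")); // 2011-07-30T07:29:33.343
    }
}
```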
output file contents part-r-00000:
831878 2011-07-30T07:29:33.343 2011-07-30T07:29:33.343 1
831878 2011-08-01T07:29:33.343 2011-08-01T07:29:33.343 1
831878 2011-08-02T07:29:33.343 2011-08-02T07:29:33.343 1
931878 2011-06-30T07:29:33.343 2011-06-30T07:29:33.343 1
931878 2011-07-01T07:29:33.343 2011-07-01T07:29:33.343 1
931878 2011-08-02T07:29:33.343 2011-08-02T07:29:33.343 1
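As a sanity check on the min/max logic itself, independent of Hadoop, the date comparison the reducer should perform can be exercised in plain Java. This is a hypothetical standalone check, not code from the book; it uses three S's in the pattern to match the three-digit millisecond field in the sample data:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class MinMaxDateCheck {
    static final SimpleDateFormat SDF = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS");

    // Returns the earliest of the given timestamp strings, in the same format.
    static String earliest(String... stamps) throws ParseException {
        Date min = null;
        for (String s : stamps) {
            Date d = SDF.parse(s);
            if (min == null || d.compareTo(min) < 0) min = d;
        }
        return SDF.format(min);
    }

    // Returns the latest of the given timestamp strings, in the same format.
    static String latest(String... stamps) throws ParseException {
        Date max = null;
        for (String s : stamps) {
            Date d = SDF.parse(s);
            if (max == null || d.compareTo(max) > 0) max = d;
        }
        return SDF.format(max);
    }

    public static void main(String[] args) throws ParseException {
        // CreationDate values for UserId 831878 in the sample input
        String[] dates = {
            "2011-07-30T07:29:33.343", "2011-08-01T07:29:33.343", "2011-08-02T07:29:33.343"
        };
        System.out.println(earliest(dates)); // 2011-07-30T07:29:33.343
        System.out.println(latest(dates));   // 2011-08-02T07:29:33.343
    }
}
```

This is exactly the per-user line expected in the correct output: first comment, last comment, and a count of 3.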
job submission output:
12/12/16 11:13:52 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 11:13:52 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
12/12/16 11:13:52 WARN snappy.LoadSnappy: Snappy native library not loaded
12/12/16 11:13:52 INFO mapred.JobClient: Running job: job_201212161107_0001
12/12/16 11:13:53 INFO mapred.JobClient: map 0% reduce 0%
12/12/16 11:14:06 INFO mapred.JobClient: map 100% reduce 0%
12/12/16 11:14:18 INFO mapred.JobClient: map 100% reduce 100%
12/12/16 11:14:23 INFO mapred.JobClient: Job complete: job_201212161107_0001
12/12/16 11:14:23 INFO mapred.JobClient: Counters: 26
12/12/16 11:14:23 INFO mapred.JobClient: Job Counters
12/12/16 11:14:23 INFO mapred.JobClient: Launched reduce tasks=1
12/12/16 11:14:23 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=12264
12/12/16 11:14:23 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/12/16 11:14:23 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/12/16 11:14:23 INFO mapred.JobClient: Launched map tasks=1
12/12/16 11:14:23 INFO mapred.JobClient: Data-local map tasks=1
12/12/16 11:14:23 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=10124
12/12/16 11:14:23 INFO mapred.JobClient: File Output Format Counters
12/12/16 11:14:23 INFO mapred.JobClient: Bytes Written=342
12/12/16 11:14:23 INFO mapred.JobClient: FileSystemCounters
12/12/16 11:14:23 INFO mapred.JobClient: FILE_BYTES_READ=204
12/12/16 11:14:23 INFO mapred.JobClient: HDFS_BYTES_READ=888
12/12/16 11:14:23 INFO mapred.JobClient: FILE_BYTES_WRITTEN=43479
12/12/16 11:14:23 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=342
12/12/16 11:14:23 INFO mapred.JobClient: File Input Format Counters
12/12/16 11:14:23 INFO mapred.JobClient: Bytes Read=761
12/12/16 11:14:23 INFO mapred.JobClient: Map-Reduce Framework
12/12/16 11:14:23 INFO mapred.JobClient: Map output materialized bytes=204
12/12/16 11:14:23 INFO mapred.JobClient: Map input records=6
12/12/16 11:14:23 INFO mapred.JobClient: Reduce shuffle bytes=0
12/12/16 11:14:23 INFO mapred.JobClient: Spilled Records=12
12/12/16 11:14:23 INFO mapred.JobClient: Map output bytes=186
12/12/16 11:14:23 INFO mapred.JobClient: Total committed heap usage (bytes)=269619200
12/12/16 11:14:23 INFO mapred.JobClient: Combine input records=0
12/12/16 11:14:23 INFO mapred.JobClient: SPLIT_RAW_BYTES=127
12/12/16 11:14:23 INFO mapred.JobClient: Reduce input records=6
12/12/16 11:14:23 INFO mapred.JobClient: Reduce input groups=2
12/12/16 11:14:23 INFO mapred.JobClient: Combine output records=0
12/12/16 11:14:23 INFO mapred.JobClient: Reduce output records=6
12/12/16 11:14:23 INFO mapred.JobClient: Map output records=6
Ah, caught the culprit: just change your reduce method's signature to the following:
protected void reduce(Text userId, Iterable<MinMaxCountTuple> values,
Context context)
throws IOException, InterruptedException {
Basically, you just need Context rather than org.apache.hadoop.mapreduce.Reducer.Context.
Now the output looks like :
831878 2011-07-30T07:29:33.343 2011-08-02T07:29:33.343 3
931878 2011-06-30T07:29:33.343 2011-08-02T07:29:33.343 3
I tested it locally for you, and this change did the trick. The behavior looks odd, but it comes down to generics. A likely explanation: with the raw parameter type, the method's signature matches neither the inherited generic reduce() nor its full erasure (Iterable<MinMaxCountTuple> is still parameterized while Context is raw), so it does not override it. The framework therefore runs the default reduce(), which writes every input pair straight through, one output record per input, producing the six records you saw. This is also why Eclipse warns when org.apache.hadoop.mapreduce.Reducer.Context is used:
"Reducer.Context is a raw type. References to generic type Reducer<KEYIN,VALUEIN,KEYOUT,VALUEOUT>.Context should be parameterized"
With plain Context, the name resolves to the parameterized Reducer<Text, MinMaxCountTuple, Text, MinMaxCountTuple>.Context, the signature matches, and the override takes effect.