I am running a MapReduce job on some data on a cluster and get the following output:
0000000000 44137 0 2
1 1
902996760100000 44137 2 6
2 2
9029967602 44137 2 8
2 2
90299676030000 44137 2 1
9029967604 44137 2 5
2 2
905000 38704 2 1
9050000001 38702 2 24
2 2
9050000001 38704 2 14
2 2
9050000001 38705 2 12
2 2
9050000001 38706 2 13
2 2
9050000001 38714 2 24
2 2
9050000002 38704 2 12
2 2
9050000002 38706 2 12
2 2
9050000011 38704 2 6
2 2
9050000011 38706 2 12
2 2
9050000021 38702 2 12
2 2
9050000031 38704 2 6
2 2
9050000031 38705 2 6
2 2
9050000031 38714 2 12
2 2
This is my reducer
public class RTopLoc extends Reducer<CompositeKey, IntWritable, Text, Text> {
private static int number = 0;
private static CompositeKey lastCK;
private static Text lastLac = new Text();
@Override
public void reduce(CompositeKey key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
int sum = sumValues(values);
String str = Integer.toString(sum);
String str2 = Integer.toString(number);
String str3 = key.getSecond().toString();
context.write(key.getFirst(), new Text(str3 + " " + str2 + " " + str));
context.write(lastCK.getFirst(), lastCK.getSecond());
if(number == 0){
number = sum;
lastCK = new CompositeKey(key.getFirst().toString(), key.getSecond().toString());
context.write(new Text("1"), new Text("1"));
}
else if(lastCK.getFirst().equals(key.getFirst()) && sum > number){
lastCK = new CompositeKey(key.getFirst().toString(), key.getSecond().toString());
context.write(new Text("2"), new Text("2"));
}
else if(!lastCK.getFirst().equals(key.getFirst())){
// context.write(lastCK.getFirst(), lastCK.getSecond());
context.write(new Text("3"), new Text("3"));
number = sum;
lastCK = new CompositeKey(key.getFirst().toString(), key.getSecond().toString());
}
}
From what I understand, the problem is that Hadoop treats lastCK and key as the same object, so this condition
if(lastCK.getFirst().equals(key.getFirst()))
will always be true.
This is my CompositeKey class
public class CompositeKey implements WritableComparable {
private Text first = null;
private Text second = null;
public CompositeKey() {
}
public CompositeKey(Text first, Text second) {
this.first = first;
this.second = second;
}
//...getters and setters
public Text getFirst() {
return first;
}
public void setFirst(Text first) {
this.first = first;
}
public void setFirst(String first) {
setFirst(new Text(first));
}
public Text getSecond() {
return second;
}
public void setSecond(Text second) {
this.second = second;
}
public void setSecond(String second) {
setSecond(new Text(second));
}
public void write(DataOutput d) throws IOException {
first.write(d);
second.write(d);
}
public void readFields(DataInput di) throws IOException {
if (first == null) {
first = new Text();
}
if (second == null) {
second = new Text();
}
first.readFields(di);
second.readFields(di);
}
public int compareTo(Object obj) {
CompositeKey other = (CompositeKey) obj;
int cmp = first.compareTo(other.getFirst());
if (cmp != 0) {
return cmp;
}
return second.compareTo(other.getSecond());
}
@Override
public boolean equals(Object obj) {
CompositeKey other = (CompositeKey)obj;
return first.equals(other.getFirst());
}
@Override
public int hashCode() {
return first.hashCode();
}
}
I tried changing the setters to something along these lines:
public void setFirst(Text first) {
this.first.set(first);
}
public void setFirst(String first) {
setFirst(new Text(first));
}
where set is the method described in the Hadoop Text documentation,
but got:
Error: java.lang.NullPointerException
at task.SecondarySort.CompositeKey.setFirst(CompositeKey.java:29)
at task.SecondarySort.CompositeKey.setFirst(CompositeKey.java:33)
How do I make hadoop treat lastCK and key as different objects?
If you change these lines:
private Text first = null;
private Text second = null;
to
private Text first = new Text();
private Text second = new Text();
And then use:
public void setFirst(Text first) {
this.first.set(first);
}
It should work. You could also create first and second in the constructors.
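The difference between keeping a reference and copying the contents can be sketched without Hadoop at all. In the snippet below, MutableKey is a hypothetical stand-in for a reusable key object (it is not the real Text or CompositeKey API), and the "next record" step only simulates a framework overwriting the same instance, which Hadoop is generally known to do with Writable objects.

```java
class KeyCopyDemo {
    // Hypothetical stand-in for a mutable, reused key; not Hadoop's Text.
    static final class MutableKey {
        // Initialized eagerly, so set(...) can never hit a null field.
        private final StringBuilder first = new StringBuilder();

        void set(String value) {
            // Overwrite the contents in place, keeping the same object.
            first.setLength(0);
            first.append(value);
        }

        void set(MutableKey other) {
            // Copy the contents of the other key, not its reference.
            set(other.first.toString());
        }

        String get() {
            return first.toString();
        }
    }

    /** Returns {value seen through the alias, value seen through the copy}. */
    static String[] run() {
        MutableKey reused = new MutableKey();
        reused.set("9050000001");

        MutableKey alias = reused;        // same object, follows every later change
        MutableKey copy = new MutableKey();
        copy.set(reused);                 // independent snapshot of the contents

        // Simulate the next record being read into the same key object.
        reused.set("9050000002");
        return new String[] { alias.get(), copy.get() };
    }

    public static void main(String[] args) {
        String[] seen = run();
        System.out.println("alias: " + seen[0]); // alias: 9050000002
        System.out.println("copy:  " + seen[1]); // copy:  9050000001
    }
}
```

The alias changes together with the reused object, while the copy keeps the earlier value, which is the behavior you want for lastCK.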
I searched for answers in other topics but didn't really understand much.
What I really want is this: say I have some data in an ArrayList to process and two threads (or maybe three). How can I make these threads share the data equally and process it?
E.g., for an ArrayList with 10 elements and 2 threads, 5 elements each; for an ArrayList with 10 elements and 3 threads, 3 elements for two of them and 4 for the third.
Extra question: can I specify that one particular thread has to start?
This is what i get from running the following code:
Data to be processed in First processor: 0 1 2 3 4 5 6 7 8 9
Data to be processed in Second processor:
Data available:
(or random stuff)
What i actually want:
- Data to be processed in First processor: 0 2 4 6 8 ( or 0 1 2 3 4 )
- Data to be processed in Second processor:1 3 5 7 9 (or 5 6 7 8 9)
- Data available:
public class Data {
private List<Integer> dataIndex = new ArrayList<>();
Data() {
for (int i = 0; i < 10; i++) {
dataIndex.add(i);
}
}
public synchronized Integer extractOneData(){
return dataIndex.remove(0);
}
public List<Integer> getDataIndex() {
return dataIndex;
}
public void printDataIndex() {
System.out.println("Data available:");
for (Integer i : dataIndex) {
System.out.print(i + " ");
}
}
}
public class DataProcessor implements Runnable{
private Data data;
private String name;
private List<Integer> dataToProcess = new ArrayList<>();
DataProcessor(Data data,String name){
this.data=data;
this.name=name;
}
public void run(){
while(!data.getDataIndex().isEmpty()) {
dataToProcess.add(data.extractOneData());
}
}
public void displaydataToProcess(){
System.out.println("Data to be processed in " +name + ":");
for(Integer i:dataToProcess){
System.out.print(i+" ");
}
System.out.println();
}
}
public class Main {
public static void main(String[] args) {
Data d = new Data();
DataProcessor dp1 = new DataProcessor(d,"First processor");
DataProcessor dp2 = new DataProcessor(d,"Second processor");
Thread t1 = new Thread(dp1);
Thread t2 = new Thread(dp2);
t1.start();
t2.start();
dp1.displaydataToProcess();
dp2.displaydataToProcess();
d.printDataIndex();
}
}
For my own convenience I have made Data and DataProcessor inner classes, but you can split them into separate classes on your side.
import java.util.ArrayList;
import java.util.List;
public class ForcedMultithreading {
public static void main(String[] args) throws InterruptedException {
ForcedMultithreading f = new ForcedMultithreading();
Data d = f.new Data();
int numberOfThreads = 2;
int perThreadLoad = d.size() / numberOfThreads;
DataProcessor dp1 = f.new DataProcessor(d, "First processor", perThreadLoad);
DataProcessor dp2 = f.new DataProcessor(d, "Second processor", perThreadLoad);
Thread t1 = new Thread(dp1);
Thread t2 = new Thread(dp2);
t1.start();
t2.start();
// Wait for both workers to finish before reading their results.
t1.join();
t2.join();
dp1.displaydataToProcess();
dp2.displaydataToProcess();
d.printDataIndex();
}
class Data {
private List<Integer> dataIndex = new ArrayList<>();
Data() {
for (int i = 0; i < 10; i++) {
dataIndex.add(i);
}
}
public int size() {
return dataIndex.size();
}
public synchronized Integer extractOneData() {
return dataIndex.remove(0);
}
public List<Integer> getDataIndex() {
return dataIndex;
}
public void printDataIndex() {
System.out.println("Data available:");
for (Integer i : dataIndex) {
System.out.print(i + " ");
}
}
}
class DataProcessor implements Runnable {
private Data data;
private String name;
private int perThreadLoad;
private List<Integer> dataToProcess = new ArrayList<>();
DataProcessor(Data data, String name, int perThreadLoad) {
this.data = data;
this.name = name;
this.perThreadLoad = perThreadLoad;
}
@Override
public void run() {
while (perThreadLoad > 0) {
dataToProcess.add(data.extractOneData());
this.perThreadLoad--;
}
}
public void displaydataToProcess() {
System.out.println("Data to be processed in " + name + ":");
for (Integer i : dataToProcess) {
System.out.print(i + " ");
}
System.out.println();
}
}
}
Also, in the code above the size (10) is divided by the number of threads (2). You can work out the details for a size that is not exactly divisible by numberOfThreads, e.g. 10 / 3.
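One possible way to handle a size that does not divide evenly is to give the first few workers one extra element each. The sketch below is only an illustration: the class and method names (LoadSplit, chunkSizes, split) are made up for this example, and it hands each worker its own copy of a slice instead of letting the threads pull from a shared list.

```java
import java.util.ArrayList;
import java.util.List;

class LoadSplit {
    /** Chunk sizes when `size` items are shared among `threads` workers. */
    static int[] chunkSizes(int size, int threads) {
        int[] sizes = new int[threads];
        int base = size / threads;
        int remainder = size % threads;
        for (int i = 0; i < threads; i++) {
            // The first `remainder` workers take one extra element.
            sizes[i] = base + (i < remainder ? 1 : 0);
        }
        return sizes;
    }

    /** Splits the list into consecutive, independent sublists, one per worker. */
    static <T> List<List<T>> split(List<T> items, int threads) {
        List<List<T>> parts = new ArrayList<>();
        int start = 0;
        for (int len : chunkSizes(items.size(), threads)) {
            // Copy the slice so each worker owns its own list.
            parts.add(new ArrayList<>(items.subList(start, start + len)));
            start += len;
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            data.add(i);
        }
        System.out.println(split(data, 3)); // [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
    }
}
```

With 10 elements and 3 workers this yields chunks of 4, 3 and 3; which worker gets the extra element is an arbitrary choice.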
OK, so I'm kind of at a loss here, but here goes. I need to sort the teams by their medalList array when they get printed out: first by gold medals, which are stored at index [0], then by silver at index [1], then by bronze at index [2], and finally, if teams are still tied, by team name. Do I need to call a sorting method in another class, keep track of one team, compare it to the rest of the teams, and print it first if it is the best?
How do I compare the integers in an array of one instance of a class to those of another?
When a user enters a certain command a list of teams with their results will get printed out.
As of now it looks like this:
1st 2nd 3rd Team Name
0 0 0 North Korea
3 1 1 America
5 0 2 France
2 1 3 Germany
I want it to say:
1st 2nd 3rd Team Name
5 0 2 France
3 1 1 America
2 1 3 Germany
0 0 0 North Korea
import java.util.ArrayList;
import java.util.Arrays;
public class Team {
private String teamName;
private ArrayList<Participant> participantList = new ArrayList<Participant>();
private int[] medalList = new int[3];
public Team(String teamName) {
this.teamName = teamName;
}
public String getTeamName() {
return teamName;
}
public void addParticipant(Participant participant) {
participantList.add(participant);
}
public void removeFromTeam(int participantNr){
for(int i = 0; i < participantList.size(); i++){
if(participantList.get(i).getParticipantNr() == participantNr){
participantList.remove(i);
}
}
}
public void printOutParticipant() {
for(int i = 0; i < participantList.size(); i++){
System.out.println(participantList.get(i).getName() + " " + participantList.get(i).getLastName());
}
}
public boolean isEmpty() {
boolean empty = false;
if (participantList.size() == 0) {
empty = true;
return empty;
}
return empty;
}
public void emptyMedalList(){
Arrays.fill(medalList, 0);
}
public void recieveMedals(int medal) {
if(medal == 1){
int gold = 0;
gold = medalList[0];
medalList[0] = ++gold;
} else if (medal == 2){
int silver = 0;
silver = medalList[1];
medalList[1] = ++silver;
} else if (medal == 3){
int bronze = 0;
bronze = medalList[2];
medalList[2] = ++bronze;
}
}
public void printMedals(){
System.out.println(medalList[0] + " " + medalList[1] + " " + medalList[2] + " " + teamName);
}
public int compareTo(Team team) {
int goldDif = Integer.compare(team.medalList[0], this.medalList[0]);
if (goldDif != 0)
return goldDif;
int silverDif = Integer.compare(team.medalList[1], this.medalList[1]);
if (silverDif != 0)
return silverDif;
int bronzeDif = Integer.compare(team.medalList[2], this.medalList[2]);
if (bronzeDif != 0)
return bronzeDif;
return this.getTeamName().compareTo(team.getTeamName());
}
public String toString() {
return teamName;
}
}
Make your Team class comparable
public class Team implements Comparable<Team> {
and add a comparison method
@Override
public int compareTo(final Team other) {
for (int i = 0; i < 3; i++) {
// Compare other against this so that higher medal counts sort first.
final int compareMedals = Integer.compare(other.medalList[i], medalList[i]);
if (compareMedals != 0) {
return compareMedals;
}
}
return teamName.compareTo(other.teamName);
}
This will check gold medals first, then silver medals if the amount of gold medals is equal and so on and use the team name comparison as a last resort. You can then sort a collection of Teams with
final List<Team> teams = new ArrayList<>();
...
Collections.sort(teams);
EDIT:
Or if you like it in Java 8 style you could also write your comparison method like
@Override
public int compareTo(final Team other) {
return Stream.of(0, 1, 2)
// Arguments are swapped so that higher medal counts sort first.
.map(i -> Integer.compare(other.medalList[i], medalList[i]))
.filter(i -> i != 0)
.findFirst()
.orElse(teamName.compareTo(other.teamName));
}
I have an input like:
Apple: 0 1
Apple: 4 5
Pear: 0 10
Pear: 11 13
Apple: 5 10
Apple: 2 4
I'm looking for rows where the fruits are the same and the first value equals another row's second value. So I'm looking for pairs like Apple: 4 5 and Apple: 2 4, and I also need Apple: 4 5 and Apple: 5 10.
On the other hand, I don't want to search the whole data set; I mean I don't want to search for Apple among the Pears.
Should I use HashMap? or HashSet? or something else?
Thanks for replies.
Give this a try... It utilizes a HashMap of Lists
public static void main(String[] args) {
List<String> inputs = new ArrayList<>();
inputs.add("Apple: 0 1");
inputs.add("Apple: 4 5");
inputs.add("Pear: 0 10");
inputs.add("Pear: 11 13");
inputs.add("Apple: 5 10");
inputs.add("Apple: 2 4");
Map<String, List<Fruit>> fruits = new HashMap<>();
for (String input : inputs) {
String[] inputPieces = input.split(" ");
String name = inputPieces[0].replace(":", "");
int first = Integer.parseInt(inputPieces[1]);
int second = Integer.parseInt(inputPieces[2]);
if (!fruits.containsKey(name)) {
fruits.put(name, new ArrayList<Fruit>());
}
fruits.get(name).add(new Fruit(name, first, second));
}
for (String key : fruits.keySet()) {
System.out.println(key + ": " + findDuplicates(fruits.get(key)));
}
}
private static List<Fruit> findDuplicates(List<Fruit> fruits) {
List<Fruit> results = new ArrayList<>();
for (int i = 0; i < fruits.size(); i++) {
for (int j = 0; j < fruits.size(); j++) {
if (j == i) {
continue;
}
if ((fruits.get(i).first == fruits.get(j).second) ||
(fruits.get(j).first == fruits.get(i).second)) {
if (!results.contains(fruits.get(i))){
results.add(fruits.get(i));
}
}
}
}
return results;
}
public static class Fruit {
private String name;
private int first;
private int second;
public Fruit(String name, int first, int second) {
this.name = name;
this.first = first;
this.second = second;
}
@Override
public String toString() {
return String.format("%s: %d %d", name, first, second);
}
}
Results:
I have to make a program that reads each word of a file and makes an index, in alphabetical order, of the lines each word occurs on.
for example, if the file had:
a white white dog
crowded around the valley
the output should be:
a: 1
around: 2
crowded: 2
dog: 1
the: 2
valley: 2
white: 1, 1
When my file contains:
one fish two fish blue fish green fish
cow fish milk fish dog fish red fish
can you find a little lamb
can you find a white calf
THE OUTPUT IS WRONG!: (NOT IN ALPHA ORDER)
a: 3 4
calf: 4
find: 3 4 4
lamb: 3
little: 3
white: 4
you: 3 4
blue: 1
can: 3
cow: 2
dog: 2
green: 1 1 2 2 2 2
milk: 2
red: 2
two: 1 1 1
fish: 1
one: 1
Here is my code:
INDEXMAKER MASTER CLASS
import java.io.*;
import java.util.*;
public class IndexMaker {
private ArrayList<Word> words;
private String fileName;
private String writeFileName;
public IndexMaker(String fileName, String writeFileName) {
this.fileName = fileName;
this.writeFileName = writeFileName;
words = new ArrayList<Word>();
}
public void makeIndex() {
try {
File file = new File(fileName);
Scanner lineScanner = new Scanner(file);
int lineNum = 0;
while (lineScanner.hasNext()) {
lineNum++;
Scanner wordScanner = new Scanner(lineScanner.nextLine());
while (wordScanner.hasNext()) {
String word = wordScanner.next().toLowerCase();
if (!words.contains(new Word(word))) {
insertInto(word, findPosition(word), lineNum);
} else {
addLineNum(word, lineNum);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
public void displayIndex() {
try {
//FileWriter fileWriter = new FileWriter(new File(writeFileName));
//BufferedWriter writer = new BufferedWriter(fileWriter);
for (Word word : words)
System.out.println(word.getWord() + ": " + word.getLineNums());
} catch (Exception e) {
}
}
private int findPosition(String word) {
for (int i = 0; i < words.size(); i++) {
if (word.compareTo(words.get(i).getWord()) <= 0)
return i;
}
return 0;
}
private void insertInto(String word, int pos, int lineNum) {
words.add(pos, new Word(word, String.valueOf(lineNum)));
}
private void addLineNum(String word, int lineNum) {
int pos = findPosition(word);
words.get(pos).addLineNum(lineNum);
}
}
WORD CLASS
public class Word {
private String word;
private String lineNums;
public Word(String word, String lineNum) {
this.word = word;
this.lineNums = lineNum;
}
public Word(String word) {
this.word = word;
this.lineNums = "";
}
public String getWord() {
return word;
}
public String getLineNums() {
return lineNums;
}
public void addLineNum(int num) {
lineNums += " " + num;
}
@Override
public boolean equals(Object w) {
if (((Word)w).getWord().equals(word))
return true;
else
return false;
}
}
CLIENT
public class Client {
public static void main(String[] args) {
IndexMaker indexMaker = new IndexMaker("readme.txt", "readme.txt");
indexMaker.makeIndex();
indexMaker.displayIndex();
}
}
any help would be appreciated, thanks.
I can't find your definition of compareTo. It seems this would be the key part of your program?
Properly implement your compareTo and confirm it works by printing the results of comparisons using System.out.println.
Doing your own comparison is "ok" in that it will work if you do it properly. The other thing you could do would be to implement Comparable and then you can get Java to sort a list of words for you.
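As a rough illustration of the Comparable route, here is a small sketch. The Word class in it is a hypothetical, simplified variant of the one in the question (it stores line numbers in a list, not a String), and index is a helper made up for this example. Words are appended in the order they are first seen and sorted once at the end with Collections.sort.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedIndexDemo {
    // Hypothetical simplified variant of the question's Word class.
    static final class Word implements Comparable<Word> {
        final String word;
        final List<Integer> lineNums = new ArrayList<>();

        Word(String word) {
            this.word = word;
        }

        @Override
        public int compareTo(Word other) {
            // Alphabetical order, delegated to String.compareTo.
            return word.compareTo(other.word);
        }

        @Override
        public String toString() {
            return word + ": " + lineNums;
        }
    }

    /** Builds the index for the given lines and returns one entry per word. */
    static List<String> index(String[] lines) {
        List<Word> words = new ArrayList<>();
        for (int i = 0; i < lines.length; i++) {
            for (String token : lines[i].toLowerCase().split("\\s+")) {
                if (token.isEmpty()) {
                    continue; // ignore empty tokens from leading whitespace
                }
                Word found = null;
                for (Word w : words) {
                    if (w.word.equals(token)) {
                        found = w;
                    }
                }
                if (found == null) {
                    found = new Word(token);
                    words.add(found); // insertion order does not matter here
                }
                found.lineNums.add(i + 1); // line numbers are 1-based
            }
        }
        Collections.sort(words); // uses Word.compareTo
        List<String> out = new ArrayList<>();
        for (Word w : words) {
            out.add(w.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        String[] lines = { "a white white dog", "crowded around the valley" };
        for (String entry : index(lines)) {
            System.out.println(entry);
        }
    }
}
```

Sorting once at the end avoids having to compute an insertion position by hand for every new word.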
I'm unable to print the result of the getLines method. Am I doing something wrong here? The program runs without errors otherwise, but when I try to print the result of getLines, I get this exception:
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at dijkstra.Fileprocess.getLines(Fileprocess.java:37)
at dijkstra.Fileprocess.main(Fileprocess.java:70)
public class Fileprocess {
public static Scanner Reader(String FileName){
//Pass a File Name to this method, then will return Scanner for reading data from that file
try {
return new Scanner(new File(FileName));
} catch (FileNotFoundException ex) {
ex.printStackTrace();
System.exit(1);
return null;
}
}
static ArrayList<Edge> getLines(ArrayList<Vertex> PointCollection) {
Scanner Input = Reader(Vertex.graph);
ArrayList<Edge> result = new ArrayList<Edge>();
while(Input.hasNext()){
String line = Input.nextLine();
String[] arr = line.split(" ");
result.add(new Edge(Integer.parseInt(arr[0]), //index
getPointbyIndex(PointCollection,Integer.parseInt(arr[1])), //start
getPointbyIndex(PointCollection,Integer.parseInt(arr[2])), //end
Integer.parseInt(arr[3]))); //cost
}
Input.close();
return result;
}
static ArrayList<Vertex> getPoints() {
Scanner Input = Reader(Vertex.airports);
ArrayList<Vertex> result = new ArrayList<Vertex>();
while(Input.hasNext()){
String line = Input.nextLine();
result.add(new Vertex(line));
}
Input.close();
return result;
}
static Vertex getPointbyIndex(ArrayList<Vertex>PointCollection, int Index){
for(Vertex p:PointCollection){
if(p.getIndex() == Index){
return p;
}
}
return null;
}
public static void main(String[] args){
System.out.println(getPoints());
System.out.println(getLines(null));
}
}
This is the input text file (index, start, end, cost):
1 1 2 2
2 1 3 1
3 1 6 3
4 1 7 3
5 2 1 2
6 2 3 1
7 2 4 1
8 2 5 2
9 2 6 2
10 3 1 1
11 3 2 1
12 3 4 1
class Edge {
public Vertex start;
public Vertex end;
public double cost;
public int Index;
// public final Vertex target;
// public final int weight;
public Edge(double cost, Vertex end, Vertex start, int Index){
this.start = start;
this.end = end;
this.cost = cost;
this.Index = Index;
}
public String toString(){
String result = "";
if(this.start != null && this.end != null){
result = this.Index +","+this.start.Index+","+this.end.Index +","+this.cost;
}
return result;
}
}
Maybe your file contains an empty or incompatible line?
You can avoid the error by checking the array length before using it:
String[] arr = line.split(" ");
if (arr.length > 3) {
result.add(new Edge(Integer.parseInt(arr[0]), //index
getPointbyIndex(PointCollection,Integer.parseInt(arr[1])), //start
getPointbyIndex(PointCollection,Integer.parseInt(arr[2])), //end
Integer.parseInt(arr[3]))); //cost
}
}
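A slightly more defensive variant of the same idea is sketched below. It is only an illustration: EdgeLineParser and parse are names invented for this example, and the rows are returned as plain int arrays so the sketch does not depend on the Edge and Vertex classes. It splits on any run of whitespace (so tabs or double spaces do not break it) and skips lines that are blank or not numeric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class EdgeLineParser {
    /** Parses "index start end cost" lines; lines that do not fit are skipped. */
    static List<int[]> parse(List<String> lines) {
        List<int[]> rows = new ArrayList<>();
        for (String line : lines) {
            // Split on any run of whitespace, not just a single space.
            String[] parts = line.trim().split("\\s+");
            if (parts.length != 4) {
                continue; // blank or malformed line
            }
            try {
                int[] row = new int[4];
                for (int i = 0; i < 4; i++) {
                    row[i] = Integer.parseInt(parts[i]);
                }
                rows.add(row);
            } catch (NumberFormatException e) {
                // Non-numeric field: skip this line and keep going.
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("1 1 2 2", "", "2  1 3 1", "x y");
        for (int[] row : parse(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

An index of 1 in the exception message suggests that at least one line produced an array with a single element, which is what an empty line or a line with a different separator would give.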