I have a file that I need to use to execute the wordcount function (based on MapReduce), but using threads. I take the file and split it into multiple small files, then I loop over the small files to count the number of occurrences of words with a Reduce() function. How can I implement threads with the run() function and use them with the Reduce function?
Here's my code:
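import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Scanner;

public class WordCounter implements Runnable {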
private String Nom;
protected static int Chunks = 1 ;
public WordCounter (String n) {
Nom = n;
}
public String getNom() { return Nom; }
public void split () throws IOException
{
File source = new File(this.Nom);
int maxRows = 100;
int i = 1;
try(Scanner sc = new Scanner(source)){
String line = null;
int lineNum = 1;
File splitFile = new File(this.Nom+i+".txt");
FileWriter myWriter = new FileWriter(splitFile);
while (sc.hasNextLine()) {
line = sc.nextLine();
if(lineNum > maxRows){
Chunks++;
myWriter.close();
lineNum = 1;
i++;
splitFile = new File(this.Nom+i+".txt");
myWriter = new FileWriter(splitFile);
}
myWriter.write(line+"\n");
lineNum++;
}
myWriter.close();
}
}
public void Reduce() throws IOException
{
ArrayList<String> words = new ArrayList<String>();
ArrayList<Integer> count = new ArrayList<Integer>();
for (int i = 1; i <= Chunks; i++) { // split() numbers the files 1..Chunks
//create the input stream (receive the text)
FileInputStream fin = new FileInputStream(this.getNom()+i+".txt");
//go through the text with a scanner
Scanner sc = new Scanner(fin);
while (sc.hasNext()) {
//Get the next word
String nextString = sc.next();
//Determine if the string exists in words
if (words.contains(nextString)) {
int index = words.indexOf(nextString);
count.set(index, count.get(index)+1);
}
else {
words.add(nextString);
count.add(1);
}
}
sc.close();
fin.close();
}
// Creating a File object that represents the disk file.
FileWriter myWriter = new FileWriter(new File(this.getNom()+"Result.txt"));
for (int i = 0; i < words.size(); i++) {
myWriter.write(words.get(i)+ " : " +count.get(i) +"\n");
}
myWriter.close();
//delete the small files
deleteFiles();
}
public void deleteFiles()
{
File f= new File("");
for (int i = 1; i <= Chunks; i++) {
f = new File(this.getNom()+i+".txt");
f.delete();
}
}
}
It's better to use Callable instead of Runnable; that way each task can hand its result back to you.
So, to fix your code, you can do something more or less like this:
public class WordCounter {
private static ExecutorService threadPool = Executors.newFixedThreadPool(5); // 5 represents the number of concurrent threads.
public Map<String, Integer> count(String filename) throws IOException, InterruptedException, ExecutionException {
int chunks = splitFileInChunks(filename);
List<Future<Report>> reports = new ArrayList<Future<Report>>();
for (int i=1; i<=chunks; i++) {
Callable<Report> callable = new ReduceCounter(filename + i + ".txt");
Future<Report> future = threadPool.submit(callable);
reports.add(future);
}
Map<String, Integer> finalMap = new HashMap<>();
for (Future<Report> future : reports) {
Map<String, Integer> map = future.get().getWords();
for (Map.Entry<String, Integer> entry : map.entrySet()) {
// add this chunk's count to the running total (starts at 0 for new words)
finalMap.merge(entry.getKey(), entry.getValue(), Integer::sum);
}
}
// return a map with the key being the word and the value the counter for that word
return finalMap;
}
// this method doesn't need to be run on the separate thread
private int splitFileInChunks(String filename) throws IOException { .... }
}
public class Report {
Map<String, Integer> words = new HashMap<>();
// ... getter, setter, constructor etc
}
public class ReduceCounter implements Callable<Report> {
String filename;
public ReduceCounter(String filename) { this.filename = filename;}
public Report call() {
// store the values in a Map<String, Integer> since it's easier that way
Map<String, Integer> myWordsMap = new HashMap<String, Integer>();
// here add the logic from your Reduce method, without the for loop iteration
// you should add logic to read only the file named with the value from "filename"
return new Report(myWordsMap);
}
}
Please note that you can skip the Report class and return Future<Map<String, Integer>> directly, but I used Report to make the code easier to follow.
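A minimal, self-contained sketch of that Report-free variant follows. To keep it runnable on its own, it assumes the chunks are already in memory as strings rather than as the numbered files produced by splitFileInChunks; the class and method names (ParallelWordCount, countChunk) are hypothetical. It uses ExecutorService.invokeAll, which blocks until every task has finished, and shuts the pool down afterwards so the JVM can exit (List.of needs Java 9+).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelWordCount {

    // The "reduce" work for one chunk: count the words it contains.
    static Map<String, Integer> countChunk(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Submits one Callable per chunk and merges the partial maps into one result.
    static Map<String, Integer> count(List<String> chunks, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Map<String, Integer>>> tasks = new ArrayList<>();
            for (String chunk : chunks) {
                tasks.add(() -> countChunk(chunk));
            }
            Map<String, Integer> total = new HashMap<>();
            // invokeAll waits for every task, then we merge the results one by one
            for (Future<Map<String, Integer>> f : pool.invokeAll(tasks)) {
                f.get().forEach((word, n) -> total.merge(word, n, Integer::sum));
            }
            return total;
        } finally {
            pool.shutdown(); // otherwise the pool's threads keep the JVM alive
        }
    }

    public static void main(String[] args) throws Exception {
        Map<String, Integer> result = count(List.of("a b a", "b c", "a"), 2);
        System.out.println(new TreeMap<>(result)); // {a=3, b=2, c=1}
    }
}
```

Merging happens on the calling thread after the futures complete, so the per-chunk maps never need to be thread-safe.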
Update: a Runnable version, as requested by the user:
public class WordCounter {
public Map<String, Integer> count(String filename) throws IOException, InterruptedException {
int chunks = splitFileInChunks(filename);
List<ReduceCounter> counters = new ArrayList<>();
List<Thread> reducerThreads = new ArrayList<>();
for (int i=1; i<=chunks; i++) {
ReduceCounter rc = new ReduceCounter(filename + i + ".txt");
Thread t = new Thread(rc);
counters.add(rc);
reducerThreads.add(t);
t.start();
}
// next wait for the threads to finish processing
for (Thread t : reducerThreads) {
t.join();
}
// now grab the results from each of them
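Map<String, Integer> finalMap = new HashMap<>();
for (ReduceCounter cnt : counters) {
// merge this thread's partial counts into the total
cnt.getWords().forEach((word, n) -> finalMap.merge(word, n, Integer::sum));
}
return finalMap;
}
// splitFileInChunks(...) as in the Callable version
}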
The reducer class should look like this:
public class ReduceCounter implements Runnable {
String filename;
Map<String, Integer> words = new HashMap<>();
public ReduceCounter(String filename) { this.filename = filename;}
public void run() {
// store the values in the "words" map
// here add the logic from your Reduce method, without the for loop iteration
// also read, only the file named with the value from "filename"
}
public Map<String, Integer> getWords() {return words;}
}
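One possible way to fill in run() is sketched below, moving the counting logic of Reduce() over for a single file. This is an illustration, not the answerer's code: it assumes whitespace-separated words (as Scanner.next() gives), wraps the checked exception because run() cannot throw it, and the main method with its temporary file is only a demo (Files.writeString needs Java 11+).

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

public class ReduceCounter implements Runnable {
    private final String filename;
    private final Map<String, Integer> words = new HashMap<>();

    public ReduceCounter(String filename) { this.filename = filename; }

    @Override
    public void run() {
        // run() cannot throw checked exceptions, so rethrow as unchecked
        try (Scanner sc = new Scanner(new File(filename))) {
            while (sc.hasNext()) {
                words.merge(sc.next(), 1, Integer::sum); // same counting as Reduce(), one file only
            }
        } catch (FileNotFoundException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Only read this after join() on the thread that executed run().
    public Map<String, Integer> getWords() { return words; }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path tmp = Files.createTempFile("chunk", ".txt");
        Files.writeString(tmp, "to be or not to be");
        ReduceCounter rc = new ReduceCounter(tmp.toString());
        Thread t = new Thread(rc);
        t.start();
        t.join(); // join() also makes the worker's writes to the map visible here
        System.out.println(rc.getWords().get("to")); // 2
        Files.delete(tmp);
    }
}
```

Each thread owns its own map, so no locking is needed; the results are combined only after every thread has been joined.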
I kind of found a solution: I assign a thread to each small file, then I call the Reduce() function inside the run() function. But I still don't fully have my head around it. Here's the code:
public void Reduce() throws IOException
{
ArrayList<String> words = new ArrayList<String>();
ArrayList<Integer> count = new ArrayList<Integer>();
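Thread TT= new Thread();
for (int i = 1; i <= Chunks; i++) {
//create the input stream (receive the text)
FileInputStream fin = new FileInputStream(this.getNom()+i+".txt");
// new Thread(String) only sets the thread's name: with no Runnable, start() does no work,
// so the counting below still runs on the current thread
TT=new Thread(this.getNom()+i+".txt");
TT.start();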
//go through the text with a scanner
Scanner sc = new Scanner(fin);
while (sc.hasNext()) {
//Get the next word
String nextString = sc.next();
//Determine if the string exists in words
if (words.contains(nextString)) {
int index = words.indexOf(nextString);
count.set(index, count.get(index)+1);
}
else {
words.add(nextString);
count.add(1);
}
}
sc.close();
fin.close();
}
// Creating a File object that represents the disk file.
FileWriter myWriter = new FileWriter(new File(this.getNom()+"Result.txt"));
for (int i = 0; i < words.size(); i++) {
myWriter.write(words.get(i)+ " : " +count.get(i) +"\n");
}
myWriter.close();
//delete the small files
deleteFiles();
}
public void run() {
try {
this.Reduce();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void main(String[] args) throws IOException {
WordCounter w1 = new WordCounter("Words.txt");
w1.split(); // create the chunk files before Reduce() reads them
Thread T1= new Thread(w1);
T1.start();
}
Related
I am trying to save each element of a list on its own line in a text file (in the format below) in Java, and later read the file back as two separate ArrayLists. I have done the saving part; now I want to read it back into two separate lists.
Save format:
Class SaveLists
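import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class SaveLists{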
private List<Integer> First= new ArrayList<>();
private List<Integer> second= new ArrayList<>();
public void save(List<Integer> l) throws IOException{
try{
File f = new File ("E:\\Sample.txt");
if (!f.exists()) {
f.createNewFile();
}
FileWriter fw = new FileWriter(f.getAbsoluteFile(),true);
BufferedWriter bw = new BufferedWriter(fw);
for(Object s : l) {
bw.write(s + System.getProperty("line.separator"));
}
bw.write(System.getProperty("line.separator"));
bw.close();
}catch(FileNotFoundException e){System.out.println("error");}
}
public void ReadFromText(){
//Read from Text file and save in both lists
}
}
Class Main:
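public static void main(String[] args) throws IOException {
SaveLists t = new SaveLists(); // main must live inside SaveLists, because First and second are private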
t.First.add(1);
t.First.add(2);
t.First.add(3);
t.second.add(6);
t.second.add(5);
t.second.add(4);
t.save(t.First);
t.save(t.second);
// t.ReadFromText();
}
Since both save operations run on the same thread, the lines are written to the file in order. So after reading all the lines from the file, we can split them based on the size of the input lists, i.e. the first set of values was written from the 'First' list.
public void ReadFromText() {
// Read from Text file and save in both lists
List<Integer> output1 = new ArrayList<>();
List<Integer> output2 = new ArrayList<>();
try {
Path path = Path.of("E:\\Sample.txt"); // the same file that save() writes to
List<String> inputList = Files.readAllLines(path); // reads and closes the file
for (int i = 0; i < inputList.size(); i++) {
String s = inputList.get(i);
if (!s.isEmpty()) {
if (i < First.size()) {
output1.add(Integer.valueOf(s));
} else {
output2.add(Integer.valueOf(s));
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
}
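If the original lists are not available in memory when reading (for example, in a later run of the program), the blank line that save() writes after each list can serve as the separator instead of First.size(). A minimal sketch under that assumption; the class and method names are hypothetical and String.isBlank needs Java 11+:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class ReadLists {

    // Starts a new group at every blank line; each group becomes one list.
    static List<List<Integer>> readGroups(Path path) throws IOException {
        List<List<Integer>> groups = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        for (String line : Files.readAllLines(path)) {
            if (line.isBlank()) {
                if (!current.isEmpty()) {
                    groups.add(current);
                    current = new ArrayList<>();
                }
            } else {
                current.add(Integer.parseInt(line.trim()));
            }
        }
        if (!current.isEmpty()) {
            groups.add(current); // the last group may not end with a blank line
        }
        return groups;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        Files.write(tmp, List.of("1", "2", "3", "", "6", "5", "4", ""));
        List<List<Integer>> groups = readGroups(tmp);
        System.out.println(groups.get(0) + " " + groups.get(1)); // [1, 2, 3] [6, 5, 4]
        Files.delete(tmp);
    }
}
```

Because save() opens the file in append mode, running the program twice leaves four groups in the file; groups.get(0) and groups.get(1) still correspond to the first run.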
How can I sort a CSV file by one field in Java?
For example, I want to sort it by the third field.
I have a CSV file that looks like this:
1951,Jones,5
1984,Smith,7
...
I tried using Scanner with a delimiter, like this, but I couldn't figure out how to go on:
public static void main(String[] args)
{
//String data = args[0];
Scanner s = null;
String delim = ";";
try
{
s = new Scanner(new BufferedReader (new FileReader("test.csv")));
List<Integer> three = new ArrayList<Integer>();
while(s.hasNext())
{
System.out.println(s.next());
s.useDelimiter(delim);
}
}
catch(FileNotFoundException e)
{
System.out.println("File not found");
}
finally
{
if(s != null)
{
s.close();
}
}
}
Thank you!
public static void main(String[] args)
{
final String DELIM = ","; // the sample file is comma-separated
final int COLUMN_TO_SORT = 2; //First column = 0; Third column = 2.
List<List<String>> records = new ArrayList<>();
try (Scanner scanner = new Scanner(new File("test.csv"))) {
while (scanner.hasNextLine()) {
records.add(getRecordFromLine(scanner.nextLine(), DELIM));
}
}
catch(FileNotFoundException e){
System.out.println("File not found");
}
Collections.sort(records, new Comparator<List<String>>(){
@Override
public int compare(List<String> row1, List<String> row2){
if(row1.size() > COLUMN_TO_SORT && row2.size() > COLUMN_TO_SORT)
return row1.get(COLUMN_TO_SORT).compareTo(row2.get(COLUMN_TO_SORT));
return 0;
}
});
for (Iterator<List<String>> iterator = records.iterator(); iterator.hasNext(); ) {
System.out.println(iterator.next());
}
}
private static List<String> getRecordFromLine(String row, String delimiter) {
List<String> values = new ArrayList<String>();
try (Scanner rowScanner = new Scanner(row)) {
rowScanner.useDelimiter(delimiter);
while (rowScanner.hasNext()) {
values.add(rowScanner.next());
}
}
return values;
}
Note that the example file is comma-separated, but your code uses a semicolon as the delimiter.
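One caveat about the comparator above: String.compareTo orders the third column as text, so "10" sorts before "5". If that column is always an integer, a numeric comparator avoids this. A sketch assuming comma-separated rows with no header line; the class and method names are hypothetical, and the third sample row is made up purely to show the difference:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class SortCsv {

    // Sorts rows by an integer column, comparing numbers rather than text.
    static List<String> sortByIntColumn(List<String> lines, int column) {
        List<String> sorted = new ArrayList<>(lines);
        sorted.sort(Comparator.comparingInt(
                (String line) -> Integer.parseInt(line.split(",")[column].trim())));
        return sorted;
    }

    public static void main(String[] args) {
        // In practice: Files.readAllLines(Paths.get("test.csv"))
        List<String> rows = List.of("1951,Jones,5", "1984,Smith,7", "1990,Brown,10");
        sortByIntColumn(rows, 2).forEach(System.out::println);
        // 1951,Jones,5
        // 1984,Smith,7
        // 1990,Brown,10
    }
}
```

List.sort is stable, so rows with equal values keep their original relative order.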
I have to read a CSV file that has a specific number of fields. I must traverse it and detect runs of consecutive identical strings in the first column (I have used an array to read the file), and for only those strings I want the sum of their int values in the third column, which I have stored in another array. So far, I am able to detect the consecutive identical strings, but how can I grab their values and get the sum for each string? Is it possible to do this with a simultaneous traversal? I don't have experience in Java; please help. Thanks.
Here's my code.
The CSV file looks something like this, with random values:
ip, timestamp,elapsed,..
127.0.0.1,...,1500
127.0.0.2,...,2800
127.0.0.2,...,2400
127.0.0.2,...,2500
127.0.0.3,...,1700
127.0.0.4,...,1600
127.0.0.4,...,1500
127.0.0.5,...,2000
I must get something like this: 127.0.0.2:7700, 127.0.0.4:3100
public static void main(String[] args) {
try {
System.out.println("Give file's name: ");
Scanner in = new Scanner(System.in);
String filename = in.nextLine();
File file = new File(filename);
Scanner inputfile = new Scanner(file);
String csv_data[];
ArrayList<String> ip_list = new ArrayList<String>();
ArrayList<String> elapsed_list = new ArrayList<String>();
String[] ip_array = new String[ip_list.size()];
String[] elapsed_array = new String[elapsed_list.size()];
int i = 0;
int j = 0;
int sum = 0;
while (inputfile.hasNextLine()) {
String line = inputfile.nextLine();
csv_data = line.split(",");
ip_list.add(csv_data[0]);
elapsed_list.add(csv_data[2]);
}
ip_array = ip_list.toArray(ip_array);
elapsed_array = elapsed_list.toArray(elapsed_array);
for (String element : elapsed_array) {
try {
int num = Integer.parseInt(element);
} catch (NumberFormatException fe) {
fe.printStackTrace();
System.out.println(" That's not a number");
}
}
while (i < ip_array.length) {
int start = i;
while (i < ip_array.length && (ip_array[i].equals(ip_array[start]))) {
i++;
}
int count = i - start;
if (count >= 5) {
System.out.println(ip_array[start] + " " + "|" + " " + count);
}
}
} catch (FileNotFoundException ex) {
ex.printStackTrace();
}
}
public static void main(String[] args) throws Exception {
List<Data> data = read(getFile());
Map<String, Integer> idSum = groupByIdWithSum(data);
// ...
}
private static File getFile() throws Exception {
try (Scanner scan = new Scanner(System.in)) {
System.out.println("Give file's name: ");
return new File(scan.nextLine());
}
}
private static List<Data> read(File file) throws Exception {
try (Scanner scan = new Scanner(file)) {
List<Data> res = new ArrayList<>();
while(scan.hasNextLine()){
String[] lineParts = scan.nextLine().split(",");
res.add(new Data(lineParts[0], Integer.parseInt(lineParts[2])));
}
return res;
}
}
private static Map<String, Integer> groupByIdWithSum(List<Data> data) {
Map<String, Integer> map = new HashMap<>();
for(Data d : data)
map.put(d.getId(), map.getOrDefault(d.getId(), 0) + d.getElapsed());
return map;
}
final static class Data {
private final String ip;
private final int elapsed;
public Data(String ip, int elapsed) {
this.ip = ip;
this.elapsed = elapsed;
}
public String getId() {
return ip;
}
public int getElapsed() {
return elapsed;
}
}
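Note that groupByIdWithSum adds up every row for an IP, including single rows and IPs that reappear later, whereas the question asks only about consecutive runs. The run detection from the question's own while loop can accumulate the third column in the same pass. A sketch assuming the header line has already been skipped and the elapsed value is the third comma-separated field; the class name and the minRun parameter are hypothetical (the question's code uses 5, the sample output implies 2):

```java
import java.util.ArrayList;
import java.util.List;

public class ConsecutiveSums {

    // For each run of consecutive rows with the same first column, sums the third
    // column and keeps the run only if it has at least minRun rows.
    static List<String> sumConsecutiveRuns(List<String> rows, int minRun) {
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < rows.size()) {
            String ip = rows.get(i).split(",")[0];
            int start = i;
            int sum = 0;
            // advance i past every row of this run, adding up the elapsed values
            while (i < rows.size()) {
                String[] parts = rows.get(i).split(",");
                if (!parts[0].equals(ip)) {
                    break;
                }
                sum += Integer.parseInt(parts[2].trim());
                i++;
            }
            if (i - start >= minRun) {
                result.add(ip + ":" + sum);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
                "127.0.0.1,...,1500", "127.0.0.2,...,2800", "127.0.0.2,...,2400",
                "127.0.0.2,...,2500", "127.0.0.3,...,1700", "127.0.0.4,...,1600",
                "127.0.0.4,...,1500", "127.0.0.5,...,2000");
        System.out.println(String.join(", ", sumConsecutiveRuns(rows, 2)));
        // 127.0.0.2:7700, 127.0.0.4:3100
    }
}
```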
I am trying to take an initial CSV file and pass it through a class that checks another file for an A or a D, and then adds or deletes the associated entry in an array object.
Example of pokemon.csv:
1, Bulbasaur
2, Ivysaur
3, venasaur
Example of changeList.csv:
A, Charizard
A, Suirtle
D, 2
That said, I am having a lot of trouble getting the contents of my new array into a new CSV file. I have checked that my array and class files work properly, but I keep failing to write the final contents of the "pokedex1" object array to the new CSV file.
Main file:
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Scanner;
public class PokedexManager {
public static void printArray(String[] array) {
System.out.print("Contents of array: ");
for(int i = 0; i < array.length; i++) {
if(i == array.length - 1) {
System.out.print(array[i]);
}else {
System.out.print(array[i] + ",");
}
}
System.out.println();
}
public static void main(String[] args) {
try {
//output for pokedex1 using PokemonNoGaps class
PokemonNoGaps pokedex1 = new PokemonNoGaps();
//initializes scanner to read from csv file
String pokedexFilename = "pokedex.csv";
File pokedexFile = new File(pokedexFilename);
Scanner pokescanner = new Scanner(pokedexFile);
//reads csv file, parses it into an array, and then adds new pokemon objects to Pokemon class
while(pokescanner.hasNextLine()) {
String pokeLine = pokescanner.nextLine();
String[] pokemonStringArray = pokeLine.split(", ");
int id = Integer.parseInt(pokemonStringArray[0]);
String name = pokemonStringArray[1];
Pokemon apokemon = new Pokemon(id, name);
pokedex1.add(apokemon);
}
//opens changeList.csv file to add or delete entries from Pokemon class
String changeListfilename = "changeList.csv";
File changeListFile = new File(changeListfilename);
Scanner changeScanner = new Scanner(changeListFile);
//loads text from csv file to be parsed to PokemonNoGaps class
while(changeScanner.hasNextLine()) {
String changeLine = changeScanner.nextLine();
String[] changeStringArray = changeLine.split(", ");
String action = changeStringArray[0];
String nameOrId = changeStringArray[1];
//if changList.csv file line has an "A" in the first spot add this entry to somePokemon
if(action.equals("A")) {
int newId = pokedex1.getNewId();
String name = nameOrId;
Pokemon somePokemon = new Pokemon(newId, name);
pokedex1.add(somePokemon);
}
//if it has a "D" then send it to PokemonNoGaps class to delete the entry from the array
else { //"D"
int someId = Integer.parseInt(nameOrId);
pokedex1.deleteById(someId);
}
//tests the action being taken and the update to the array
//System.out.println(action + "\t" + nameOrId + "\n");
System.out.println(pokedex1);
//*(supposedly)* prints the resulting contents of the array to a new csv file
String[] pokemonList = changeStringArray;
try {
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
writer1.write(String.valueOf(pokemonList));
} catch (IOException e) {
System.out.println("\nError writing to Pokedex1.csv!");
e.printStackTrace();
}
}
//tests final contents of array after being passed through PokemonNoGaps class
//System.out.println(pokedex1);
} catch (FileNotFoundException e) {
e.printStackTrace();
}
}
}
PokemonNoGaps class file:
public class PokemonNoGaps implements ChangePokedex {
private Pokemon[] pokedex = new Pokemon[1];
private int numElements = 0;
private static int id = 0;
// add, delete, search
@Override
public void add(Pokemon apokemon) {
// if you have space
this.pokedex[this.numElements] = apokemon;
this.numElements++;
// if you don't have space
if(this.numElements == pokedex.length) {
Pokemon[] newPokedex = new Pokemon[ this.numElements * 2]; // create new array
for(int i = 0; i < pokedex.length; i++) { // transfer all elements from array into bigger array
newPokedex[i] = pokedex[i];
}
this.pokedex = newPokedex;
}
this.id++;
}
public int getNewId() {
return this.id + 1;
}
@Override
public void deleteById(int id) {
for(int i = 0; i < numElements; i++) {
if(pokedex[i].getId() == id) {
for(int j = i+1; j < pokedex.length; j++) {
pokedex[j-1] = pokedex[j];
}
numElements--;
pokedex[numElements] = null;
}
}
}
public Pokemon getFirstElement() {
return pokedex[0];
}
public int getNumElements() {
return numElements;
}
public String toString() {
String result = "";
for(int i = 0; i < this.numElements; i++) {
result += this.pokedex[i].toString() + "\n";
}
return result;
}
}
Expected output:
1, Bulbasaur
3, Venasaur
4, Charizard
5, Squirtle
Am I using the wrong file writer? Am I calling the file writer at the wrong time, or incorrectly? In other words, I don't know why my output file is empty instead of containing the contents of my array. Can anybody help me out?
I spotted a few issues while running this. As mentioned in another answer, you want to set the FileWriter's append flag to true in the section of code that writes to the new pokedex1.csv:
try {
String outputFile1 = "pokedex1.csv";
// append = true: each pass through the loop adds to the file instead of recreating it
// (requires import java.io.BufferedWriter)
try (BufferedWriter bw = new BufferedWriter(new FileWriter(outputFile1, true))) {
for(String pokemon : pokedex1.toString().split("\n")) {
System.out.println(pokemon);
bw.write(pokemon);
bw.newLine(); // one entry per line
}
} // closing the writer also flushes it
} catch (IOException e) {
System.out.println("\nError writing to Pokedex1.csv!");
e.printStackTrace();
}
I opted to use a BufferedWriter for the solution. Another issue I found is that you're reading pokedex.csv, but the file is named pokemon.csv.
String pokedexFilename = "pokemon.csv";
I made the above change to fix this issue.
On a side note, I noticed that you create several scanners to read the two files. With these types of resources it's good practice to call the close method once you have finished using them, as shown below.
Scanner pokescanner = new Scanner(pokedexFile);
// Use scanner code here
// Once finished with scanner
pokescanner.close();
String outputFile1 = "pokedex1.csv";
FileWriter writer1 = new FileWriter(outputFile1);
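This code appears to be inside your while loop, so the file is recreated (and truncated) on every iteration.
Either use the FileWriter(File file, boolean append) constructor or create the writer before the loop. Also close the writer when you are done: it is never closed here, so its buffered output may never reach the file. Note too that String.valueOf(pokemonList) writes the array's type and hash code (something like [Ljava.lang.String;@...), not its elements.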
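Alternatively, the file can be written exactly once, after the while loop over changeList.csv has finished, so neither append mode nor repeated snapshots are involved. A sketch assuming, as the expected output suggests, that pokedex1.toString() yields one "id, name" entry per line (Pokemon.toString() is not shown in the question); the class and method names are hypothetical:

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class WritePokedex {

    // Writes each line of a toString()-style listing to the file, one entry per line.
    static void writeListing(String listing, Path out) throws IOException {
        // Files.newBufferedWriter creates or truncates the file;
        // try-with-resources closes (and so flushes) it even on error
        try (BufferedWriter bw = Files.newBufferedWriter(out)) {
            for (String entry : listing.split("\n")) {
                if (!entry.isEmpty()) {
                    bw.write(entry);
                    bw.newLine();
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // In PokedexManager: writeListing(pokedex1.toString(), Paths.get("pokedex1.csv"));
        // placed after the changeList while loop, not inside it.
        String listing = "1, Bulbasaur\n3, Venasaur\n4, Charizard\n5, Squirtle\n";
        Path out = Paths.get("pokedex1.csv");
        writeListing(listing, out);
        System.out.println(Files.readAllLines(out));
    }
}
```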
I am trying to separate my data into multiple ArrayLists so that I can use them later on in my code, but I am not able to put my data into the lists.
My code separates the data into three ArrayLists for different subjects (for example, Physics and Chemistry) according to various filters, which you will find in my code.
Input data file:
1|150|20150328|20150406|Physics|1600|1600|2|68|92
2|152|20150328|20150406|Physics|1600|1500|2|68|89
3|153|20150328|20150406|Physics|1600|1500|2|68|60
4|155|20150328|20150406|Physics|1600|1600|2|68|72
5|161|20150328|20150406|Chemistry|1600|1600|2|68|77
Here's my code:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;

public class Filter {
public static void main(String[] args) {
String in_line;
String PrevRollNo = "";
int PrevDate = 0;
ArrayList<Transaction> PhysicsList = new ArrayList<Transaction>();
ArrayList<Transaction> scList = new ArrayList<Transaction>();
ArrayList<Transaction> Chemistry = new ArrayList<Transaction>();
File out_file = new File("Path for output file");
try (BufferedReader in = new BufferedReader(new FileReader("Path for input file"));
BufferedWriter out = new BufferedWriter(new FileWriter(out_file))) { // FileWriter creates the file if needed
while ((in_line = in.readLine()) != null)
{
Transaction transact = new Transaction(in_line);
// PrevRollNo and PrevDate are never updated, so with the sample data this
// condition is never true and nothing is ever added to the lists
if (transact.RollNo.equals(PrevRollNo))
{
if (transact.Subject.equals("Physics") && transact.prs_date == PrevDate)
{
PhysicsList.add(transact);
}
else if (transact.Subject.equals("Physics") && transact.wk_date != PrevDate)
{
if (!transact.RoomNo.equals("102") && !transact.lcl_RoomNo.equals("102")) // no ';' after the if
{
Iterator<Transaction> it = scList.iterator(); // note: scList is never filled
while (it.hasNext())
{
Transaction sc = it.next();
if (sc.lcl_RoomNo.equals(transact.RoomNo) && sc.l1.equals(transact.l1) && sc.l2.equals(transact.l2))
{
if (sc.marks == transact.marks)
{
transact.srcfound = true;
}
else
{
System.out.println("not found");
}
it.remove(); // remove through the iterator while iterating
out.write(in_line);
break;
}
}
}
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
}

static class Transaction
{
public String RollNo, Subject, RoomNo, lcl_RoomNo, l1, l2;
public int wk_date, prs_date;
public double marks, amt;
public boolean srcfound, tgtfound;
public Transaction(String in_line)
{
String[] SplitData = in_line.split("\\|");
RollNo = SplitData[1];
Subject = SplitData[4];
RoomNo = SplitData[5];
lcl_RoomNo = SplitData[6];
l1 = SplitData[7];
l2 = SplitData[8];
wk_date = Integer.parseInt(SplitData[3]);
prs_date = Integer.parseInt(SplitData[2]);
marks = Double.parseDouble(SplitData[9]);
// amt = Double.parseDouble(SplitData[]); // the index is missing in the original
srcfound = false;
tgtfound = false;
}
}
}
Kindly help with your expertise.
Use Java 8 NIO and Streams. It will ease the job.
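// the enclosing method must handle IOException
try (Stream<String> lines = Files.lines(Paths.get("fileName.txt"))) {
lines.forEach(line -> {
String[] tokens = line.split("\\|"); // "|" is a regex metacharacter, so it must be escaped
//tokens contains individual elements of each line. Add your grouping logic on tokens array
});
}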
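For the grouping logic itself, Collectors.groupingBy can build all the per-subject lists in one pass, as a map from subject to rows, instead of separate ArrayList variables. A sketch assuming the subject is field index 4 as in the sample data; it keeps the raw String[] tokens because the Transaction fields are specific to the question, and the class and method names are hypothetical. In practice the stream would come from Files.lines(...); a List is used here so the example runs on its own.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class GroupBySubject {

    // Groups pipe-delimited rows by their subject field (index 4 in the sample data).
    static Map<String, List<String[]>> groupBySubject(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("|"))   // skip blank or malformed lines
                .map(line -> line.split("\\|"))       // split() takes a regex, so escape "|"
                .filter(tokens -> tokens.length > 4)
                .collect(Collectors.groupingBy(tokens -> tokens[4], TreeMap::new, Collectors.toList()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
                "1|150|20150328|20150406|Physics|1600|1600|2|68|92",
                "2|152|20150328|20150406|Physics|1600|1500|2|68|89",
                "5|161|20150328|20150406|Chemistry|1600|1600|2|68|77");
        groupBySubject(lines).forEach((subject, rows) ->
                System.out.println(subject + ": " + rows.size()));
        // Chemistry: 1
        // Physics: 2
    }
}
```

bySubject.get("Physics") then plays the role of PhysicsList; further filters (dates, room numbers) can be added as extra filter() steps before collecting.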
I agree with the other answer in some ways: NIO should be used, since it makes this a lot easier. However, I would avoid streams and instead use the readAllLines method, like so:
try{
List<String> filecontents = Files.readAllLines(file.toPath()); //file is the object to read from.
for(int i = 0; i < filecontents.size(); i++){
String line = filecontents.get(i);
//New code starts here
if(!line.contains("|")) continue; //Ignore that line
//New code ends here
String[] array = line.split("\\|"); // escape "|": split() takes a regex
ArrayList<String> list = new ArrayList<String>();
for(int a = 0; a < array.length; a++){
String part = array[a];
list.add(part);
}
Transaction t = new Transaction(line);
if(line.contains("Physics")) PhysicsList.add(t);
else if(line.contains("Chemistry")) Chemistry.add(t);
else{ /* Do nothing */ }
}
}catch(IOException e){
e.printStackTrace();
}
EDIT: I added a check that skips any line without a "|" separator. If the first and last lines aren't working, it may be because those lines aren't being parsed properly. See if this fixes your issue.