Word count function that counts words in a txt file

I'm very new to Java, so please bear with me.
I'm currently trying to create code that does the following:
Add code to your processFile function that counts the number of times each word appears in the file.
Add code to your processFile function that loops through your HashMap to find the most frequent word. After your loop, the variable added for bonus requirement #1 should contain the value for the most frequent word.
So far I've come up with this and was wondering if anyone could please help me progress further.
Map<String, Integer> freq = new Hashmap<String, Integer>();
FileInputStream fi = new FileInputStream("readwords,txt");
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word);
if (f == null) {
freq.put(word,1);
}
else {
freq.put(word,f+1);
}
}
Thank you

Your syntax is close, but you've mixed String declaration styles, your generic type is missing a > and your variable names are inconsistent. I think you wanted something like,
Map<String, Integer> map = new HashMap<>();
File file = new File("readwords.txt");
try (Scanner input = new Scanner(file)) {
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = map.get(word);
if (f == null) {
map.put(word, 1);
} else {
map.put(word, f + 1);
}
}
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
}

For counting the words and for getting most frequently used word you can try this:
public void processFile() throws Exception {
Map<String, Integer> freq = new HashMap<>();
FileInputStream fi = new FileInputStream("readwords.txt");
String mostFreqWord = null;
Integer highestFreq = 0;
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
freq.put(word, f);
if(f > highestFreq) {
mostFreqWord = word; // set most frequent word
highestFreq = f; // frequency of most frequent word
}
}
System.out.println("Word :" + mostFreqWord
+ " is the most frequent word with frequency:" + highestFreq);
}
Since I modified the code you already posted, here is an explanation of the changes (I assume you already know what your original code was doing).
Inside the loop, the line below checks whether the word is being encountered for the first time. If so, it sets the word's frequency to 1; otherwise it increments the frequency for that word.
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
Then it stores the latest frequency for the word: freq.put(word, f);
The statement if(f > highestFreq) checks whether the current word's frequency exceeds the highest frequency seen so far; if it does, it updates highestFreq and mostFreqWord.
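The assignment quoted in the question asks for a separate loop through the HashMap after counting, so here is a minimal sketch of that two-step shape. The class and method names (WordFrequencySketch, countWords, mostFrequent) are made up for illustration, and a string stands in for readwords.txt so the example runs without a file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

class WordFrequencySketch {

    // Counts how often each whitespace-separated token occurs, ignoring case.
    static Map<String, Integer> countWords(Scanner input) {
        Map<String, Integer> freq = new HashMap<>();
        while (input.hasNext()) {
            String word = input.next().toLowerCase();
            // merge() stores 1 for a new word, otherwise adds 1 to the old count.
            freq.merge(word, 1, Integer::sum);
        }
        return freq;
    }

    // Loops through the map once and returns the word with the largest count,
    // or null if the map is empty. On ties, whichever entry is visited first wins.
    static String mostFrequent(Map<String, Integer> freq) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : freq.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Assumption: a string replaces the file; new Scanner(new File("readwords.txt")) would read the real one.
        Map<String, Integer> freq = countWords(new Scanner("the cat and The dog and THE bird"));
        String word = mostFrequent(freq);
        System.out.println(word + " appears " + freq.get(word) + " times");
    }
}
```

With the sample text above this prints "the appears 3 times". Map.merge needs Java 8 or later; on older versions the get/put pattern from the answers above does the same job.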

Related

Finding Most Frequent Element(s) In A File Of Integers

I am working on a program to find the most frequent element(s) in a text file. Thus far I have made the file read into a List then iterate through the list to find the occurrences of every value and map them in a SortedMap.
The issue is occurring with files where every digit occurs equally. My Map is not filling with all the data and will only contain one of the digits at the end.
Here is my code:
public class FileAnalyzer {
public static void main(String[] args) throws IOException, FileNotFoundException {
System.out.print("Please Enter A File Name: ");
String file = new Scanner(System.in).nextLine();
final long startTime = System.currentTimeMillis();
BufferedReader reader = new BufferedReader(new FileReader(file));
List<Integer> numbers = new ArrayList<>();
SortedMap<Integer, Integer> sortedMap = new TreeMap<>();
String line;
while ((line = reader.readLine()) != null) {
numbers.add(Integer.parseInt(line));
}
Collections.sort(numbers);
int frequency = 0;
int tempNum = 0;
for (int i = 0; i < numbers.size(); i++) {
if (tempNum == numbers.get(i)) {
frequency++;
} else {
if (frequency != 0) {
sortedMap.put((frequency+1), tempNum);
}
frequency = 0;
tempNum = numbers.get(i);
}
}
if (frequency !=0) {
sortedMap.put((frequency+1), tempNum);
}
final long duration = System.currentTimeMillis() - startTime;
System.out.println(sortedMap);
System.out.println("Runtime: " + duration + " ms\n");
System.out.println("Least Frequent Digit(s): " + sortedMap.get(sortedMap.firstKey()) + "\nOccurences: " + sortedMap.firstKey());
}
}
Also this is the text file I am running into issues when reading from:
1
2
1
1
2
1
1
2
1
2
2
2
Thanks in advance!

You should look up the Java Documentation for TreeMap. It is designed to not store duplicate keys, so since you are sorting on frequency as a key, values with the same frequency will be overwritten in your map!
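One way around the overwriting, sketched here rather than taken from the original post, is to flip the map around: use the number as the key and its count as the value, then collect every number whose count equals the maximum. The names FrequencySketch, countOccurrences and mostFrequent are invented for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class FrequencySketch {

    // Maps each distinct number to how many times it occurs.
    static Map<Integer, Integer> countOccurrences(List<Integer> numbers) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int n : numbers) {
            counts.merge(n, 1, Integer::sum);
        }
        return counts;
    }

    // Returns every number whose count equals the highest count, so ties are kept.
    static List<Integer> mostFrequent(Map<Integer, Integer> counts) {
        List<Integer> result = new ArrayList<>();
        if (counts.isEmpty()) {
            return result;
        }
        int max = Collections.max(counts.values());
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same data as the file in the question: six 1s and six 2s.
        List<Integer> numbers = Arrays.asList(1, 2, 1, 1, 2, 1, 1, 2, 1, 2, 2, 2);
        Map<Integer, Integer> counts = countOccurrences(numbers);
        System.out.println(counts);
        System.out.println("Most frequent: " + mostFrequent(counts));
    }
}
```

Because the numbers are the keys, two numbers with the same frequency no longer collide. The same loop with Collections.min would give the least frequent values.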

Finding Unique Words In A Text File Using ArrayList

I'm working on a project where I enter a URL, the file is read, and the number of lines, characters, and words is written to a text file. I'm not having an issue with that. The code below will be pretty long, sorry in advance.
I also have to write to the same text file all of the words in the file and the number of times each word appears. I've been working on it for a while, and I've gotten to the point where the line, character, and word counts are written to the text file, but I can't figure out how to display the actual words and the number of times they appear in the file.
String[] wordSubstrings = line.replaceAll("\\s+", " ").split(" ");
List<String> uniqueWords = new ArrayList<String>();
for (int i = 0; i < wordSubstrings.length; i++) {
if (!(uniqueWords.contains(wordSubstrings[i]))) {
uniqueWords.add(wordSubstrings[i]);

You could use a Multiset from Google Guava:
Multiset<String> words = HashMultiset.create();
for (String word : wordList)
words.add(word);
for (String word : words.elementSet())
System.out.println(word + ": " + words.count(word));

I've tested something with a HashMap which seems to work pretty well.
Here is my code that I used to test it, I hope it helps:
String[] wordSubstrings = new String[]{"test","stuff","test","thing","test","test","stuff"};
HashMap<String,Integer> uniqueWords = new HashMap<>();
for ( int i = 0; i < wordSubstrings.length; i++)
{
if(!(uniqueWords.containsKey(wordSubstrings[i])))
{
uniqueWords.put(wordSubstrings[i], 1);
}
else
{
int number = uniqueWords.get(wordSubstrings[i]);
uniqueWords.put(wordSubstrings[i],number + 1);
}
}
for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
String key = entry.getKey();
int value = entry.getValue();
//Do Something with the key and value
}
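If Java 8 streams are available, the same counting can be written in one expression. This is only a sketch of an alternative, not what the answer above used; StreamCountSketch and countWords are made-up names, and note that Collectors.counting() produces Long values rather than Integer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCountSketch {

    // Groups equal words together and counts each group; TreeMap keeps the keys sorted.
    static Map<String, Long> countWords(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Same sample words as in the HashMap test above.
        List<String> words = Arrays.asList("test", "stuff", "test", "thing", "test", "test", "stuff");
        for (Map.Entry<String, Long> entry : countWords(words).entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }
}
```

The loop at the end is where the word and its count would be written to the output file instead of the console.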

You can use an ArrayList of a small class that holds the word and its count as member variables.
class MyClass
{
String uniqueword;
int count;
}
List<MyClass> uniqueWords = new ArrayList<MyClass>();

Issue iterating through two arraylists

EDIT: Thanks so much for all the really quick feedback. Wow. I did just paste it all for you instead of just those two for loops. Thanks.
This may have been totally answered before. I have read SO for the last few years but this is my first post. I have been using the site and others to help solve this so my apologies in advance if this has been answered!
I am iterating through two ArrayLists. One is derived from user input; the other is a dictionary file converted into an ArrayList. I am trying to compare a word in the input with a dictionary word. The input list and the dictionary list are valid, and if I simply iterate through them, they contain what they should (so that isn't the issue). I assume my issue is somewhere in how I am handling the iteration. I'm a fairly novice Java programmer, so please go easy on me.
Thanks
public String isSub(String x) throws FileNotFoundException, IOException {
//todo handle X
String out = "**********\nFor input \n" + x + "If you're reading this no match was found.\n**********";
String dictionary;
boolean solve = true;
/// Get dictionary
dictMaker newDict = new dictMaker();
dictionary = newDict.arrayMaker();
List<String> myDict = new ArrayList<String>(Arrays.asList(dictionary.split(",")));
List<String> input = new ArrayList<String>(Arrays.asList(x.split(" ")));
List<String> results = new ArrayList<String>();
//results = input;
String currentWord;
String match = "";
String checker = "";
String fail="";
//Everything to break sub needs to happen here.
while (solve) {
for(int n = 0; n < input.size(); n++) { //outside FOR (INPUT)
if(!fail.equals("")) results.add(fail);
checker = input.get(n).trim();
for(int i = 0; i < myDict.size(); i++) { //inside FOR (dictionary)
currentWord = myDict.get(i).trim();
System.out.print(checker + " " + currentWord + "\n");
if(checker.equals(currentWord)) {
match = currentWord;
results.add(currentWord);
fail="";
} //end if
else {
fail = "No match for " + checker;
}
}//end inside FOR (dictionary)
} //END OUTSIDE FOR (input)
solve=false;
} //end while
out = results.toString();
return out;
}
Output results for input "test tester asdasdfasdlfk"
[test, No match for test, tester, No match for tester]

Carl Manaster gave the correct explanation.
Here's an improved version of your code:
for (int n = 0; n < input.size(); n++) { //outside FOR (INPUT)
String checker = input.get(n).trim();
boolean match = false;
for (int i = 0; i < myDict.size(); i++) { //inside FOR (dictionary)
String currentWord = myDict.get(i).trim();
System.out.print(checker + " " + currentWord + "\n");
if (checker.equals(currentWord)) {
match = true;
results.add(currentWord);
break;
} //end if
} //end inside FOR (dictionary)
if (!match) {
results.add("No match for " + checker);
}
} //END OUTSIDE FOR (input)
Also, consider using a HashMap instead of an ArrayList to store the dictionary and trim the words when you store them to avoid doing it in each pass.

It looks as though every word in input gets compared to every word in your dictionary. So for every word that doesn't match, you get a fail (although you only write the last failure in the dictionary to the results). The problem appears to be that you keep looping even after you have found the word. To avoid this, you probably want to add break to the success case:
if (checker.equals(currentWord)) {
match = currentWord;
results.add(currentWord);
fail = "";
break;
} else {
fail = "No match for " + checker;
}

If you are using a dictionary, you should look entries up by key, not by index. So it should be
if(myDict.containsKey(checker)){
String currentWord =myDict.get(checker);
System.out.print(checker + " " + currentWord + "\n");
match = currentWord;
results.add(currentWord);
fail = "";
}
else {
fail = "No match for " + checker;
}
I think your code should look more or less like the following.
List<String> input = new ArrayList<String>();
input.add("ahmet");
input.add("mehmet");
List<String> results = new ArrayList<String>();
// The words themselves are the keys, so containsKey(checker) can find them.
Map<String, String> myDict = new HashMap<String, String>();
myDict.put("ahmet", "ahmet");
myDict.put("mehmet", "mehmet");
for (int n = 0; n < input.size(); n++) { // outside FOR (INPUT)
String checker = input.get(n).trim();
// A keyed lookup replaces the inner dictionary loop.
if (myDict.containsKey(checker)) {
String currentWord = myDict.get(checker);
System.out.print(checker + " " + currentWord + "\n");
results.add(currentWord);
} else {
results.add("No match for " + checker);
}
} // end outside FOR (input)
return results.toString();

You should put the dictionary into a HashSet and trim each word as you add it. Then you only need to loop over the input list and check dictionary.contains(inputWord). This avoids looping over a potentially huge dictionary for every input word.
Untested brain dump:
HashSet<String> dictionary = readDictionaryFiles(...);
List<String> input = getInput();
for (String inputString : input)
{
if (dictionary.contains(inputString.trim()))
{
result.add(inputString);
}
}
out = result.toString();
....
And a solution similar to the original posting. The unnecessary loop index variables are removed:
for (String checker : input)
{ // outside FOR (INPUT)
fail = "No match for " + checker;
for (String currentWord : myDict)
{ // inside FOR (dictionary)
System.out.print(checker + " " + currentWord + "\n");
if (checker.equals(currentWord))
{
match = currentWord;
results.add(currentWord);
fail = null;
break;
}
} // end inside FOR (dictionary)
if (fail != null)
{
results.add(fail);
}
} // end outside FOR (input)
solve = false;
return results.toString();
The trim should be done when the elements are added to the list. Trimming the dictionary values on every pass is overhead, and so is the inner loop itself. The complexity of the task can be reduced by changing the dictionary data structure from a List to a Set.
Adding the "fail" result is moved to the end of the outer loop. Otherwise the result for the last input string is never added to the result list.
The following code is terrible:
else {
fail = "No match for " + checker;
}
The checker does not change within the dictionary loop, yet the fail string is rebuilt every time the checker and a dictionary value do not match.
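To make the Set suggestion concrete, here is a small self-contained sketch. It assumes the dictionary arrives as one comma-separated string, as in the question; DictionaryLookupSketch, buildDictionary and check are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DictionaryLookupSketch {

    // Trims each entry once while building the set, so lookups need no further cleanup.
    static Set<String> buildDictionary(String commaSeparated) {
        Set<String> dictionary = new HashSet<>();
        for (String word : commaSeparated.split(",")) {
            dictionary.add(word.trim());
        }
        return dictionary;
    }

    // One hash lookup per input word instead of a scan over the whole dictionary.
    static List<String> check(List<String> input, Set<String> dictionary) {
        List<String> results = new ArrayList<>();
        for (String raw : input) {
            String word = raw.trim();
            results.add(dictionary.contains(word) ? word : "No match for " + word);
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumption: a tiny inline dictionary stands in for the real dictionary file.
        Set<String> dictionary = buildDictionary("test, tester , testing");
        List<String> input = Arrays.asList("test tester asdasdfasdlfk".split(" "));
        System.out.println(check(input, dictionary));
    }
}
```

With this sample data the result is [test, tester, No match for asdasdfasdlfk]: every input word produces exactly one entry, which is the behaviour the original nested loops were aiming for.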

How to find on which line a word is in Java

I am trying to create a program that counts the number of times a word appears in a text and also tell you how many times it appears on each line. I have managed to find the number of times the word appears and the number of lines in the text, but I cannot find on which line the word appears in and how many times. Could you please help me? This is my code so far:
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
Map<String, Integer> hash = new HashMap<String, Integer>();
int counter = 0; //number of lines
while (line != null){
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i==null)? 1: i+1);
}
line = buffer.readLine();
counter = counter + 1;
}
System.out.println(hash);
System.out.println(counter);

You need extra information for each row. Since you want the count on each line, a single Map is not enough; you need a separate Map for each row.
There are two basic ways:
Map<Integer, Map<String, Integer>> hashOfHash = new HashMap<>();
List<Map<String, Integer>> list = new ArrayList<>();
The first line creates a Map of your Maps keyed by an integer, which would be the line number.
The second line creates a List of your Maps. Because a List preserves order, you can tell which line is which just by iterating through it.
I would recommend the second option.
You also need to modify your while loop a bit so that it creates a new map for each line (each line needs the same counting you already do, but into its own map).
For example, this should do the same as your program, but it will show the results for each row:
public static void main(String[] args) throws FileNotFoundException, IOException {
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
List<Map<String, Integer>> list = new ArrayList<>();
while (line != null) {
Map<String, Integer> hash = new HashMap<String, Integer>();
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i == null) ? 1 : i + 1);
}
line = buffer.readLine();
list.add(hash);
}
int i=0;
for (Map<String, Integer> mapAtRow : list) {
i++;
System.out.println("at row " + i + " we found this: " + mapAtRow);
}
}
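If the goal is to ask "on which lines does this word appear, and how often on each", the nesting can also be turned inside out: word first, then line number. The following is a sketch under that assumption; WordLinesSketch and buildIndex are invented names, and the lines are passed in as a List so the example runs without the file from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordLinesSketch {

    // Returns word -> (line number -> occurrences on that line); line numbers start at 1.
    static Map<String, Map<Integer, Integer>> buildIndex(List<String> lines) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            for (String word : line.split("\\s+")) {
                if (word.isEmpty()) {
                    continue; // split can yield an empty string for blank or indented lines
                }
                // Create the inner map on first sight of the word, then bump this line's count.
                index.computeIfAbsent(word, k -> new TreeMap<>())
                     .merge(lineNumber, 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Assumption: three short lines replace the file; Files.readAllLines(path) could supply real ones.
        List<String> lines = Arrays.asList("a b a", "b c", "a");
        buildIndex(lines).forEach((word, perLine) -> System.out.println(word + " -> " + perLine));
    }
}
```

For the sample lines, "a" maps to {1=2, 3=1}, meaning twice on line 1 and once on line 3. Summing the values of the inner map gives the total count for the word.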

Here is a recursive method that uses String.indexOf to count how many times a word appears in a line.
You have read the line from your BufferedReader:
String line = buffer.readLine();
Then in your loop you have:
for (String s : words) {
int numberOfOccurencesOfS = countNumberOfTimesInALine(line,s);
}
The countNumberOfTimesInALine method receives the original line and the word you are counting as arguments. To use it, you should also declare a class variable like this (the method never resets it, so set it back to 0 before each new count):
private static int numberOfLineOccurences;
Here is the method
public static int countNumberOfTimesInALine(String line, String word) {
if (line.indexOf(word) == -1) {
return numberOfLineOccurences;
} else {
numberOfLineOccurences++;
if (line.indexOf(word) + word.length() > line.length() -1 ) {
return numberOfLineOccurences;
}
return countNumberOfTimesInALine(
line.substring(line.indexOf(word) + word.length()), word );
}
}
Here is a usage example:
String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
System.out.println("Number of occurences of TEST is " + countNumberOfTimesInALine(line, "TEST"));
Here is the result
Number of occurences of TEST is 3
I have published an answer to a similar question as yours here
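The same indexOf idea can be written as a plain loop with a local counter, which avoids the shared static variable. This is an alternative sketch, not the answerer's method; OccurrenceCountSketch and countOccurrences are made-up names.

```java
class OccurrenceCountSketch {

    // Counts non-overlapping occurrences of word in line using indexOf with a start offset.
    static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty search string would otherwise match at every position
        }
        int count = 0;
        int from = line.indexOf(word);
        while (from != -1) {
            count++;
            // Continue searching just past the match that was found.
            from = line.indexOf(word, from + word.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
        System.out.println("TEST: " + countOccurrences(line, "TEST"));
        System.out.println("DEMO: " + countOccurrences(line, "DEMO"));
    }
}
```

Because the counter is local, the method can be called repeatedly for different words or lines. Like the recursive version, it matches substrings, so "TEST" would also be counted inside "TESTER".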

Counting the letters (uppercase and lowercase) of a string

I have a program that reads a paragraph and writes it to a file. After that, it should count the occurrences of each letter (case sensitive). However, it doesn't count the letter occurrences. I think I put the for loop in the wrong place.
import java.io.*;
import java.util.*;
public class Exercise1 {
public static int countLetters (String line, char alphabet) {
int count = 0;
for (int i = 0; i <= line.length()-1; i++) {
if (line.charAt(i) == alphabet)
count++;
}
return count;
}
public static void main(String[] args) throws IOException {
BufferedReader buffer = new BufferedReader (new InputStreamReader(System.in));
PrintWriter outputStream = null;
Scanner input = new Scanner (System.in);
int total;
try {
outputStream = new PrintWriter (new FileOutputStream ("par.txt"));
System.out.println("How many lines are there in the paragraph you'll enter?");
int lines = input.nextInt();
System.out.println("Enter the paragraph: ");
String paragraph = buffer.readLine();
outputStream.println(paragraph);
int j;
for (j = 1; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
outputStream.close();
System.out.println("The paragraph is written to par.txt");
for (int k=1; k<lines; k++) {
paragraph = buffer.readLine();
total = countLetters (paragraph, 'A');
if (total != 0)
System.out.println("A: "+total);
//I'll do bruteforce here up to lowercase z
}
}
catch(FileNotFoundException e) {
System.out.println("Error opening the file par.txt");
}
}
}
Please help me fix the code. I'm new in programming and I need help. Thank you very much!

First, your initial read of the user input is a bit redundant, since you read once and then enter the for loop for the rest. This is not a bug, just a tidier way to write it.
// your code
String paragraph = buffer.readLine();
outputStream.println(paragraph);
int j;
for (j = 1; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
You can just put them in the loop:
// better code
String paragraph;
int j;
for (j = 0; j<lines; j++) {
paragraph = buffer.readLine();
outputStream.println(paragraph);
}
Then your first problem comes from the way you read the lines:
// your code - not working
outputStream.close();
for (int k=1; k<lines; k++) {
paragraph = buffer.readLine();
total = countLetters (paragraph, 'A');
Consider what happened above:
The input is already DONE, the output is already written and stream is closed - up to here everything is good
Then when you try to count the number of characters, you do: paragraph = buffer.readLine(); - what does this code do? It waits for another user input (instead of reading what's been inserted)
To fix the problem above: you need to read from what's already been written - not asking for another input. Then instead of brute forcing every character one by one, you can just put them into a list and write a for loop.
So now, you want to read from the existing file that you already created (ie. reading what WAS inputted by the user):
BufferedReader fileReader = new BufferedReader(new FileReader(new File("par.txt")));
String allCharacters = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
String aLineInFile;
// Read the file that was written earlier (whose content comes from user input)
// This while loop will go through line-by-line in the file
while((aLineInFile = fileReader.readLine()) != null)
{
// For every line in the file, count number of occurrences of characters
// This loop goes through every character (a-z and A-Z)
for(int i = 0; i < allCharacters.length(); i++)
{
// For each single character, check the number of occurrences in the current line
// countLetters takes a char, so keep the character as a char rather than a String
char charToLookAt = allCharacters.charAt(i);
int numOfCharOccurancesInLine = countLetters (aLineInFile, charToLookAt);
System.out.println("For line: " + aLineInFile + ", Character: " + charToLookAt + " appears: " + numOfCharOccurancesInLine + " times " );
}
}
The above gives you the number of occurrences of every character in every line - now you just need to organize them to keep track of how many are in total for the whole file.
Code-wise, there might be better way to write this to have cleaner implementation, but the above is easy to understand (and I just wrote it very quickly).

Do everything in one loop:
for (j = 1; j<lines; j++) {
paragraph = buffer.readLine();
total = countLetters (paragraph, 'A');
if (total != 0)
System.out.println("A: "+total);
outputStream.println(paragraph);
}

You can use a HashMap to count each letter case-sensitively:
final Pattern patt = Pattern.compile("[A-Za-z]");
final HashMap<Character, Integer> tabChar = new HashMap<Character, Integer>(
52);
// replace : paragraph = buffer.readLine();
// Unless you use it outside, you can declare it 'final'
final char[] paragraph = "azera :;,\nApOUIQSaOOOF".toCharArray();
for (final Character c : paragraph ) {
if (Character.isLetter(c)) {
Integer tot = tabChar.get(c);
tabChar.put(c, (null == tot) ? 1 : ++tot);
}
}
Output :
{F=1, A=1, O=4, I=1, U=1, Q=1, S=1, e=1, a=3, r=1, p=1, z=1}
You can use final TreeSet<Character> ts = new TreeSet<>(tabChar.keySet()); to sort the characters and then look each one up in tabChar with get(c).

The previous answers would have solved your problem but another way of avoiding brute force might be to use a loop using ASCII character value.
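A rough sketch of that idea, assuming only the ASCII letters A-Z and a-z need counting: tally into an int array indexed by character code, then walk the codes in a loop. LetterCountSketch and this version of countLetters are illustrative names, not the code from the question.

```java
class LetterCountSketch {

    // Tallies every ASCII letter in one pass; the array index is the character code.
    static int[] countLetters(String text) {
        int[] counts = new int[128];
        for (char c : text.toCharArray()) {
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) {
                counts[c]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Same sample text as the HashMap answer above.
        int[] counts = countLetters("azera :;,\nApOUIQSaOOOF");
        // Walk the character codes instead of writing one statement per letter.
        // The codes between 'Z' and 'a' are never counted, so they are skipped by the zero check.
        for (char c = 'A'; c <= 'z'; c++) {
            if (counts[c] != 0) {
                System.out.println(c + ": " + counts[c]);
            }
        }
    }
}
```

In the original program, each line read back from par.txt could be passed through countLetters and the arrays added together to get totals for the whole file.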
