I am trying to create a program that counts the number of times a word appears in a text and also tells you how many times it appears on each line. I have managed to find the number of times the word appears and the number of lines in the text, but I cannot work out which lines the word appears on and how many times it appears on each. Could you please help me? This is my code so far:
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
Map<String, Integer> hash = new HashMap<String, Integer>();
int counter = 0; //number of lines
while (line != null){
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i==null)? 1: i+1);
}
line = buffer.readLine();
counter = counter + 1;
}
System.out.println(hash);
System.out.println(counter);
The per-line count is additional information attached to each row. Since you need a count for each line, a single Map is not enough; you need one Map per line, stored either in a map of maps or in a list of maps.
There are two basic ways:
Map<Integer, Map<String, Integer>> hashOfHash = new HashMap<>();
List<Map<String, Integer>> list = new ArrayList<>();
The first declaration creates a map of your maps keyed by an integer, which would be the line number.
The second declaration creates a list of your maps; because a list preserves insertion order, you can tell which line is which just by iterating through it.
I would recommend the second option.
You also need to modify your while loop a bit so that it creates a new map for each line (that is, do for every line what your code currently does only once).
For example, this does the same as your program but shows the results for each row:
public static void main(String[] args) throws FileNotFoundException, IOException {
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
List<Map<String, Integer>> list = new ArrayList<>();
while (line != null) {
Map<String, Integer> hash = new HashMap<String, Integer>();
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i == null) ? 1 : i + 1);
}
line = buffer.readLine();
list.add(hash);
}
int i=0;
for (Map<String, Integer> mapAtRow : list) {
i++;
System.out.println("at row " + i + " we found this: " + mapAtRow);
}
}
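To answer the original question directly (on which lines a given word appears, and how often), the list of per-line maps can be queried one map at a time. The following is only a sketch under some assumptions: the class and method names (LineWordCounts, countPerLine, occurrencesOf) are made up for illustration, the lines are passed in as a List<String> instead of being read from the file, and words are split on any whitespace rather than a single space.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative, not from the question.
class LineWordCounts {

    // Builds one word-count map per input line; list index 0 corresponds to line 1.
    static List<Map<String, Integer>> countPerLine(List<String> lines) {
        List<Map<String, Integer>> perLine = new ArrayList<>();
        for (String line : lines) {
            Map<String, Integer> counts = new HashMap<>();
            for (String word : line.trim().split("\\s+")) {
                if (!word.isEmpty()) {              // a blank line yields one empty token
                    counts.merge(word, 1, Integer::sum);
                }
            }
            perLine.add(counts);
        }
        return perLine;
    }

    // Returns how often the given word occurs on each line, in line order.
    static List<Integer> occurrencesOf(String word, List<String> lines) {
        List<Integer> result = new ArrayList<>();
        for (Map<String, Integer> counts : countPerLine(lines)) {
            result.add(counts.getOrDefault(word, 0));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("the cat and the dog", "a bird", "the end");
        List<Integer> perLine = occurrencesOf("the", lines);
        for (int i = 0; i < perLine.size(); i++) {
            System.out.println("line " + (i + 1) + ": " + perLine.get(i));
        }
    }
}
```

Lines where the count is 0 are the lines on which the word does not appear, so filtering those out gives the list of line numbers.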
Here is a recursive method that uses String.indexOf to count how many times a word appears in a line.
You have read the line from your BufferedReader:
String line = buffer.readLine();
Then, in your loop, you have:
for (String s : words) {
int numberOfOccurencesOfS = countNumberOfTimesInALine(line,s);
}
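The countNumberOfTimesInALine method receives the original line and the word you are counting as arguments. To use it, you should also declare a class variable like this (because it is static and shared, reset it to 0 before each new top-level call, otherwise counts from earlier calls carry over):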
private static int numberOfLineOccurences;
Here is the method
public static int countNumberOfTimesInALine(String line, String word) {
if (line.indexOf(word) == -1) {
return numberOfLineOccurences;
} else {
numberOfLineOccurences++;
if (line.indexOf(word) + word.length() > line.length() -1 ) {
return numberOfLineOccurences;
}
return countNumberOfTimesInALine(
line.substring(line.indexOf(word) + word.length()), word );
}
}
Here is a usage example:
String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
System.out.println("Number of occurences of TEST is " + countNumberOfTimesInALine(line, "TEST"));
Here is the result
Number of occurences of TEST is 3
I have posted an answer to a question similar to yours here
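As an alternative to the recursive method above, the same count can be obtained with a loop and no shared counter, which avoids having to reset any state between calls. This is a minimal sketch, not the answer's own code; OccurrenceCounter and countOccurrences are hypothetical names. Like the recursive version, it counts non-overlapping substring matches, so "TEST" would also be found inside "TESTING".

```java
// Hypothetical helper class; the names are illustrative only.
class OccurrenceCounter {

    // Counts non-overlapping occurrences of word in line; keeps no state between calls.
    static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // an empty search string would otherwise never advance
        }
        int count = 0;
        int index = line.indexOf(word);
        while (index != -1) {
            count++;
            // continue searching right after the match just found
            index = line.indexOf(word, index + word.length());
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
        System.out.println("Number of occurrences of TEST is " + countOccurrences(line, "TEST"));
    }
}
```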
I have a list of sentences. I split each sentence, filtered out the unwanted words and punctuation, and then stored the results in
ArrayList<ArrayList<String>> sentence
Then I used a HashMap to find the most common word. How could I modify the following HashMap code so that I can also find the most common consecutive pairs of words (N-grams for phrases)?
HashMap<String, Integer> hashMap = new HashMap<>();
// Splitting the words of string
// and storing them in the array.
for(int i =0; i < sentence.size(); i++){
ArrayList<String> words = new ArrayList<String>(sentence.get(i));
for (String word : words) {
//Asking whether the HashMap contains the
//key or not. Will return null if not.
Integer integer = hashMap.get(word);
if (integer == null)
// Storing the word as key and its
// occurrence as value in the HashMap.
hashMap.put(word, 1);
else {
// Incrementing the value if the word
// is already present in the HashMap.
hashMap.put(word, integer + 1);
}
}
}
I don't know where to start. Should I adjust the way I split, or should I not split at all in the first place?
To find the most common consecutive pairs of words (N-grams for phrases), you can modify the above code by looping through the sentence arraylist and creating a new hashmap with the pairs of words as the keys and the number of times they appear as the values. Then, you can iterate through the new hashmap and find the pair of words with the highest value.
public static String getMostCommonNGram(ArrayList<ArrayList<String>> sentence) {
HashMap<String, Integer> nGramMap = new HashMap<>();
// loop through the sentences
for (ArrayList<String> words : sentence) {
// loop through the words and create pairs of words
for (int i = 0; i < words.size() - 1; i++) {
String nGram = words.get(i) + " " + words.get(i + 1);
// check if the n-gram already exists in the map
Integer count = nGramMap.get(nGram);
// if not, add it to the map with count = 1
if (count == null) {
nGramMap.put(nGram, 1);
} else {
// if yes, increment the count
nGramMap.put(nGram, count + 1);
}
}
}
// find the n-gram with the highest count
String mostCommonNGram = "";
int maxCount = 0;
for (String nGram : nGramMap.keySet()) {
int count = nGramMap.get(nGram);
if (count > maxCount) {
maxCount = count;
mostCommonNGram = nGram;
}
}
return mostCommonNGram;
}
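The method above is fixed to pairs of words. If phrases of other lengths are needed, the same idea generalizes by joining n consecutive words with String.join over a subList. The following is a sketch under assumptions: NGramCounter, countNGrams and mostCommon are hypothetical names, and the parameter type is widened to List<? extends List<String>> so that an ArrayList<ArrayList<String>> can still be passed in.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class NGramCounter {

    // Counts every run of n consecutive words within each sentence.
    static Map<String, Integer> countNGrams(List<? extends List<String>> sentences, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> words : sentences) {
            for (int i = 0; i + n <= words.size(); i++) {
                String nGram = String.join(" ", words.subList(i, i + n));
                counts.merge(nGram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the key with the highest count, or null for an empty map.
    // Ties are broken arbitrarily (HashMap iteration order).
    static String mostCommon(Map<String, Integer> counts) {
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                bestCount = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Calling countNGrams(sentence, 2) reproduces the pair counting above, and countNGrams(sentence, 1) gives the plain word counts from the question. N-grams never span two sentences, because each sentence list is processed on its own.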
In my program, I am reading data from a CSV file which follows the pattern of dance group and then the dancers in the group. I am struggling to sort the dancers' names alphabetically.
public String listAllDancesAndPerformers() {
// get CSV file for dances Data
ArrayList<String> dancesData = getCSV("src/csvFiles/danceShowData_dances.csv");
int lineNumber = 0;
String result = "";
//for each line in dances csv file
for (String line : dancesData) {
//split into two sections - [0] is name of dance & [1] is dancers
String[] splitByTab = line.split("\t");
//take the dancers [1] of splitByTab and split it by commas
// this means separatedNames[0], [1] etc. are all the dancers
// and I am supposed to sort the separated names to print them out alphabetically
String[] separatedNames = splitByComma(splitByTab[1]);
lineNumber++;
result += lineNumber + ": ";
result += (splitByTab[0].trim()) + "\n";
result += (listAllDancersIn(splitByTab[0].trim())) + "\n";
}
return result;
}
Here is the listAllDancersIn method, which takes a dance name as input and returns the dancers in that dance, reading from the CSV file:
public String listAllDancersIn(String dance) {
// get CSV file for dances Data
ArrayList<String> dancesData = getCSV("src/csvFiles/danceShowData_dances.csv");
String result = "";
// for each line in dances csv file
for (String line : dancesData) {
// split into two sections - [0] is name of dance & [1] is dancers
String[] splitByTab = line.split("\t");
splitByTab[0] = splitByTab[0].trim();
// if name of dance matches given dance name
if (splitByTab[0].equals(dance)) {
// split names of dancers into individual strings
String[] separatedNames = splitByComma(splitByTab[1]);
// iterate through names
for (int i = 0; i < separatedNames.length; i++) {
// append result with output of getDanceGroupMembers (and trim input)
result += ", " + getDanceGroupMembers(separatedNames[i].trim());
}
}
}
// remove leading comma and space
result = result.substring(2);
return result;
}
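In your listAllDancersIn method, collect the names in an ArrayList (called resultAsList below) instead of appending with result +=.
Then, at the end, you can use the default sort, which orders the strings alphabetically (more precisely, lexicographically, so uppercase letters sort before lowercase ones):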
Collections.sort(resultAsList);
And if you still want this method to return a sorted string instead of a sorted list, you can use String.join:
return String.join(", ", resultAsList);
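One caveat with the default sort: it compares strings by character value, so a name such as "abby" would be placed after "Zed". If the CSV might contain mixed-case names, a case-insensitive comparator can be passed instead. The following is a small sketch of that variation, assuming the names arrive as one comma-separated string; NameSorter and sortNames are hypothetical names, not part of the question's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; the names are illustrative only.
class NameSorter {

    // Splits on commas, trims each name, sorts ignoring case, and joins with ", ".
    static String sortNames(String commaSeparated) {
        List<String> names = new ArrayList<>();
        for (String name : commaSeparated.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        names.sort(String.CASE_INSENSITIVE_ORDER);
        return String.join(", ", names);
    }

    public static void main(String[] args) {
        System.out.println(sortNames("kelly, Andrew, nathan, Abby"));
    }
}
```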
Marius, see whether the code below works as you intended.
import java.util.ArrayList;
import java.util.Collections;
public class SortDancers {
public static void main(String[] args) {
System.out.println(new SortDancers().listAllDancesAndPerformers());
}
public String listAllDancesAndPerformers() {
ArrayList<String> dancesData = new ArrayList<String>();
dancesData.add("Dance1 \t Kelly, Andrew, Nathan");
dancesData.add("Dance2 \t John, Sally, Kevin, Abby");
dancesData.add("Dance3 \t Laura, Benny, Jane");
// I assume you get this kind of data from getCSV()
int lineNumber = 0;
String result = "";
for (String line : dancesData) {
String[] splitByTab = line.split("\t");
String[] separatedNames = splitByTab[1].split(",");
lineNumber++;
result += lineNumber + ": ";
result += (splitByTab[0].trim()) + "\n";
ArrayList<String> separatedNamesList = new ArrayList<String>();
for (int i = 0; i < separatedNames.length; i++) {
separatedNamesList.add(separatedNames[i].trim());
}
Collections.sort(separatedNamesList);
result += String.join(", ", separatedNamesList);
result += "\n";
}
return result;
}
}
I think you should split your code:
Read CSV file and build correct data structure;
Print data structure to console or String.
public static Map<String, Set<String>> listAllDancesAndPerformers() {
final Pattern pattern = Pattern.compile("(?<group>\\w+)\\t+(?<dancers>.+)");
final Pattern comma = Pattern.compile("\\s*,\\s*");
Map<String, Set<String>> groups = new TreeMap<>();
for (String line : getCSV("src/csvFiles/danceShowData_dances.csv")) {
Matcher matcher = pattern.matcher(line);
if (matcher.matches())
groups.put(matcher.group("group"), new TreeSet<>(Arrays.asList(comma.split(matcher.group("dancers")))));
}
return groups;
}
If danceShowData_dances.csv file content is:
beginners anna,maria,olga
mature bob,marvin,peter
Then result Map will contain:
"beginners" : ["anna", "maria", "olga"]
"mature" : ["bob", "marvin", "peter"]
Finally, you can create a method that converts the given Map into a String with the required format:
public static String printToString(Map<String, Set<String>> groups) {
int count = 1;
StringBuilder buf = new StringBuilder();
for (Map.Entry<String, Set<String>> entry : groups.entrySet()) {
if (buf.length() > 0)
buf.append('\n');
buf.append(count++).append(':');
buf.append(entry.getKey());
if (!entry.getValue().isEmpty())
buf.append('\n').append(String.join(", ", entry.getValue()));
}
return buf.toString();
}
Output:
1:beginners
anna, maria, olga
2:mature
bob, marvin, peter
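To see what the regular expression in this approach does to a single line, here is a small sketch that applies the same two patterns to one string. DanceLineParser and parse are hypothetical names introduced for illustration. Note one assumption baked into the pattern: the group name must be followed directly by the tab, so a line with a space before the tab does not match and is skipped.

```java
import java.util.AbstractMap;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
class DanceLineParser {
    private static final Pattern LINE = Pattern.compile("(?<group>\\w+)\\t+(?<dancers>.+)");
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    // Returns (group name, sorted dancers) for one line, or null if the line does not match.
    static Map.Entry<String, Set<String>> parse(String line) {
        Matcher matcher = LINE.matcher(line);
        if (!matcher.matches()) {
            return null;
        }
        // TreeSet keeps the names sorted and silently drops duplicates
        Set<String> dancers = new TreeSet<>(Arrays.asList(COMMA.split(matcher.group("dancers"))));
        return new AbstractMap.SimpleEntry<>(matcher.group("group"), dancers);
    }

    public static void main(String[] args) {
        System.out.println(parse("beginners\tolga,anna,maria"));
        System.out.println(parse("Dance1 \t Kelly, Andrew")); // space before the tab: no match
    }
}
```

If the real file has spaces around the tab, the pattern would need to allow for them, for example with \\s* on either side of \\t+.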
I'm very new to Java here, so please bear with me.
I'm currently trying to create code that does the following:
Add code to your processFile function that counts the number of times each word appears in the file.
Add code to your processFile function that loops through your HashMap to find the most frequent word. After your loop, the variable added for bonus requirement #1 should contain the value for the most frequent word.
So far I've come up with this and was wondering if anyone could please help me progress further.
Map<String, Integer> freq = new Hashmap<String, Integer>();
FileInputStream fi = new FileInputStream("readwords,txt");
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word);
if (f == null) {
freq.put(word,1);
}
else {
freq.put(word,f+1);
}
}
Thank you
Your syntax is close, but you've mixed String declaration styles, your generic type is missing a > and your variable names are inconsistent. I think you wanted something like,
Map<String, Integer> map = new HashMap<>();
File file = new File("readwords.txt");
try (Scanner input = new Scanner(file)) {
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = map.get(word);
if (f == null) {
map.put(word, 1);
} else {
map.put(word, f + 1);
}
}
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
}
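On Java 8 or later, the get / null check / put sequence can also be written with Map.merge. The following is a minimal sketch of that variation rather than a replacement for the code above; WordFrequency and countWords are hypothetical names, and the text is passed in as a String so that the example does not depend on readwords.txt existing.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Hypothetical helper class; the names are illustrative only.
class WordFrequency {

    // Counts whitespace-separated words, ignoring case.
    static Map<String, Integer> countWords(String text) {
        Map<String, Integer> freq = new HashMap<>();
        try (Scanner input = new Scanner(text)) {
            while (input.hasNext()) {
                // inserts 1 for a new word, otherwise adds 1 to the existing count
                freq.merge(input.next().toLowerCase(), 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        System.out.println(countWords("The cat saw the other cat"));
    }
}
```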
For counting the words and for getting most frequently used word you can try this:
public void processFile() throws Exception {
Map<String, Integer> freq = new HashMap<>();
FileInputStream fi = new FileInputStream("readwords.txt");
String mostFreqWord = null;
Integer highestFreq = 0;
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
freq.put(word, f);
if(f > highestFreq) {
mostFreqWord = word; // set most frequent word
highestFreq = f; // frequency of most frequent word
}
}
System.out.println("Word :" + mostFreqWord
+ " is the most frequent word with frequency:" + highestFreq);
}
Since I have modified the code you already posted, here is an explanation of the modifications (I assume you already know what your original code was doing).
Inside the loop, the line below checks whether word is being encountered for the first time; if so, it sets its frequency to 1, otherwise it increments the frequency for that word.
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
Then it stores the latest frequency for the word: freq.put(word, f);
The statement if(f > highestFreq) checks whether the current word's frequency exceeds the highest frequency seen so far; if it does, highestFreq and mostFreqWord are updated.
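The assignment text asks for a separate loop over the HashMap after counting, whereas the code above tracks the maximum while counting. If the separate loop is what is required, it could look like the sketch below. MostFrequentWord and mostFrequent are hypothetical names; the map is assumed to have been filled as in the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; the names are illustrative only.
class MostFrequentWord {

    // Loops over the finished frequency map and returns the word with the highest count.
    // Returns null for an empty map; ties are broken arbitrarily.
    static String mostFrequent(Map<String, Integer> freq) {
        String mostFreqWord = null;
        int highestFreq = 0;
        for (Map.Entry<String, Integer> entry : freq.entrySet()) {
            if (entry.getValue() > highestFreq) {
                highestFreq = entry.getValue();
                mostFreqWord = entry.getKey();
            }
        }
        return mostFreqWord;
    }

    public static void main(String[] args) {
        Map<String, Integer> freq = new HashMap<>();
        freq.put("a", 2);
        freq.put("b", 5);
        freq.put("c", 1);
        System.out.println(mostFrequent(freq)); // prints b
    }
}
```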
I'm working on a project where I enter a URL, the file is read, and the number of lines, characters, and words is written to a text file. I'm not having an issue with that. Code below will be pretty long, sorry in advance.
I also have to output to the same text file all of the words in the file and the number of times each word appears in the file. I've been working on it for a while and I've gotten to the point where all the lines/characters/words are written to the text file, but I can't figure out how to display the actual words and the number of times they appear in the file.
String[] wordSubstrings = line.replaceAll("\\s+", " ").split(" ");
List<String> uniqueWords = new ArrayList<String>();
for (int i = 0; i < wordSubstrings.length; i++) {
if (!(uniqueWords.contains(wordSubstrings[i]))) {
uniqueWords.add(wordSubstrings[i]);
}
}
You could use a Multiset (HashMultiset comes from the Google Guava library):
Multiset<String> words = HashMultiset.create();
for (String word : wordList)
words.add(word);
for (String word : words.elementSet())
System.out.println(word + ": " + words.count(word));
I've tested something with a HashMap which seems to work pretty well.
Here is my code that I used to test it, I hope it helps:
String[] wordSubstrings = new String[]{"test","stuff","test","thing","test","test","stuff"};
HashMap<String,Integer> uniqueWords = new HashMap<>();
for ( int i = 0; i < wordSubstrings.length; i++)
{
if(!(uniqueWords.containsKey(wordSubstrings[i])))
{
uniqueWords.put(wordSubstrings[i], 1);
}
else
{
int number = uniqueWords.get(wordSubstrings[i]);
uniqueWords.put(wordSubstrings[i],number + 1);
}
}
for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
String key = entry.getKey();
int value = entry.getValue();
//Do Something with the key and value
}
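Since the goal in the question is to write each word and its count to the output file, the final loop could build the text to write. Here is a sketch under assumptions: WordCountReport and formatCounts are hypothetical names, a TreeMap is swapped in for the HashMap so the words come out in alphabetical order, and the method returns a String that can then be passed to whatever writer the program already uses.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class; the names are illustrative only.
class WordCountReport {

    // Counts the words and formats one "word: count" line per distinct word, sorted by word.
    static String formatCounts(String[] words) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : words) {
            counts.merge(word, 1, Integer::sum);
        }
        StringBuilder report = new StringBuilder();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            report.append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return report.toString();
    }

    public static void main(String[] args) {
        String[] wordSubstrings = {"test", "stuff", "test", "thing", "test", "test", "stuff"};
        System.out.print(formatCounts(wordSubstrings));
    }
}
```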
You can use an ArrayList of a small class that holds the word and its count as member variables.
List<MyClass> uniqueWords = new ArrayList<MyClass>();
class MyClass
{
String uniqueword;
int count;
}
How do I count how many occurrences of a word I have in an ArrayList? Can anyone please help me?
If you only need to find how many matches there are and you don't need to know their positions, try using this:
BufferedReader reader = new BufferedReader(new FileReader("hello.txt"));
StringBuilder builder = new StringBuilder();
String str = null;
while((str = reader.readLine()) != null){
builder.append(str);
}
String fileString = builder.toString();
String match = "wordToMatch";
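// Pattern.quote treats the word literally instead of as a regex;
// the limit -1 keeps trailing empty strings, so a match at the very end is still counted
String[] split = fileString.split(java.util.regex.Pattern.quote(match), -1);
System.out.println(split.length - 1); // number of exact matches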
Try this:
private static int findMatches(List<Character> text, List<Character> pattern) {
int n = text.size();
int m = pattern.size();
int count = 0; // tracks number of matches found
for (int i = 0; i <= n - m; i++) {
int k = 0;
while (k < m && text.get(i + k).equals(pattern.get(k))) // equals, not ==: these are Character objects
k++;
if (k == m) { // if we reach the end of the pattern
k = 0;
count++;
}
}
return count;
}
Finding the number of times one string occurs in another is fairly trivial, so it might be best to read your file into a String rather than a list of characters if all you are going to do is search it:
int matches = 0;
for (int index = text.indexOf(target); index != -1; index = text.indexOf(target, index + 1))
matches++;
However most efficient search algorithms are going to require you to create an index as you read the input - something like a Map<String,List<Integer>> that can quickly find a list of word positions for a given word.
For example (using Java 8):
List<String> words; // ordered list of words read from the text file
Map<String, List<Integer>> index = IntStream.range(0, words.size())
.boxed() // IntStream has no one-argument collect, so box to Stream<Integer> first
.collect(Collectors.groupingBy(words::get));
Then searching for a word becomes trivial (e.g. index.get(word).size() is the number of occurrences, provided the word is in the index; get returns null otherwise). Searching for a phrase is not much harder: you break it into words, then filter the index values for consecutive word positions.
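The phrase search described above can be sketched as follows: take the positions of the first word of the phrase, then keep only those where each following word of the phrase is found at the next position. This is one possible reading of "filter the index values for consecutive word positions", not the only one; PositionalIndex, build and countPhrase are hypothetical names.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

// Hypothetical helper class; the names are illustrative only.
class PositionalIndex {

    // Maps each word to the list of positions at which it occurs.
    static Map<String, List<Integer>> build(List<String> words) {
        return IntStream.range(0, words.size())
                .boxed()
                .collect(Collectors.groupingBy(words::get));
    }

    // Counts how often the words of phrase occur at consecutive positions.
    static int countPhrase(List<String> words, List<String> phrase) {
        if (phrase.isEmpty()) {
            return 0;
        }
        Map<String, List<Integer>> index = build(words);
        int count = 0;
        for (int start : index.getOrDefault(phrase.get(0), Collections.emptyList())) {
            boolean match = true;
            for (int k = 1; k < phrase.size() && match; k++) {
                List<Integer> positions =
                        index.getOrDefault(phrase.get(k), Collections.emptyList());
                match = positions.contains(start + k); // k-th word must sit k places later
            }
            if (match) {
                count++;
            }
        }
        return count;
    }
}
```

In a real program the index would be built once and reused for many searches; it is rebuilt inside countPhrase here only to keep the sketch short. The contains call is a linear scan, so for large texts a Set of positions per word may be a better fit.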
Use a List<String> instead of List<Character>. After that you can either brute force the occurrence or use Java 8's streams and filter.
public static void main(String[] args) throws Exception {
List<String> words = new ArrayList<String>() {{
add("one");
add("two");
add("one");
add("one");
add("two");
add("one");
add("two");
add("three");
add("one");
add("one");
}};
String wordToSearch = "one";
int occurrence = 0;
for (String word : words) {
if (word.equals(wordToSearch)) {
occurrence++;
}
}
System.out.println("Brute force: " + occurrence);
System.out.println("Streams : " + words.stream()
.filter(word -> word.equalsIgnoreCase(wordToSearch)).count());
}
Results:
Brute force: 6
Streams : 6
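If the counts of all words are needed rather than the count of one word, the stream can group the list instead of filtering it. The following is a short sketch of that variation; StreamWordCounts and countAll are hypothetical names. Note that Collectors.counting() produces Long values, not Integer.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper class; the names are illustrative only.
class StreamWordCounts {

    // Groups equal words together and counts the size of each group.
    static Map<String, Long> countAll(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "one", "two", "one", "one", "two", "one", "two", "three", "one", "one");
        Map<String, Long> counts = countAll(words);
        System.out.println("one: " + counts.get("one")); // prints one: 6
    }
}
```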