Finding Unique Words In A Text File Using ArrayList


I'm working on a project where I enter a URL, the file is read, and the number of lines, characters, and words is written to a text file. I'm not having an issue with that. The code below will be pretty long, sorry in advance.
I also have to write to the same text file all of the words in the file and the number of times each word appears. I've been working on it for a while and have gotten to the point where the line, character, and word counts are written to the text file, but I can't figure out how to output the actual words and how many times each one occurs.
String[] wordSubstrings = line.replaceAll("\\s+", " ").split(" ");
List<String> uniqueWords = new ArrayList<String>();
for (int i = 0; i < wordSubstrings.length; i++) {
    // Only add a word the first time it is seen
    if (!uniqueWords.contains(wordSubstrings[i])) {
        uniqueWords.add(wordSubstrings[i]);
    }
}

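You could use a Multiset from Guava (com.google.common.collect), where wordList is your list of words:
import com.google.common.collect.HashMultiset;
import com.google.common.collect.Multiset;

Multiset<String> words = HashMultiset.create();
for (String word : wordList) {
    words.add(word);
}
for (String word : words.elementSet()) {
    System.out.println(word + ": " + words.count(word));
}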
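If adding Guava is not an option, a similar count can be sketched with only the JDK. This assumes Java 8 or later for Map.merge, which stores 1 for a new word and otherwise adds 1 to the existing count. The class and method names (WordCounts, countWords) are hypothetical, and a TreeMap is used here only so that the words come out in alphabetical order.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordCounts {
    // Counts how many times each word occurs; TreeMap keeps the keys sorted.
    static Map<String, Integer> countWords(List<String> wordList) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String word : wordList) {
            // Put 1 if the word is new, otherwise add 1 to its current count.
            counts.merge(word, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("test", "stuff", "test", "thing", "test");
        for (Map.Entry<String, Integer> e : countWords(words).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

Swapping TreeMap for HashMap makes inserts slightly cheaper but leaves the output order unspecified.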

I've tested something with a HashMap, which seems to work pretty well.
Here is the code I used to test it; I hope it helps:
import java.util.HashMap;
import java.util.Map;

String[] wordSubstrings = new String[]{"test", "stuff", "test", "thing", "test", "test", "stuff"};
HashMap<String, Integer> uniqueWords = new HashMap<>();
for (int i = 0; i < wordSubstrings.length; i++) {
    if (!uniqueWords.containsKey(wordSubstrings[i])) {
        // First time this word is seen
        uniqueWords.put(wordSubstrings[i], 1);
    } else {
        // Word already present: increment its count
        int number = uniqueWords.get(wordSubstrings[i]);
        uniqueWords.put(wordSubstrings[i], number + 1);
    }
}
for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
    String key = entry.getKey();
    int value = entry.getValue();
    // Do something with the key and value
}
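For the original requirement of writing each word and its count to the output text file, the "do something" step could look like the sketch below. The method name writeWordCounts and the file name output.txt are hypothetical; a StringWriter stands in for the file so the example runs on its own, and the caller is assumed to be responsible for closing the real file writer.

```java
import java.io.PrintWriter;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.TreeMap;

class WordCountWriter {
    // Writes one "word: count" line per map entry to the given Writer.
    static void writeWordCounts(Map<String, Integer> uniqueWords, Writer out) {
        PrintWriter pw = new PrintWriter(out);
        for (Map.Entry<String, Integer> entry : uniqueWords.entrySet()) {
            pw.println(entry.getKey() + ": " + entry.getValue());
        }
        pw.flush(); // push buffered text to the underlying Writer; the caller closes it
    }

    public static void main(String[] args) {
        Map<String, Integer> uniqueWords = new TreeMap<>();
        uniqueWords.put("test", 4);
        uniqueWords.put("stuff", 2);
        // For the real report you could pass new FileWriter("output.txt", true)
        // (hypothetical file name; true = append after the line/word/character counts).
        StringWriter sw = new StringWriter();
        writeWordCounts(uniqueWords, sw);
        System.out.print(sw);
    }
}
```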

You can use an ArrayList of a class that holds the word and its count as member variables:
List<MyClass> uniqueWords = new ArrayList<MyClass>();
class MyClass {
    String uniqueWord;
    int count;
}
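A fuller version of this ArrayList-of-objects idea is sketched below. The names WordCount, UniqueWords, and countUnique are hypothetical stand-ins for MyClass. Each new word triggers a linear search of the list, so this is O(n²) overall; that is acceptable for small files, whereas a HashMap gives roughly constant-time lookups.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical holder class: one word plus how often it was seen.
class WordCount {
    final String word;
    int count;

    WordCount(String word) {
        this.word = word;
        this.count = 1;
    }

    @Override
    public String toString() {
        return word + ": " + count;
    }
}

class UniqueWords {
    // Builds a list of unique words with counts, in order of first appearance.
    static List<WordCount> countUnique(String[] wordSubstrings) {
        List<WordCount> uniqueWords = new ArrayList<>();
        for (String w : wordSubstrings) {
            WordCount found = null;
            // Linear search through the words collected so far.
            for (WordCount wc : uniqueWords) {
                if (wc.word.equals(w)) {
                    found = wc;
                    break;
                }
            }
            if (found == null) {
                uniqueWords.add(new WordCount(w));
            } else {
                found.count++;
            }
        }
        return uniqueWords;
    }

    public static void main(String[] args) {
        String line = "the cat and the hat";
        for (WordCount wc : countUnique(line.trim().split("\\s+"))) {
            System.out.println(wc);
        }
    }
}
```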

Related

how to find the most common phrases in a list of strings

I have a list of sentences. I split each sentence, filtered out unwanted words and punctuation, and stored the result in
ArrayList<ArrayList<String>> sentence
Then I used a HashMap to find the most common word. How could I modify the following HashMap code so that I can also find the most common pairs of consecutive words (n-grams for phrases)?
HashMap<String, Integer> hashMap = new HashMap<>();
// Splitting the words of string
// and storing them in the array.
for(int i =0; i < sentence.size(); i++){
ArrayList<String> words = new ArrayList<String>(sentence.get(i));
for (String word : words) {
//Asking whether the HashMap contains the
//key or not. Will return null if not.
Integer integer = hashMap.get(word);
if (integer == null)
// Storing the word as key and its
// occurrence as value in the HashMap.
hashMap.put(word, 1);
else {
// Incrementing the value if the word
// is already present in the HashMap.
hashMap.put(word, integer + 1);
}
}
}
I don't know where to start. Should I change the way I split, or should I not split at all in the first place?

To find the most common pairs of consecutive words (n-grams for phrases), you can modify the above code: loop through the sentence ArrayList and build a new HashMap whose keys are the word pairs and whose values are the number of times each pair appears. Then iterate through the new HashMap and pick the pair with the highest count.
public static String getMostCommonNGram(ArrayList<ArrayList<String>> sentence) {
HashMap<String, Integer> nGramMap = new HashMap<>();
// loop through the sentences
for (ArrayList<String> words : sentence) {
// loop through the words and create pairs of words
for (int i = 0; i < words.size() - 1; i++) {
String nGram = words.get(i) + " " + words.get(i + 1);
// check if the n-gram already exists in the map
Integer count = nGramMap.get(nGram);
// if not, add it to the map with count = 1
if (count == null) {
nGramMap.put(nGram, 1);
} else {
// if yes, increment the count
nGramMap.put(nGram, count + 1);
}
}
}
// find the n-gram with the highest count
String mostCommonNGram = "";
int maxCount = 0;
for (String nGram : nGramMap.keySet()) {
int count = nGramMap.get(nGram);
if (count > maxCount) {
maxCount = count;
mostCommonNGram = nGram;
}
}
return mostCommonNGram;
}
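The same idea generalizes from pairs to any n-gram size: slide a window of n words over each sentence and join the words in the window into one key. The sketch below assumes Java 8+ (String.join, Map.merge); the names NGrams, countNGrams, and mostCommon are hypothetical. Because each sentence is scanned separately, an n-gram never spans two sentences, and n = 1 reduces to ordinary word counting.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class NGrams {
    // Counts every run of n consecutive words within each sentence.
    static Map<String, Integer> countNGrams(List<? extends List<String>> sentences, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> words : sentences) {
            for (int i = 0; i + n <= words.size(); i++) {
                String nGram = String.join(" ", words.subList(i, i + n));
                counts.merge(nGram, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns the n-gram with the highest count, or "" if there is none.
    // Ties are resolved by HashMap iteration order, which is unspecified.
    static String mostCommon(Map<String, Integer> counts) {
        String best = "";
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                bestCount = e.getValue();
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<ArrayList<String>> sentence = new ArrayList<>();
        sentence.add(new ArrayList<>(Arrays.asList("new", "york", "is", "big")));
        sentence.add(new ArrayList<>(Arrays.asList("i", "love", "new", "york")));
        System.out.println(mostCommon(countNGrams(sentence, 2))); // prints "new york"
    }
}
```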

Morse code in java using HashMap

I am trying to implement a Morse code translator in Java using as little code as possible, but my program throws an error because the index runs past the end of the HashMap's keys (an ArrayIndexOutOfBoundsException). Is it possible to make the map's size equal to the length of the string I input, but no less than 26, so that it can handle more than just the letters of the alphabet? Thanks
String a = reader.readLine();
Map<String, String> words = new HashMap<>();
words.put("s", "***"); //only two characters still
words.put("o", "---");
for(int i=0; i<a.length(); i++)
{
String checker = Character.toString(a.charAt(i));
if(checker.equals(words.keySet().toArray()[i]))
{
System.out.print(words.values().toArray()[i]+" ");
}
}

You just need to check whether the current letter is contained in the map; if it is, you can get the corresponding code from the words HashMap.
String a = reader.nextLine();
Map<String, String> words = new HashMap<>();
words.put("s", "***"); //only two characters still
words.put("o", "---");
String translated = "";
for(int i=0; i<a.length(); i++)
{
String checker = Character.toString(a.charAt(i));
if(words.containsKey(checker))
{
translated += words.get(checker);
}
else{
translated += checker;
}
}
System.out.println("Input: " + a + ", Morse: " + translated);
Output
sos
Input: sos, Morse: ***---***
sor
Input: sor, Morse: ***---r
This converts every letter the map knows about and leaves any other character unchanged.

if(checker.equals(words.keySet().toArray()[i]))
{
System.out.print(words.values().toArray()[i]+" ");
}
Change this to:
if(words.get(checker) != null)
System.out.print(words.get(checker) + " ");
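A HashMap grows automatically as entries are added, so there is no need to size it to the input; the original error came only from indexing keySet().toArray() with the position in the input string. Below is a sketch with the full international Morse alphabet for A to Z, using the standard "." and "-" rather than the "*" used in the question. The class name MorseTranslator and the choice of " / " as a word separator are assumptions. Separating letter codes with a space matters, because an unseparated string such as "***---***" cannot always be decoded unambiguously.

```java
import java.util.HashMap;
import java.util.Map;

class MorseTranslator {
    // International Morse code for A-Z, in alphabetical order.
    private static final String[] CODES = {
        ".-", "-...", "-.-.", "-..", ".", "..-.", "--.", "....", "..", ".---",
        "-.-", ".-..", "--", "-.", "---", ".--.", "--.-", ".-.", "...", "-",
        "..-", "...-", ".--", "-..-", "-.--", "--.."
    };
    private static final Map<Character, String> MORSE = new HashMap<>();

    static {
        for (int i = 0; i < CODES.length; i++) {
            MORSE.put((char) ('a' + i), CODES[i]);
        }
    }

    // Separates letter codes with a space and words with " / ".
    // Characters with no mapping (digits, punctuation) are copied through unchanged.
    static String translate(String input) {
        StringBuilder sb = new StringBuilder();
        for (char c : input.toLowerCase().toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            if (c == ' ') {
                sb.append('/');
            } else {
                sb.append(MORSE.getOrDefault(c, String.valueOf(c)));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("SOS")); // prints "... --- ..."
    }
}
```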

Word count function that counts words in a txt file

I'm very new to Java, so please bear with me.
I'm currently trying to create code that does the following:
Add code to your processFile function that counts the number of times each word appears in the file.
Add code to your processFile function that loops through your HashMap to find the most frequent word. After your loop, the variable added for bonus requirement #1 should contain the value for the most frequent word.
So far I've come up with this and was wondering if anyone could please help me progress further.
Map<String, Integer> freq = new Hashmap<String, Integer>();
FileInputStream fi = new FileInputStream("readwords,txt");
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word);
if (f == null) {
freq.put(word,1);
}
else {
freq.put(word,f+1);
}
}
Thank you

Your syntax is close, but there are a few errors: Hashmap should be HashMap, the file name "readwords,txt" has a comma where the period should be, and the FileNotFoundException that opening the file can throw is not handled. I think you wanted something like:
Map<String, Integer> map = new HashMap<>();
File file = new File("readwords.txt");
try (Scanner input = new Scanner(file)) {
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = map.get(word);
if (f == null) {
map.put(word, 1);
} else {
map.put(word, f + 1);
}
}
} catch (FileNotFoundException fnfe) {
fnfe.printStackTrace();
}

For counting the words and getting the most frequently used word, you can try this:
public void processFile() throws Exception {
Map<String, Integer> freq = new HashMap<>();
FileInputStream fi = new FileInputStream("readwords.txt");
String mostFreqWord = null;
Integer highestFreq = 0;
Scanner input = new Scanner(fi);
while (input.hasNext()) {
String word = input.next().toLowerCase();
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
freq.put(word, f);
if(f > highestFreq) {
mostFreqWord = word; // set most frequent word
highestFreq = f; // frequency of most frequent word
}
}
System.out.println("Word :" + mostFreqWord
+ " is the most frequent word with frequency:" + highestFreq);
}
Since I have modified the code you already posted, here is an explanation of the modifications (I assume you already know what your original code was doing).
Inside the loop, the line below checks whether the word is being encountered for the first time; if so, it sets its frequency to 1, otherwise it increments the frequency for that word.
Integer f = freq.get(word) == null ? 1 : freq.get(word) + 1;
Then it stores the latest frequency for the word: freq.put(word, f);
The statement if(f > highestFreq) checks whether this word's frequency now exceeds the highest frequency seen so far; if it does, it updates highestFreq and mostFreqWord.

How to find on which line a word is in Java

I am trying to create a program that counts the number of times a word appears in a text and also tells you how many times it appears on each line. I have managed to find the number of times the word appears and the number of lines in the text, but I cannot find which lines the word appears on and how many times. Could you please help me? This is my code so far:
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
Map<String, Integer> hash = new HashMap<String, Integer>();
int counter = 0; //number of lines
while (line != null){
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i==null)? 1: i+1);
}
line = buffer.readLine();
counter = counter + 1;
}
System.out.println(hash);
System.out.println(counter);

You need additional information for each line: the count of each word on that line. A single Map is therefore not enough; you need a separate Map for each line.
There are two basic ways to do this:
Map<Integer, Map<String, Integer>> hashOfHash = new HashMap<>();
List<Map<String, Integer>> list = new ArrayList<>();
The first line creates a Map of your Maps keyed by an Integer, which would be the line number.
The second line creates a List of your Maps; because a List keeps its order, you can tell which line is which just by iterating through it.
I would recommend the second approach.
You also need to modify your while loop a bit so that it creates a new map for each line (think of it as doing, for every line, the same thing your code does for the first line).
For example, this should do the same as your program, but it will show the results for each line:
public static void main(String[] args) throws FileNotFoundException, IOException {
FileReader file = new FileReader("C:/Users/User/Desktop/test.txt");
BufferedReader buffer = new BufferedReader(file);
String line = buffer.readLine();
List<Map<String, Integer>> list = new ArrayList<>();
while (line != null) {
Map<String, Integer> hash = new HashMap<String, Integer>();
String[] words = line.split(" ");
for (String s : words) {
Integer i = hash.get(s);
hash.put(s, (i == null) ? 1 : i + 1);
}
line = buffer.readLine();
list.add(hash);
}
int i=0;
for (Map<String, Integer> mapAtRow : list) {
i++;
System.out.println("at row " + i + " we found this: " + mapAtRow);
}
}
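The per-line maps above answer "what is on line k". To answer the question the other way round, "on which lines does this word appear, and how often", the nesting can be flipped so each word maps to (line number → count). This is a sketch, not the answerer's code: the names WordLines and indexLines are hypothetical, Java 8+ is assumed, and the lines are passed in as a List so the example runs without a file (Files.readAllLines(Paths.get(...)) is one way to get such a list from disk).

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WordLines {
    // Maps each word to (line number -> occurrences on that line); line numbers start at 1.
    static Map<String, Map<Integer, Integer>> indexLines(List<String> lines) {
        Map<String, Map<Integer, Integer>> index = new TreeMap<>();
        int lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            for (String s : line.trim().split("\\s+")) {
                if (s.isEmpty()) {
                    continue; // a blank line splits into a single empty string
                }
                index.computeIfAbsent(s, k -> new TreeMap<>())
                     .merge(lineNumber, 1, Integer::sum);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("to be or not to be", "be quick");
        for (Map.Entry<String, Map<Integer, Integer>> e : indexLines(lines).entrySet()) {
            // e.g. "be -> {1=2, 2=1}" means twice on line 1 and once on line 2
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```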

Here is a recursive method that uses String.indexOf to count how many times a word appears in a line.
You have read the line from your BufferedReader:
String line = buffer.readLine();
then in your loop you have
for (String s : words) {
int numberOfOccurencesOfS = countNumberOfTimesInALine(line,s);
}
countNumberOfTimesInALine receives the original line and the word you are counting as arguments. Each call finds the first occurrence, counts it, and recurses on the rest of the line after that occurrence, so no class variable is needed to hold the count (a static counter would keep accumulating across calls for different words and lines unless you reset it every time).
Here is the method:
public static int countNumberOfTimesInALine(String line, String word) {
    int index = line.indexOf(word);
    // Stop when the word is empty (it would match forever) or is no longer found.
    if (word.isEmpty() || index == -1) {
        return 0;
    }
    // Count this occurrence, then search the remainder of the line after it.
    return 1 + countNumberOfTimesInALine(line.substring(index + word.length()), word);
}
Here is a usage example:
String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
System.out.println("Number of occurrences of TEST is " + countNumberOfTimesInALine(line, "TEST"));
Here is the result:
Number of occurrences of TEST is 3
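The same count can also be sketched without recursion by using the two-argument String.indexOf(String, int), which starts searching at a given index. The class name OccurrenceCounter is hypothetical. Like the recursive version, this counts non-overlapping substring matches, so "TEST" is also found inside "TESTING"; matching whole words only would need a different approach, such as comparing the tokens from split.

```java
class OccurrenceCounter {
    // Counts non-overlapping occurrences of word in line.
    static int countOccurrences(String line, String word) {
        if (word.isEmpty()) {
            return 0; // avoid an endless loop: "" is found at every index
        }
        int count = 0;
        int from = 0;
        int index;
        while ((index = line.indexOf(word, from)) != -1) {
            count++;
            from = index + word.length(); // resume searching after this match
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "DEMO TEST DEMO TEST DEMO TEST ALPHA BETA GAMMA";
        System.out.println("Number of occurrences of TEST is " + countOccurrences(line, "TEST"));
    }
}
```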
I have published an answer to a question similar to yours here

Search for matching words in text WITHOUT using HashSet

I am trying to write a program that reads a text file, counts the total number of words and determines which words are repeated in the text, and how many times they occur (for simplicity, the text file contains no punctuation).
I have the following code for finding the repeated words, and how many times they occur.
ArrayList<String> words = new ArrayList<String>();
String myString;
String[] line;
// Read words from file, populate array
while ((myString=br.readLine()) != null) {
line = myString.split(" ");
for (String word : line) {
words.add(word.toLowerCase()); // Ignore case
}
}
The above part reads the text file, and adds every word into an ArrayList, words. The following part uses HashSet to determine which words in the ArrayList words occur more than once. It then prints out these words, followed by a counter indicating the number of occurrences.
// Count the occurrences of each word
Set<String> unique = new HashSet<String>(words);
for (String key : unique) {
if (Collections.frequency(words, key) > 1) {
System.out.println(key + ": " + Collections.frequency(words, key));
}
}
Is there a way to do this WITHOUT using HashSet? For example by using two arrays, and comparing them simultaneously? I have tried the following:
int numWords = words.size();
// Copy the ArrayList into an array so the two loops can compare entries.
String[] newWordArray = words.toArray(new String[numWords]);
String compWord;
// Find duplicates
for (int i = 0; i < newWordArray.length; i++) {
    for (int j = 0; j < newWordArray.length; j++) {
        if (i != j && newWordArray[i].equals(newWordArray[j])) {
            compWord = newWordArray[i];
            System.out.println(compWord);
        }
    }
}
This only prints out the words that occur more than once, in the order they are read from the file, so it does detect repeated words. However, is there a way to output these words in the form "[WORD : timesRepeated]"?
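One way to get "[WORD : timesRepeated]" without a HashSet is to sort a copy of the list: equal words then sit next to each other, and a single pass can measure the length of each run. The sketch below is one possible approach, not the only one; the names RepeatedWords and findRepeated are hypothetical. Sorting costs O(n log n), compared with O(n²) for the nested loops above, and each repeated word is reported exactly once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class RepeatedWords {
    // Returns "[WORD : timesRepeated]" for every word that occurs more than once.
    static List<String> findRepeated(List<String> words) {
        List<String> sorted = new ArrayList<>(words); // leave the caller's list untouched
        Collections.sort(sorted);
        List<String> result = new ArrayList<>();
        int i = 0;
        while (i < sorted.size()) {
            int j = i;
            // Advance j past the run of words equal to sorted.get(i).
            while (j < sorted.size() && sorted.get(j).equals(sorted.get(i))) {
                j++;
            }
            int times = j - i;
            if (times > 1) {
                result.add("[" + sorted.get(i) + " : " + times + "]");
            }
            i = j;
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "saw", "the", "dog", "and", "the", "cat");
        for (String s : findRepeated(words)) {
            System.out.println(s);
        }
    }
}
```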
